STSdb 4.0 provides parameters to control how much memory it will use. These parameters practically set the basic WaterfallTree properties:

- min/max children (branches) in each internal (non-leaf) node
- max operations in the root node
- min/max operations in each internal node
- min/max records in each leaf node
- number of cached nodes in the memory

The above values are controlled by the following StorageEngine fields:

```csharp
StorageEngine engine = (StorageEngine)STSdb.FromFile("test.stsdb4");

//min/max children (branches) in each internal node
engine.INTERNAL_NODE_MIN_BRANCHES = 2;
engine.INTERNAL_NODE_MAX_BRANCHES = 5;

//max operations in the root node
engine.INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT = 8 * 1024;

//min/max operations in each internal node
engine.INTERNAL_NODE_MIN_OPERATIONS = 32 * 1024;
engine.INTERNAL_NODE_MAX_OPERATIONS = 64 * 1024;

//min/max records in each leaf node
engine.LEAF_NODE_MIN_RECORDS = 8 * 1024;
engine.LEAF_NODE_MAX_RECORDS = 64 * 1024; //at least 2 x MIN_RECORDS

//number of cached nodes in memory
engine.CacheSize = 64;
```

The values in the code above are the defaults. The default settings assume that the stored records are relatively small (for example, 50-100 bytes) and that each W-tree node will contain from a few thousand to tens of thousands of records. The idea is that **each W-tree node should be at least several megabytes** (6-10 MB), to minimize the relative weight of the seek penalty versus the time needed to write the node.
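As a quick sanity check on that sizing guidance, the approximate size of a full leaf node is simply LEAF_NODE_MAX_RECORDS times the average record size. A small Python sketch (the 100-byte average record size is an assumption taken from the text above):

```python
# Rough estimate of a full W-tree leaf node's size (illustrative
# arithmetic only, not part of the STSdb API).
LEAF_NODE_MAX_RECORDS = 64 * 1024  # default from the settings above
avg_record_size = 100              # bytes; assumed average record size

node_size_mb = LEAF_NODE_MAX_RECORDS * avg_record_size / (1024 * 1024)
print(f"approximate full leaf node size: {node_size_mb:.2f} MB")  # 6.25 MB
```

With the defaults and ~100-byte records, a full leaf node lands within the recommended 6-10 MB range.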

In WaterfallTree each node has internal buffers with operations. Each operation is usually a triple - {*code, key, record*}. For example: {replace, 5, "a"}, {insertOrIgnore, 5, "b"}, {delete, 5} etc. (these are the asynchronous operations associated with a particular key). The number of buffered operations in each internal node, except the root, is controlled by:

```csharp
//min/max operations in each internal node
engine.INTERNAL_NODE_MIN_OPERATIONS;
engine.INTERNAL_NODE_MAX_OPERATIONS;
```
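The {*code, key, record*} triples can be pictured with a minimal sketch (a toy model for illustration only; STSdb's internal representation is different):

```python
from collections import namedtuple

# Toy model of a buffered W-tree operation: {code, key, record}.
# Illustrative only - not STSdb's internal representation.
Operation = namedtuple("Operation", ["code", "key", "record"])

buffer = [
    Operation("replace", 5, "a"),
    Operation("insertOrIgnore", 5, "b"),
    Operation("delete", 5, None),  # a delete carries no record
]

# Operations accumulate in a node's buffer and are applied in order
# when they are eventually poured down toward the leaves.
for op in buffer:
    print(op.code, op.key, op.record)
```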

INTERNAL_NODE_MAX_OPERATIONS controls when an internal node is *overflowed* with operations. When it overflows, part of the buffered operations is poured down to the child nodes. Conversely, INTERNAL_NODE_MIN_OPERATIONS controls when an internal node is *underflowed* with operations, i.e. when its buffers contain too few operations. When a node is in an underflowed state, it has to be merged with a neighbor node.

For performance reasons the number of buffered operations in the root node (when it has children) is controlled by a separate variable:

```csharp
//max operations in the root node
engine.INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT;
```

There is no minimum limit for the operations in the root.

The records in each leaf node are controlled by:

```csharp
//min/max records in each leaf node
engine.LEAF_NODE_MIN_RECORDS;
engine.LEAF_NODE_MAX_RECORDS;
```

LEAF_NODE_MAX_RECORDS controls when a leaf node is *overflowed* with records. When a leaf node overflows, it is split into two leaf nodes and the new child is added to the parent node. When the number of children of an internal node exceeds INTERNAL_NODE_MAX_BRANCHES, that node is split into two nodes, in turn increasing the number of children of its own parent. INTERNAL_NODE_MAX_BRANCHES is practically the W-tree degree.

LEAF_NODE_MIN_RECORDS controls when a leaf node is *underflowed* with records, i.e. when it contains too few records. When a node is in an underflowed state, it has to be merged with a neighbor node.

If you change the W-tree parameters, we recommend testing the different values in advance to achieve maximum performance for your average record size. You can test different scenarios - small nodes with a large number of cached nodes, large nodes with a small number of cached nodes, etc. Deviating from the default min/max ratios is also allowed.

**NOTE:** *All split and merge actions are asynchronous and often work in parallel - they are not performed immediately when the condition occurs, but with a delay. Thus, in most cases the W-tree can temporarily exceed the defined boundaries, but it will generally keep the nodes around these values.*

The number of all cached WaterfallTree nodes, internal and leaf alike, is controlled by:

```csharp
//number of cached nodes in memory
engine.CacheSize;
```

**It is strongly recommended that the number of cached W-tree nodes be greater than or equal to the maximum depth of the W-tree.** Otherwise performance will suffer, because WaterfallTree needs to keep at least maxDepth nodes in memory (one root-to-leaf path).

The minimum and maximum W-tree depth for a database containing recordCount records can be calculated with the following methods:

```csharp
public int GetMinimumWTreeDepth(long recordCount)
{
    int b = INTERNAL_NODE_MAX_BRANCHES;
    int R = INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT;
    int I = INTERNAL_NODE_MAX_OPERATIONS;
    int L = LEAF_NODE_MAX_RECORDS;

    double depth = Math.Log(((recordCount - R) * (b - 1) + b * I) / (L * (b - 1) + I), b) + 1;

    return (int)Math.Ceiling(depth);
}
```

```csharp
public int GetMaximumWTreeDepth(long recordCount)
{
    int b = INTERNAL_NODE_MAX_BRANCHES;
    int L = LEAF_NODE_MAX_RECORDS;

    double depth = Math.Log(recordCount / L, b) + 1;

    return (int)Math.Ceiling(depth);
}
```

The above methods are available in the StorageEngine class.

For example, for a database with 1,000,000,000 records, STSdb 4.0.4 with the default settings will create a W-tree with a maximum depth of 7. The default engine.CacheSize is 64, which is fine (64 > 7).
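This can be checked with a quick Python transcription of the two depth methods above, plugging in the default values (a sketch only: it uses floating-point division throughout, while the C# code performs integer division in places, so edge cases may round differently):

```python
import math

# Default W-tree settings from the beginning of this section.
b = 5          # INTERNAL_NODE_MAX_BRANCHES
R = 8 * 1024   # INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT
I = 64 * 1024  # INTERNAL_NODE_MAX_OPERATIONS
L = 64 * 1024  # LEAF_NODE_MAX_RECORDS

def get_minimum_wtree_depth(record_count):
    depth = math.log(((record_count - R) * (b - 1) + b * I) / (L * (b - 1) + I), b) + 1
    return math.ceil(depth)

def get_maximum_wtree_depth(record_count):
    depth = math.log(record_count / L, b) + 1
    return math.ceil(depth)

n = 1_000_000_000
print(get_minimum_wtree_depth(n), get_maximum_wtree_depth(n))  # 7 7
```

With the defaults, both bounds evaluate to 7 for one billion records, matching the example above.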

The number of all cached operations (in the internal nodes) in one StorageEngine instance can reach up to:

```csharp
var cachedOperations = engine.INTERNAL_NODE_MAX_OPERATIONS * (engine.CacheSize - 1)
    + engine.INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT;
```

The number of all cached records (in leaf nodes) in one StorageEngine instance can reach up to:

```csharp
var cachedRecords = engine.LEAF_NODE_MAX_RECORDS * engine.CacheSize;
```

The maximum number of all cached items - whether operations in internal nodes or pure records in leaf nodes - varies between the above two values, depending on the ratio between cached internal and leaf nodes.
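With the default settings, the two bounds work out as follows (plain arithmetic restating the expressions above):

```python
# Default settings from the beginning of this section.
INTERNAL_NODE_MAX_OPERATIONS = 64 * 1024
INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT = 8 * 1024
LEAF_NODE_MAX_RECORDS = 64 * 1024
CACHE_SIZE = 64  # engine.CacheSize

# Upper bound on cached operations (all cached nodes internal,
# plus the separately limited root buffer).
cached_operations = (INTERNAL_NODE_MAX_OPERATIONS * (CACHE_SIZE - 1)
                     + INTERNAL_NODE_MAX_OPERATIONS_IN_ROOT)

# Upper bound on cached records (all cached nodes are leaves).
cached_records = LEAF_NODE_MAX_RECORDS * CACHE_SIZE

print(cached_operations)  # 4136960
print(cached_records)     # 4194304
```

So with the defaults an engine instance can buffer up to roughly four million operations or four million records, before accounting for the actual key and record sizes.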

**NOTE:** *Keep in mind that W-tree nodes are heterogeneous - each node contains operations, keys and records from different tables. So the above min/max parameters and the CacheSize property have to be tuned according to the overall expected average key + record size.*