-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Currently the Node
class used for ImmutableList<T>
includes 40 bytes of overhead per list element on X64, and 24 bytes of overhead on X86. This means the memory cost of using this data structure for an ImmutableList<int>
is 1000% on X64, and 600% on X86 (these figures represent the amount of memory used for storing non-data elements associated with the structure).
I propose migrating to a B+-tree instead, using a fixed value for b and declaring data values in the index and data pages directly (to avoid unnecessary memory overhead for arrays). The choice of b primarily affects the following things:
- The height of the tree. Shorter tree heights improve performance for random access lookups by improving memory locality for a majority of traversal operations.
- The size of the pages. Mutations of immutable data structures require reconstructing pages leading back to the root. If the pages are large, mutations could become expensive.
In addition to the above, B+-trees provide a substantial memory savings by removing the child pointers (the Left and Right pointers for AVL nodes) from the leaves.
My initial calculations (which could be completely wrong) indicate that the average memory allocation required for a single mutation operation on a list of 100000 elements using the AVL structure is 775 bytes for X64 and 493 bytes for X86. The optimal value of b for X64 in terms of mutation allocation overhead is b=8, which costs 476 bytes per operation on a list of the same size (39% reduction). For X86 this same value of b costs 388 bytes per mutation (21% reduction, and b=6 would be optimal with 382 bytes). The overall storage overhead for b=8 drops a staggering amount down to 169% for X64 and 119% for X86 (counting the average page at 75% capacity, based on a merge at 50% and split at 100%).