Switch to persistent adaptive radix trees #9
Conversation
Benchmark results vs immutable-radix:
"Part_Insert" is without the watch channel management, and "Part_TrackMutate_Insert" is with. "SingleRadix_TrackMutate" still creates the watch channels but doesn't close them (so it's not apples-to-apples). This definitely shows that it's worth going this direction. It's almost twice as fast and half the allocations. We can then for the revision index just have a single watch channel per table rather than per node. There's not that much missing now. Need support for |
force-pushed from b1d9082 to af8bcbe
The impact of this is really visible in the reconciler benchmark:
It's 50% faster and we're using almost half the amount of memory for the objects.
force-pushed from 07c06be to b9ae279
Found a few small things. In general things seem to check out. However, part/iterator.go and part/txn.go have some very dense code which I find difficult to reason through. That is partially due to also having to wrap my head around the ART implementation, which is new to me.
There are a few edge cases in there which are not covered by the current tests, so I would like to schedule a codewalk to go through it together before my final verdict.
Thanks for the review! What edge cases are not covered? I think it'd be great to have someone else besides me write a few test cases for this and try to break it...
force-pushed from d399c2d to fd6eb05
part
looking good so far - need to further review iterator and the tests, and then the integration, but it's hard work 😅
// SPDX-License-Identifier: Apache-2.0
// Copyright Authors of Cilium

package part
all of part seems to be undocumented from the user perspective. If you could add a godoc to func NewTree, type Tree and type Txn, that would already help quite a bit, unless you think it's not super useful to document part as a free-standing lib.
Bonus for some simple examples (or let me know and I'll try to write them).
Sure! I'll do a pass to add docs to exported types & funcs.
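To illustrate what such godoc could look like, here is a hedged sketch; the type shapes and signatures are assumed from the discussion above, not copied from the actual part source.

// Package part implements persistent (immutable) adaptive radix trees.
package part

// Tree is an immutable adaptive radix tree keyed by []byte. Operations
// that modify the tree return a new *Tree and leave the receiver
// untouched, so a Tree can be shared between readers without locking.
type Tree[T any] struct {
    root *node[T]
    size int
}

// NewTree returns a new empty tree.
func NewTree[T any]() *Tree[T] { return &Tree[T]{} }

// Txn is a transaction against a Tree: it batches modifications and
// amortizes node copying. Commit returns the resulting immutable *Tree;
// the original tree is never modified.
type Txn[T any] struct {
    root *node[T]
    size int
}

// Txn starts a new transaction against the tree.
func (t *Tree[T]) Txn() *Txn[T] { return &Txn[T]{root: t.root, size: t.size} }

// Commit finalizes the transaction and returns the new tree.
func (txn *Txn[T]) Commit() *Tree[T] { return &Tree[T]{root: txn.root, size: txn.size} }

// node is a placeholder for the internal ART node representation.
type node[T any] struct{}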
part/txn.go (outdated)
type Txn[T any] struct {
    opts *options
    root *node[T]
    size int
what does the size of a transaction track? Is it the number of operations (i.e. inserts/deletes) or is it an optimization keeping track of the size of the resulting tree?
Number of objects in the tree, for implementing Len(). I'll document the fields.
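As a minimal sketch of that answer, building on the Txn fields shown in the diff hunk above (whether Len lives on Tree, Txn, or both is assumed here):

// size is kept up to date by Insert and Delete, so Len answers in O(1)
// without walking the tree.
func (txn *Txn[T]) Len() int {
    return txn.size
}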
}

func (txn *Txn[T]) Commit() *Tree[T] {
    txn.mutated = nil
could clear here, but I'm not sure what the differences are - do we rely on len(map) == 0 or on map == nil somewhere? I'd imagine clear being potentially more efficient by avoiding GC'ing the map memory.
Ah interesting. I'll check if clear is better from the GC perspective!
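A hypothetical micro-benchmark for that comparison (in a _test.go file) might look like the following; clear for maps needs Go 1.21+, and this is not the Benchmark_txn_1 benchmark whose results are quoted below.

package part

import "testing"

// Drop the map and allocate a fresh one next time, mirroring
// txn.mutated = nil on Commit followed by a lazy re-allocation.
func Benchmark_mutated_nil(b *testing.B) {
    m := make(map[int]struct{}, 64)
    for i := 0; i < b.N; i++ {
        m[i] = struct{}{}
        m = nil
        m = make(map[int]struct{}, 64)
    }
    _ = m
}

// Keep the map and its backing storage, just delete all entries.
func Benchmark_mutated_clear(b *testing.B) {
    m := make(map[int]struct{}, 64)
    for i := 0; i < b.N; i++ {
        m[i] = struct{}{}
        clear(m)
    }
}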
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Benchmark_txn_1-8 1608686 775.2 ns/op 1290049 objects/sec
PASS
ok github.com/cilium/statedb/part 2.275s
part % go test . -bench txn_1\$ (pr/joamaki/part)statedb
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Benchmark_txn_1-8 1767748 643.4 ns/op 1554270 objects/sec
PASS
First one is with clear, second without. So clearly worse.
clearly worse 😁 anyway - good to know!
part/txn.go (outdated)
// Reuse the same slice in the transaction to hold the parents in order to avoid
// allocations.
if txn.deleteParents == nil {
    txn.deleteParents = make([]deleteParent[T], 0, 32)
instead of 32, should this depend on the (estimated) height of the tree?
I don't think we need to do anything fancy like that. I just picked a number that's big enough most of the time. I can move this into a constant and add a comment about why this number was chosen.
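A sketch of that constant-plus-comment change, building on the fields in the hunk above (the constant and helper names here are made up):

// deleteParentsCap is the initial capacity of the reusable parents
// slice. It just needs to cover the typical tree height; 32 is big
// enough most of the time, and the slice grows for deeper trees.
const deleteParentsCap = 32

func (txn *Txn[T]) ensureDeleteParents() {
    // Reuse the same slice across deletes in the transaction to avoid
    // allocating on every operation.
    if txn.deleteParents == nil {
        txn.deleteParents = make([]deleteParent[T], 0, deleteParentsCap)
    }
}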
part/node.go (outdated)
    return nCopy
}

func (n *node[T]) promote(watch bool) *node[T] {
putting this here for lack of a better place. The paper has specialised the structure of each inner node, whereas your implementation "only" specialises node256 - I'm guessing you're leaving room for further optimization to keep some simplicity? Or did you benchmark and the other structures didn't do much good?
I went for simplicity, so I didn't even initially specialize node256 (until I benchmarked and noticed it makes a ton of sense). Since the perf is already 2x what we had before, I would leave it at this for now.
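For context, a rough sketch of the node kinds being discussed; the layouts are illustrative, and per the PR description below, this implementation skips the Node16 SIMD comparison and the Node48 key array and only gives the 256-wide node a specialised layout. (promote in the hunk above presumably grows a full node into the next larger kind.)

// The ART paper adapts an inner node's layout to its fan-out.
type nodeKind uint8

const (
    nodeKind4   nodeKind = iota // up to 4 children
    nodeKind16                  // up to 16 children
    nodeKind48                  // up to 48 children
    nodeKind256                 // up to 256 children
)

// In the specialised 256-wide node there is no key array at all:
// the child slot is indexed directly by the next key byte.
type node256Sketch[T any] struct {
    children [256]*node256Sketch[T]
}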
case nodeKind48:
    return n.node48().children[0:n.size():48]
case nodeKind256:
    return n.node256().children[:]
why the [:]? Could it also be left out, or be [:256] I guess?
We need [:] to convert it to a slice. [:256] would be the same as [:].
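A quick standalone illustration of that point about slicing an array:

package main

import "fmt"

func main() {
    var children [256]*int

    a := children[:]    // slice over the whole backing array
    b := children[:256] // identical: same pointer, length and capacity
    fmt.Println(len(a), cap(a), len(b), cap(b)) // 256 256 256 256
    fmt.Println(&a[0] == &b[0])                 // true: same backing array
}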
LGTM pending a few nits and questions, but I think this is good enough as is already.
txn = tree.Txn()
for _, i := range keys {
    _, hadOld = txn.Insert(intKey(i), i)
inject another get which confirms they're also gone after committing?
Done.
}

func Test_iterate(t *testing.T) {
    sizes := []int{0, 1, 10, 100, 1000, rand.Intn(1000)}
any specific behaviour we want for calling Next() on iterators which have already finished?
Added assert.
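The convention being asserted is presumably the usual Go one, illustrated here with a standalone slice iterator (not the part iterator, whose Next signature isn't shown in this diff):

package main

import "fmt"

// sliceIter shows the convention: once the iterator is exhausted, every
// further Next call keeps returning ok=false instead of panicking or
// wrapping around.
type sliceIter[T any] struct {
    items []T
}

func (it *sliceIter[T]) Next() (v T, ok bool) {
    if len(it.items) == 0 {
        return v, false // stays false on every subsequent call
    }
    v, it.items = it.items[0], it.items[1:]
    return v, true
}

func main() {
    it := &sliceIter[int]{items: []int{1, 2}}
    for v, ok := it.Next(); ok; v, ok = it.Next() {
        fmt.Println(v)
    }
    _, ok := it.Next() // already finished
    fmt.Println(ok)    // false, and remains false
}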
part/iterator.go (outdated)
edges := [][]*node[T]{}

var traverseToMin func(n *node[T])
traverseToMin = func(n *node[T]) {
nit: declaration and definition are separate, could they be joined?
No, it can't be, since traverseToMin calls traverseToMin.
I lifted traverseToMin into a top-level function.
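A standalone illustration of why the declaration and assignment have to be separate for a recursive closure, and of the top-level alternative:

package main

import "fmt"

// traverse := func(n int) { traverse(n - 1) } does not compile:
// the closure body refers to traverse before it is declared.
func main() {
    var traverse func(n int)
    traverse = func(n int) {
        if n == 0 {
            return
        }
        fmt.Println(n)
        traverse(n - 1)
    }
    traverse(3)

    // The alternative taken in the PR: lift the closure into a named
    // top-level function, which can simply call itself.
    countdown(3)
}

func countdown(n int) {
    if n == 0 {
        return
    }
    fmt.Println(n)
    countdown(n - 1)
}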
force-pushed from 54dcc41 to ac5cf9f
runtime.GC wasn't called and the memstats were read after Stop(). Signed-off-by: Jussi Maki <[email protected]>
Add an implementation of Persistent Adaptive Radix Trees. Adaptive Radix Trees are described in the paper https://db.in.tum.de/~leis/papers/ART.pdf. Compared to go-immutable-radix, which is non-adaptive, the adaptive radix trees use less memory and avoid an extra pointer chase, as the children array is embedded into the node. This implementation also allows using a watch channel only at the root node, which is more efficient for the revision index where per-node watch channels are not useful (we can't use them with LowerBound). Signed-off-by: Jussi Maki <[email protected]>
Refresh of part-add-custom-implementation
Code coverage reports showed that there are a few edge cases that are not covered. This commit adds a few tests to cover some of those. Signed-off-by: Dylan Reimerink <[email protected]>
Deduplicate WriteTxn benchmark code and add tests for larger transaction sizes. Signed-off-by: Jussi Maki <[email protected]>
Use the part library instead of go-immutable-radix. Since part doesn't have a reverse iterator drop the Last and LastWatch support (they were never used, we can add them back if we need them).

Benchmark results before:

statedb % go test . -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkDB_WriteTxn_1-8 332079 3431 ns/op 291454 objects/sec 3080 B/op 53 allocs/op
BenchmarkDB_WriteTxn_10-8 567682 2026 ns/op 493603 objects/sec 1501 B/op 23 allocs/op
BenchmarkDB_WriteTxn_100-8 570948 2010 ns/op 497480 objects/sec 1399 B/op 19 allocs/op
BenchmarkDB_WriteTxn_1000-8 469208 2715 ns/op 368260 objects/sec 1495 B/op 18 allocs/op
BenchmarkDB_WriteTxn_10000-8 1000000 2609 ns/op 383238 objects/sec 1349 B/op 21 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8 309526 3554 ns/op 281402 objects/sec 2208 B/op 33 allocs/op
BenchmarkDB_RandomInsert-8 1108 1010195 ns/op 989908 objects/sec 796015 B/op 12214 allocs/op
BenchmarkDB_RandomReplace-8 250 4612771 ns/op 216789 objects/sec 2383117 B/op 33382 allocs/op
BenchmarkDB_SequentialInsert-8 398 2997096 ns/op 333656 objects/sec 1492145 B/op 18255 allocs/op
BenchmarkDB_Baseline_Part_Insert-8 4711 222810 ns/op 4488140 objects/sec 115342 B/op 3071 allocs/op
BenchmarkDB_Baseline_SingleRadix_Insert-8 3504 331964 ns/op 3012375 objects/sec 353663 B/op 5103 allocs/op
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8 3328 333911 ns/op 2994818 objects/sec 353952 B/op 5106 allocs/op
BenchmarkDB_Baseline_Part_TrackMutate_Insert-8 4189 273483 ns/op 3656536 objects/sec 214634 B/op 4096 allocs/op
BenchmarkDB_Baseline_SingleRadix_Lookup-8 13726 84329 ns/op 11858358 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8 10000 101451 ns/op 9857032 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8 17227 70050 ns/op 14275499 objects/sec 86552 B/op 64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8 81237 14801 ns/op 67561608 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_DeleteTracker_Baseline-8 416 2930618 ns/op 341225 objects/sec 1758387 B/op 23413 allocs/op
BenchmarkDB_DeleteTracker-8 194 6178728 ns/op 161846 objects/sec 3392362 B/op 40843 allocs/op
BenchmarkDB_RandomLookup-8 8728 137090 ns/op 7294481 objects/sec 144 B/op 1 allocs/op
BenchmarkDB_SequentialLookup-8 9162 128953 ns/op 7754759 objects/sec 8000 B/op 1000 allocs/op
BenchmarkDB_FullIteration_All-8 103431 11386 ns/op 87823494 objects/sec 296 B/op 8 allocs/op
BenchmarkDB_FullIteration_Get-8 83902 13900 ns/op 71943487 objects/sec 336 B/op 9 allocs/op
BenchmarkDB_PropagationDelay-8 240777 4609 ns/op 41.00 50th_µs 48.00 90th_µs 199.0 99th_µs 3083 B/op 50 allocs/op
PASS
ok github.com/cilium/statedb 38.595s

Benchmark results after:

statedb % go test . -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkDB_WriteTxn_1-8 412078 2748 ns/op 363939 objects/sec 2408 B/op 38 allocs/op
BenchmarkDB_WriteTxn_10-8 949206 1267 ns/op 789411 objects/sec 663 B/op 12 allocs/op
BenchmarkDB_WriteTxn_100-8 995774 1145 ns/op 873221 objects/sec 521 B/op 9 allocs/op
BenchmarkDB_WriteTxn_1000-8 968202 1350 ns/op 740537 objects/sec 516 B/op 9 allocs/op
BenchmarkDB_WriteTxn_10000-8 1000000 1507 ns/op 663678 objects/sec 486 B/op 9 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8 559314 2031 ns/op 492404 objects/sec 844 B/op 18 allocs/op
BenchmarkDB_RandomInsert-8 1326 841762 ns/op 1187984 objects/sec 418200 B/op 9170 allocs/op
BenchmarkDB_RandomReplace-8 438 2705055 ns/op 369678 objects/sec 837668 B/op 19188 allocs/op
BenchmarkDB_SequentialInsert-8 759 1636911 ns/op 610907 objects/sec 515916 B/op 9185 allocs/op
BenchmarkDB_Baseline_Part_Insert-8 5148 213659 ns/op 4680351 objects/sec 115341 B/op 3071 allocs/op
BenchmarkDB_Baseline_SingleRadix_Insert-8 3252 345056 ns/op 2898080 objects/sec 353662 B/op 5103 allocs/op
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8 3238 371801 ns/op 2689613 objects/sec 353951 B/op 5106 allocs/op
BenchmarkDB_Baseline_Part_TrackMutate_Insert-8 3883 292706 ns/op 3416395 objects/sec 214636 B/op 4096 allocs/op
BenchmarkDB_Baseline_SingleRadix_Lookup-8 13491 88791 ns/op 11262375 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8 10000 104666 ns/op 9554213 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8 17108 71518 ns/op 13982445 objects/sec 86539 B/op 64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8 83188 13999 ns/op 71434361 objects/sec 0 B/op 0 allocs/op
BenchmarkDB_DeleteTracker_Baseline-8 934 1317762 ns/op 758863 objects/sec 525331 B/op 12317 allocs/op
BenchmarkDB_DeleteTracker-8 488 2450865 ns/op 408019 objects/sec 925854 B/op 18652 allocs/op
BenchmarkDB_RandomLookup-8 6177 165365 ns/op 6047254 objects/sec 144 B/op 1 allocs/op
BenchmarkDB_SequentialLookup-8 8071 146586 ns/op 6821924 objects/sec 8000 B/op 1000 allocs/op
BenchmarkDB_FullIteration_All-8 103074 11917 ns/op 83911284 objects/sec 280 B/op 8 allocs/op
BenchmarkDB_FullIteration_Get-8 85909 14108 ns/op 70881358 objects/sec 320 B/op 9 allocs/op
BenchmarkDB_PropagationDelay-8 414364 2914 ns/op 26.00 50th_µs 30.00 90th_µs 69.00 99th_µs 1406 B/op 28 allocs/op
PASS
ok github.com/cilium/statedb 36.180s

In summary:
- Half the number of allocations in a transaction of 100 inserts
- Almost double the throughput: 497k/s -> 873k/s
- Replacing an existing object: 216k/s -> 369k/s
- Propagation latency down from 41us to 26us for 50th percentile

Signed-off-by: Jussi Maki <[email protected]>
With the move to the part library we can drop all references to go-immutable-radix. Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: Jussi Maki <[email protected]>
Have a proper leaf node instead of storing the leaf inside an empty node4. This will reduce memory usage and takes out an allocation and a pointer dereference. For 1M dummy objects the memory usage is ~15% less and the insert speed is ~15% faster, with 30% less allocations. The leaf node is still wasting some space as it shares the header and the 'leaf' and 'prefix' fields could be removed.

Before:
Benchmark_Insert_RootOnlyWatch-8 4873 213619 ns/op 4681237 objects/sec 148882 B/op 3076 allocs/op
Benchmark_Insert-8 4107 267379 ns/op 3740016 objects/sec 249387 B/op 4114 allocs/op
Inserting batch 1000/1000 ... Waiting for reconciliation to finish ...
1000000 objects reconciled in 3.10 seconds (batch size 1000)
Throughput 322570.39 objects per second
Allocated 8012221 objects, 471726kB bytes, 527128kB bytes still in use

After:
Benchmark_Insert_RootOnlyWatch-8 5898 187664 ns/op 5328686 objects/sec 116722 B/op 2071 allocs/op
Benchmark_Insert-8 4978 257483 ns/op 3883746 objects/sec 217225 B/op 3109 allocs/op
benchmark % go run . -objects 1000000
Inserting batch 1000/1000 ... Waiting for reconciliation to finish ...
1000000 objects reconciled in 2.73 seconds (batch size 1000)
Throughput 365990.20 objects per second
Allocated 6011516 objects, 409163kB bytes, 484024kB bytes still in use

Signed-off-by: Jussi Maki <[email protected]>
To avoid having the useless 'leaf' field in the leaf struct, move it out from header into each individual node struct. We still have the 'prefix' in the leaf struct that could in principle be removed by special-casing, e.g. instead of always looking at the prefix one would first check if the node is a leaf and then do full key comparison instead. Leaving that for a later commit. Signed-off-by: Jussi Maki <[email protected]>
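A rough sketch of the layout change these two commits describe; the field names and shapes are illustrative, not the actual structs.

// Before: every node shared a header that carried a leaf pointer, and a
// value was stored as a leaf hanging off an otherwise empty node4.
type headerBefore[T any] struct {
    prefix []byte
    leaf   *leafSketch[T]
}

// After: leaves are their own node kind and the leaf pointer lives only
// in the inner node kinds that need it, saving an allocation and a
// pointer dereference per lookup. The prefix (full key) is still kept in
// the leaf; the commit notes it could later be dropped by comparing keys
// directly.
type leafSketch[T any] struct {
    prefix []byte
    value  T
    watch  chan struct{}
}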
Additionally move some exported functions to the top of the file to make it easier for readers. Embed Tree[T] into Txn[T] to avoid repeating the same fields. Signed-off-by: Jussi Maki <[email protected]>
lowerbound() was sufficiently large, so just move the closure up to its own top-level function.
force-pushed from ac5cf9f to 305b96d
The leaf's watch channel wasn't closed on delete. Fix this and add a test case to check that all watch channels of deleted nodes are closed.
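Independent of the part API (which isn't shown here), the standard Go way for such a test to assert that a watch channel was closed is a non-blocking receive:

// closed reports whether ch has been closed without blocking: a receive
// from a closed channel returns immediately, while a receive from an
// open channel that nobody sends on would block, so we fall through to
// the default case.
func closed(ch <-chan struct{}) bool {
    select {
    case <-ch:
        return true
    default:
        return false
    }
}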
Focused my review on part and looks good to me. 🚀
I'll take another look at the benchmarks and tests later, but don't need to block this further.
Switch StateDB to use a custom implementation of Persistent Adaptive Radix Trees (https://db.in.tum.de/~leis/papers/ART.pdf).
The implementation isn't a completely faithful implementation of the paper: the Node16 SIMD tricks and the Node48 key array are skipped. This made the implementation simpler, but did leave some performance improvements on the table.
The only functional change to StateDB is to drop the Last and LastWatch methods, since they require a reverse iterator implementation and we didn't use them, so we can just drop them for now.

Benchmark before:
Benchmark after:
Highlights: