
Switch to persistent adaptive radix trees #9

Merged 12 commits into main from pr/joamaki/part on Apr 17, 2024
Conversation

@joamaki (Contributor) commented Apr 4, 2024

Switch StateDB to use a custom implementation of Persistent Adaptive Radix Trees (https://db.in.tum.de/~leis/papers/ART.pdf).

The implementation isn't a completely faithful implementation of the paper: the Node16 SIMD tricks and the Node48 key array are skipped. This made the implementation simpler, but did leave some performance improvements on the table.

The only functional change to StateDB is to drop the Last and LastWatch methods, since they require a reverse iterator implementation, and since we didn't use them we can just drop them for now.
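For context, a rough sketch of the adaptive-node idea from the paper (illustrative Go only; names and layout are assumptions, not the exact code in this PR): a node starts as the smallest kind and is promoted to a wider kind as its children grow, so sparse fan-outs stay compact.

type node struct {
	prefix []byte        // path-compressed key bytes shared by this subtree
	watch  chan struct{} // closed when this subtree is mutated
}

type node4 struct {
	node
	keys     [4]byte  // first key byte of each child, scanned linearly
	children [4]*node // promoted to a wider node kind when full
}

type node256 struct {
	node
	children [256]*node // child found by indexing directly with the key byte
}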

Benchmark before:

goos: linux                                                                                                                                                                                     
goarch: amd64                                                                                                                                                                                   
pkg: github.com/cilium/statedb                                                                                                                                                                  
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz                                                                                                                                                  
BenchmarkDB_WriteTxn_1-8                                          307952              4012 ns/op            249227 objects/sec      3080 B/op         53 allocs/op                              
BenchmarkDB_WriteTxn_10-8                                         504811              2373 ns/op            421360 objects/sec      1501 B/op         23 allocs/op                              
BenchmarkDB_WriteTxn_100-8                                        522646              2229 ns/op            448544 objects/sec      1399 B/op         19 allocs/op                              
BenchmarkDB_WriteTxn_1000-8                                       369973              2954 ns/op            338547 objects/sec      1492 B/op         18 allocs/op                              
BenchmarkDB_WriteTxn_10000-8                                      995700              2792 ns/op            358210 objects/sec      1355 B/op         21 allocs/op                              
BenchmarkDB_WriteTxn_100_SecondaryIndex-8                         293709              3843 ns/op            260247 objects/sec      2208 B/op         33 allocs/op                              
BenchmarkDB_RandomInsert-8                                          1032           1192822 ns/op            838348 objects/sec    796015 B/op      12214 allocs/op                              
BenchmarkDB_RandomReplace-8                                          243           4906222 ns/op            203823 objects/sec   2382866 B/op      33380 allocs/op                              
BenchmarkDB_SequentialInsert-8                                       430           2770646 ns/op            360927 objects/sec   1492345 B/op      18256 allocs/op                              
BenchmarkDB_Baseline_SingleRadix_Insert-8                           3109            371203 ns/op           2693945 objects/sec    353662 B/op       5103 allocs/op                              
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8               3105            373685 ns/op           2676056 objects/sec    353951 B/op       5106 allocs/op                              
BenchmarkDB_Baseline_SingleRadix_Lookup-8                          13185             90097 ns/op          11099219 objects/sec         0 B/op          0 allocs/op                              
BenchmarkDB_Baseline_Part_RootOnlyWatch_Insert-8                    5473            204824 ns/op           4882248 objects/sec    146969 B/op       3065 allocs/op                              
BenchmarkDB_Baseline_Part_Insert-8                                  4282            257142 ns/op           3888905 objects/sec    246580 B/op       4094 allocs/op                              
BenchmarkDB_Baseline_Part_Lookup-8                                 36069             33324 ns/op          30008204 objects/sec         0 B/op          0 allocs/op                              
BenchmarkDB_Baseline_Hashmap_Insert-8                              15986             74292 ns/op          13460333 objects/sec     86561 B/op         64 allocs/op                              
BenchmarkDB_Baseline_Hashmap_Lookup-8                              84038             14228 ns/op          70282379 objects/sec         0 B/op          0 allocs/op                              
BenchmarkDB_DeleteTracker_Baseline-8                                 385           3065639 ns/op            326197 objects/sec   1758394 B/op      23413 allocs/op                              
BenchmarkDB_DeleteTracker-8                                          192           6763615 ns/op            147850 objects/sec   3387030 B/op      40807 allocs/op                              
BenchmarkDB_RandomLookup-8                                          7114            153828 ns/op           6500774 objects/sec       144 B/op          1 allocs/op                              
BenchmarkDB_SequentialLookup-8                                      8168            142537 ns/op           7015728 objects/sec      8000 B/op       1000 allocs/op                              
BenchmarkDB_FullIteration_All-8                                   100930             11497 ns/op          86979024 objects/sec       280 B/op          6 allocs/op                              
BenchmarkDB_FullIteration_Get-8                                    81974             14355 ns/op          69661408 objects/sec       320 B/op          7 allocs/op
BenchmarkDB_PropagationDelay-8                                    243086              4838 ns/op                43.00 50th_µs           50.00 90th_µs          212.0 99th_µs        3083 B/op        50 allocs/op

Benchmark after:

BenchmarkDB_WriteTxn_1-8                                  380911              3093 ns/op            323322 objects/sec      2440 B/op         38 allocs/op
BenchmarkDB_WriteTxn_10-8                                 901350              1336 ns/op            748741 objects/sec       699 B/op         12 allocs/op
BenchmarkDB_WriteTxn_100-8                               1000000              1075 ns/op            930477 objects/sec       581 B/op          9 allocs/op
BenchmarkDB_WriteTxn_1000-8                              1000000              1262 ns/op            792304 objects/sec       578 B/op          9 allocs/op
BenchmarkDB_WriteTxn_10000-8                             1000000              1262 ns/op            792171 objects/sec       548 B/op          9 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8                 705823              1752 ns/op            570791 objects/sec       934 B/op         18 allocs/op
BenchmarkDB_RandomInsert-8                                  1563            743594 ns/op           1344819 objects/sec    481777 B/op       9162 allocs/op
BenchmarkDB_RandomReplace-8                                  530           2266878 ns/op            441135 objects/sec    932101 B/op      19170 allocs/op
BenchmarkDB_SequentialInsert-8                               970           1264837 ns/op            790616 objects/sec    578330 B/op       9168 allocs/op
BenchmarkDB_Baseline_Part_RootOnlyWatch_Insert-8            5121            220673 ns/op           4531599 objects/sec    146970 B/op       3065 allocs/op
BenchmarkDB_Baseline_Part_Insert-8                          4135            281535 ns/op           3551957 objects/sec    246583 B/op       4094 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8                         35305             34346 ns/op          29115518 objects/sec         0 B/op          0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8                      14404             83092 ns/op          12034837 objects/sec     86549 B/op         64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8                      77744             15222 ns/op          65694567 objects/sec         0 B/op          0 allocs/op
BenchmarkDB_DeleteTracker_Baseline-8                         945           1267367 ns/op            789038 objects/sec    648963 B/op      13062 allocs/op
BenchmarkDB_DeleteTracker-8                                  501           2361545 ns/op            423452 objects/sec   1176999 B/op      20188 allocs/op
BenchmarkDB_RandomLookup-8                                 13999             82787 ns/op          12079196 objects/sec       144 B/op          1 allocs/op
BenchmarkDB_SequentialLookup-8                             14426             77552 ns/op          12894616 objects/sec      8000 B/op       1000 allocs/op
BenchmarkDB_FullIteration_All-8                            86900             12131 ns/op          82435475 objects/sec       264 B/op          6 allocs/op
BenchmarkDB_FullIteration_Get-8                            75100             14046 ns/op          71195485 objects/sec       304 B/op          7 allocs/op
BenchmarkDB_PropagationDelay-8                            398498              3135 ns/op                27.00 50th_µs           32.00 90th_µs          170.0 99th_µs        1479 B/op         28 allocs/op

Highlights:

  • WriteTxn_1 went from 250k/s to 323k/s
  • WriteTxn_100 went from 448k/s to 930k/s
  • RandomLookup nearly doubled: 6.5M/s to 12.1M/s
  • DeleteTracker up from 150k/s to 423k/s
  • Propagation delay down from 43µs to 27µs (50th percentile)
  • Reconciler throughput up from ~200k/s to 350k/s, memory usage down 50%

@joamaki (Contributor, Author) commented Apr 9, 2024

Benchmark results vs immutable-radix:

BenchmarkDB_Baseline_Part_Insert-8                      	   5067	   217217 ns/op	  4603701 objects/sec	 107524 B/op	   2776 allocs/op
BenchmarkDB_Baseline_SingleRadix_Insert-8               	   3138	   381252 ns/op	  2622943 objects/sec	 353661 B/op	   5103 allocs/op
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8   	   2782	   396346 ns/op	  2523049 objects/sec	 353950 B/op	   5106 allocs/op
BenchmarkDB_Baseline_Part_TrackMutate_Insert-8          	   4213	   261964 ns/op	  3817327 objects/sec	 208022 B/op	   3821 allocs/op
BenchmarkDB_Baseline_SingleRadix_Lookup-8               	  10000	   101206 ns/op	  9880891 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8                      	  10000	   104054 ns/op	  9610405 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8                   	  15531	    77728 ns/op	 12865460 objects/sec	  86551 B/op	     64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8                   	  79976	    15558 ns/op	 64277520 objects/sec	      0 B/op	      0 allocs/op

"Part_Insert" is without the watch channel management, and "Part_TrackMutate_Insert" is with. "SingleRadix_TrackMutate" still creates the watch channels but doesn't close them (so it's not apples-to-apples).

This definitely shows that it's worth going in this direction: it's almost twice as fast with half the allocations. For the revision index we can then have a single watch channel per table rather than per node (see the sketch below).
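For the revision index that could look roughly like this (sketch; the RootOnlyWatch option name is inferred from the benchmark names above, and the object element type is a placeholder, so the real API may differ):

// Only the root carries a watch channel: cheaper inserts, which suits
// the revision index where per-node watch channels can't be used with
// LowerBound anyway. (Option shape assumed.)
revisions := part.New[object](part.RootOnlyWatch)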

There's not that much missing now: we need support for Len(), and we probably want to use generics to embed the "object" struct into the "node*" struct and avoid an allocation, roughly along the lines sketched below.
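A hypothetical sketch of that embedding (this later landed as a dedicated leaf node in one of the commits below; field names are assumptions):

// Storing the value in the leaf by value means one allocation per
// entry instead of a leaf node plus a separately boxed object.
type leaf[T any] struct {
	prefix []byte // the full key of this entry
	value  T      // embedded directly, no extra allocation
}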

@joamaki joamaki changed the title [DRAFT]: Persistent adaptive radix trees Switch to persistent adaptive radix trees Apr 11, 2024
@joamaki joamaki marked this pull request as ready for review April 11, 2024 09:48
@joamaki (Contributor, Author) commented Apr 11, 2024

The impact of this is really visible in the reconciler benchmark:

benchmark % go run . -batchsize 100
Inserting batch 1000/1000 ...
Waiting for reconciliation to finish ...

100000 objects reconciled in 0.48 seconds (batch size 100)
Throughput 206555.64 objects per second
Allocated 1423417 objects, 115971kB bytes, 117728kB bytes still in use
benchmark % stg push
Pushing patch "statedb-switch-to-part" ... done (unmodified)
Now at patch "statedb-switch-to-part"
benchmark % go run . -batchsize 100
Inserting batch 1000/1000 ...
Waiting for reconciliation to finish ...

100000 objects reconciled in 0.34 seconds (batch size 100)
Throughput 297510.35 objects per second
Allocated 983554 objects, 57910kB bytes, 60208kB bytes still in use

It's ~45% faster (207k/s -> 298k/s) and we're using almost half the memory for the objects.

@joamaki joamaki force-pushed the pr/joamaki/part branch 6 times, most recently from 07c06be to b9ae279 Compare April 15, 2024 14:40
@dylandreimerink (Member) left a comment

Found a few small things. In general, things seem to check out. However, part/iterator.go and part/txn.go have some very dense code which I find difficult to reason through. That is partially due to also having to wrap my head around the ART implementation, which is new to me.

There are a few edge cases in there which are not covered by the current tests, so I would like to schedule a codewalk to go through it together before my final verdict.

@joamaki (Contributor, Author) commented Apr 15, 2024

> Found a few small things. In general, things seem to check out. However, part/iterator.go and part/txn.go have some very dense code which I find difficult to reason through. That is partially due to also having to wrap my head around the ART implementation, which is new to me.
>
> There are a few edge cases in there which are not covered by the current tests, so I would like to schedule a codewalk to go through it together before my final verdict.

Thanks for the review! What edge cases are not covered? I think it'd be great to have someone else besides me write a few test cases for this and try to break it...

@joamaki joamaki force-pushed the pr/joamaki/part branch 3 times, most recently from d399c2d to fd6eb05 Compare April 15, 2024 15:12
@bimmlerd (Member) left a comment

part looking good so far - need to further review iterator and the tests, and then the integration, but it's hard work 😅

// SPDX-License-Identifier: Apache-2.0
// Copyright Authors of Cilium

package part
Member:

all of part seems to be undocumented from the user perspective. If you could add a godoc to func NewTree, type Tree and type Txn, that would already help quite a bit, unless you think it's not super useful to document part as a free-standing lib.

Bonus for some simple examples (or let me know and I'll try to write them).

joamaki (Author):

Sure! I'll do a pass to add docs to exported types & funcs.
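Something along these lines, perhaps (illustrative godoc sketch; the wording and signatures are guesses, not the final docs):

// Package part implements persistent (immutable) adaptive radix trees.

// Tree is an immutable adaptive radix tree. All modifications return
// a new Tree and leave the original untouched, so a Tree can be read
// concurrently without locks.
type Tree[T any] struct{ /* ... */ }

// Txn is a transaction against a Tree. It batches insertions and
// deletions and produces the resulting Tree on Commit.
type Txn[T any] struct{ /* ... */ }

// NewTree returns an empty tree for values of type T keyed by []byte.
func NewTree[T any]() *Tree[T] { return &Tree[T]{} }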

part/txn.go
type Txn[T any] struct {
	opts *options
	root *node[T]
	size int
Member:

what does the size of a transaction track? Is it the number of operations (i.e. inserts/deletes) or is it an optimization keeping track of the size of the resulting tree?

joamaki (Author):

Number of objects in the tree, for implementing Len(). I'll document the fields.
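Sketched out, extending the snippet above (field comments are illustrative):

type Txn[T any] struct {
	opts *options // tree options (e.g. root-only watch channels)
	root *node[T] // root of the tree being built
	size int      // number of objects in the tree, maintained by
	              // Insert/Delete so that Len() is O(1)
}

// Len returns the number of objects in the tree.
func (txn *Txn[T]) Len() int { return txn.size }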

}

func (txn *Txn[T]) Commit() *Tree[T] {
	txn.mutated = nil
Member:

could use clear here, but I'm not sure what the differences are - do we rely on len(map) == 0 or on map == nil somewhere? I'd imagine clear being potentially more efficient by avoiding GC'ing the map memory.

joamaki (Author):

Ah interesting. I'll check if clear is better from GC perspective!

joamaki (Author):

goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Benchmark_txn_1-8        1608686               775.2 ns/op         1290049 objects/sec
PASS
ok      github.com/cilium/statedb/part  2.275s
part % go test . -bench txn_1\$
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb/part
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Benchmark_txn_1-8        1767748               643.4 ns/op         1554270 objects/sec
PASS

First one is with clear, second without. So clearly worse.

Member:

clearly worse 😁 anyway - good to know!
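For reference, the two variants that were benchmarked, as hypothetical Commit helpers (txn.mutated is assumed to be the set of nodes already cloned within this transaction):

func (txn *Txn[T]) commitWithClear() {
	// clear (a Go 1.21+ builtin) zeroes the entries but keeps the map's
	// backing storage allocated, so a map that grew large stays large.
	clear(txn.mutated)
}

func (txn *Txn[T]) commitWithNil() {
	// Dropping the reference lets the GC reclaim the map; the next
	// transaction lazily allocates a fresh, small one. Measured faster here.
	txn.mutated = nil
}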

part/txn.go
// Reuse the same slice in the transaction to hold the parents in order to avoid
// allocations.
if txn.deleteParents == nil {
	txn.deleteParents = make([]deleteParent[T], 0, 32)
Member:

instead of 32, should this depend on the (estimated) height of the tree?

joamaki (Author):

I don't think we need to do anything fancy like that. I just picked a number that's big enough most of the time. I can move this into a constant and add a comment about why this number.
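Something like this, perhaps (the constant name and comment are illustrative):

// deleteParentsCap is the initial capacity of the scratch slice that
// records the path from the root to the deleted leaf. Trees are rarely
// deeper than this, so the slice almost never needs to grow.
const deleteParentsCap = 32

if txn.deleteParents == nil {
	txn.deleteParents = make([]deleteParent[T], 0, deleteParentsCap)
}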

part/node.go
	return nCopy
}

func (n *node[T]) promote(watch bool) *node[T] {
Member:

putting this here for lack of a better place. The paper has specialised the structure of each inner node, whereas your implementation "only" specialises node256 - I'm guessing you're leaving room for further optimization to keep some simplicity? Or did you benchmark and the other structures didn't do much good?

joamaki (Author):

I went for simplicity, so I didn't even initially specialize node256 (until I benchmarked and noticed it makes a ton of sense). Since the perf is already 2x what we had before, I'd leave it at this for now.

case nodeKind48:
	return n.node48().children[0:n.size():48]
case nodeKind256:
	return n.node256().children[:]
Member:

why the [:]? Could also be left out or [:256] I guess?

joamaki (Author):

We need [:] to convert the array to a slice; [:256] would be the same as [:].
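For reference, a tiny standalone example of the conversions in question ([0:n:48] in the snippet above is a full slice expression that also caps the slice's capacity):

package main

import "fmt"

func main() {
	var children [256]*int
	s := children[:] // array-to-slice conversion; same as children[:256]
	fmt.Println(len(s), cap(s)) // 256 256

	var small [48]*int
	size := 3
	t := small[0:size:48] // full slice expression: len 3, cap 48
	fmt.Println(len(t), cap(t)) // 3 48
}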

@bimmlerd (Member) left a comment

LGTM pending a few nits and questions, but I think this is good enough as is already.


txn = tree.Txn()
for _, i := range keys {
	_, hadOld = txn.Insert(intKey(i), i)
Member:

inject another Get to confirm they're also gone after committing?

joamaki (Author):

Done.

}

func Test_iterate(t *testing.T) {
	sizes := []int{0, 1, 10, 100, 1000, rand.Intn(1000)}
Member:

any specific behaviour we want for calling Next() on iterators which already finished?

joamaki (Author):

Added assert.
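Roughly along these lines (sketch; the iterator method names and the Next signature are assumed, not taken from part's actual API):

// Drain the iterator. Assumed signature: Next() (key []byte, value T, ok bool).
iter := tree.Iterator()
for _, _, ok := iter.Next(); ok; _, _, ok = iter.Next() {
}
// An exhausted iterator should keep returning ok == false.
_, _, ok := iter.Next()
assert.False(t, ok)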

part/iterator.go
edges := [][]*node[T]{}

var traverseToMin func(n *node[T])
traverseToMin = func(n *node[T]) {
Member:

nit: definition and declaration separate, could be joined?

joamaki (Author):

No, it can't be, since traverseToMin calls itself.

joamaki (Author):

I lifted traverseToMin into a top-level function.
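For reference, the Go detail behind this: a closure can't refer to itself in its own initializer, so recursion needs the declare-then-assign split (or a top-level function, as was done here). children() below is taken from the node.go snippet above; the body is a sketch:

// traverseToMin := func(n *node[T]) { traverseToMin(...) } // doesn't compile:
// the name is not yet in scope inside its own initializer.

var traverseToMin func(n *node[T])
traverseToMin = func(n *node[T]) {
	if children := n.children(); len(children) > 0 {
		traverseToMin(children[0]) // recurse towards the smallest key
	}
}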

@joamaki joamaki force-pushed the pr/joamaki/part branch 3 times, most recently from 54dcc41 to ac5cf9f Compare April 16, 2024 12:46
joamaki and others added 4 commits April 16, 2024 14:54
runtime.GC wasn't called and the memstats were read after Stop().

Signed-off-by: Jussi Maki <[email protected]>
Add an implementation of Persistent Adaptive Radix Trees.

Adaptive Radix Trees are described in the paper
https://db.in.tum.de/~leis/papers/ART.pdf. Compared to
go-immutable-radix, which is non-adaptive, adaptive radix trees
use less memory and avoid an extra pointer chase, as the children
array is embedded into the node.

This implementation also allows using a watch channel only at the
root node, which is more efficient for the revision index, where
per-node watch channels are not useful (we can't use them with
LowerBound).

Signed-off-by: Jussi Maki <[email protected]>

Refresh of part-add-custom-implementation
Code coverage reports showed that there are a few edge cases that are
not covered. This commit adds a few tests to cover some of those.

Signed-off-by: Dylan Reimerink <[email protected]>
Deduplicate WriteTxn benchmark code and add tests for larger transaction
sizes.

Signed-off-by: Jussi Maki <[email protected]>
joamaki added 7 commits April 16, 2024 14:54
Use the part library instead of go-immutable-radix.
Since part doesn't have a reverse iterator, drop the Last and LastWatch
support (they were never used; we can add them back if we need them).

Benchmark results before:

statedb % go test . -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkDB_WriteTxn_1-8                                	 332079	     3431 ns/op	   291454 objects/sec	   3080 B/op	     53 allocs/op
BenchmarkDB_WriteTxn_10-8                               	 567682	     2026 ns/op	   493603 objects/sec	   1501 B/op	     23 allocs/op
BenchmarkDB_WriteTxn_100-8                              	 570948	     2010 ns/op	   497480 objects/sec	   1399 B/op	     19 allocs/op
BenchmarkDB_WriteTxn_1000-8                             	 469208	     2715 ns/op	   368260 objects/sec	   1495 B/op	     18 allocs/op
BenchmarkDB_WriteTxn_10000-8                            	1000000	     2609 ns/op	   383238 objects/sec	   1349 B/op	     21 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8               	 309526	     3554 ns/op	   281402 objects/sec	   2208 B/op	     33 allocs/op
BenchmarkDB_RandomInsert-8                              	   1108	  1010195 ns/op	   989908 objects/sec	 796015 B/op	  12214 allocs/op
BenchmarkDB_RandomReplace-8                             	    250	  4612771 ns/op	   216789 objects/sec	2383117 B/op	  33382 allocs/op
BenchmarkDB_SequentialInsert-8                          	    398	  2997096 ns/op	   333656 objects/sec	1492145 B/op	  18255 allocs/op
BenchmarkDB_Baseline_Part_Insert-8                      	   4711	   222810 ns/op	  4488140 objects/sec	 115342 B/op	   3071 allocs/op
BenchmarkDB_Baseline_SingleRadix_Insert-8               	   3504	   331964 ns/op	  3012375 objects/sec	 353663 B/op	   5103 allocs/op
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8   	   3328	   333911 ns/op	  2994818 objects/sec	 353952 B/op	   5106 allocs/op
BenchmarkDB_Baseline_Part_TrackMutate_Insert-8          	   4189	   273483 ns/op	  3656536 objects/sec	 214634 B/op	   4096 allocs/op
BenchmarkDB_Baseline_SingleRadix_Lookup-8               	  13726	    84329 ns/op	 11858358 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8                      	  10000	   101451 ns/op	  9857032 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8                   	  17227	    70050 ns/op	 14275499 objects/sec	  86552 B/op	     64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8                   	  81237	    14801 ns/op	 67561608 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_DeleteTracker_Baseline-8                    	    416	  2930618 ns/op	   341225 objects/sec	1758387 B/op	  23413 allocs/op
BenchmarkDB_DeleteTracker-8                             	    194	  6178728 ns/op	   161846 objects/sec	3392362 B/op	  40843 allocs/op
BenchmarkDB_RandomLookup-8                              	   8728	   137090 ns/op	  7294481 objects/sec	    144 B/op	      1 allocs/op
BenchmarkDB_SequentialLookup-8                          	   9162	   128953 ns/op	  7754759 objects/sec	   8000 B/op	   1000 allocs/op
BenchmarkDB_FullIteration_All-8                         	 103431	    11386 ns/op	 87823494 objects/sec	    296 B/op	      8 allocs/op
BenchmarkDB_FullIteration_Get-8                         	  83902	    13900 ns/op	 71943487 objects/sec	    336 B/op	      9 allocs/op
BenchmarkDB_PropagationDelay-8                          	 240777	     4609 ns/op	       41.00 50th_µs	       48.00 90th_µs	      199.0 99th_µs	   3083 B/op	     50 allocs/op
PASS
ok  	github.com/cilium/statedb	38.595s

Benchmark results after:
statedb % go test . -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/cilium/statedb
cpu: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
BenchmarkDB_WriteTxn_1-8                                	 412078	     2748 ns/op	   363939 objects/sec	   2408 B/op	     38 allocs/op
BenchmarkDB_WriteTxn_10-8                               	 949206	     1267 ns/op	   789411 objects/sec	    663 B/op	     12 allocs/op
BenchmarkDB_WriteTxn_100-8                              	 995774	     1145 ns/op	   873221 objects/sec	    521 B/op	      9 allocs/op
BenchmarkDB_WriteTxn_1000-8                             	 968202	     1350 ns/op	   740537 objects/sec	    516 B/op	      9 allocs/op
BenchmarkDB_WriteTxn_10000-8                            	1000000	     1507 ns/op	   663678 objects/sec	    486 B/op	      9 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8               	 559314	     2031 ns/op	   492404 objects/sec	    844 B/op	     18 allocs/op
BenchmarkDB_RandomInsert-8                              	   1326	   841762 ns/op	  1187984 objects/sec	 418200 B/op	   9170 allocs/op
BenchmarkDB_RandomReplace-8                             	    438	  2705055 ns/op	   369678 objects/sec	 837668 B/op	  19188 allocs/op
BenchmarkDB_SequentialInsert-8                          	    759	  1636911 ns/op	   610907 objects/sec	 515916 B/op	   9185 allocs/op
BenchmarkDB_Baseline_Part_Insert-8                      	   5148	   213659 ns/op	  4680351 objects/sec	 115341 B/op	   3071 allocs/op
BenchmarkDB_Baseline_SingleRadix_Insert-8               	   3252	   345056 ns/op	  2898080 objects/sec	 353662 B/op	   5103 allocs/op
BenchmarkDB_Baseline_SingleRadix_TrackMutate_Insert-8   	   3238	   371801 ns/op	  2689613 objects/sec	 353951 B/op	   5106 allocs/op
BenchmarkDB_Baseline_Part_TrackMutate_Insert-8          	   3883	   292706 ns/op	  3416395 objects/sec	 214636 B/op	   4096 allocs/op
BenchmarkDB_Baseline_SingleRadix_Lookup-8               	  13491	    88791 ns/op	 11262375 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Part_Lookup-8                      	  10000	   104666 ns/op	  9554213 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_Baseline_Hashmap_Insert-8                   	  17108	    71518 ns/op	 13982445 objects/sec	  86539 B/op	     64 allocs/op
BenchmarkDB_Baseline_Hashmap_Lookup-8                   	  83188	    13999 ns/op	 71434361 objects/sec	      0 B/op	      0 allocs/op
BenchmarkDB_DeleteTracker_Baseline-8                    	    934	  1317762 ns/op	   758863 objects/sec	 525331 B/op	  12317 allocs/op
BenchmarkDB_DeleteTracker-8                             	    488	  2450865 ns/op	   408019 objects/sec	 925854 B/op	  18652 allocs/op
BenchmarkDB_RandomLookup-8                              	   6177	   165365 ns/op	  6047254 objects/sec	    144 B/op	      1 allocs/op
BenchmarkDB_SequentialLookup-8                          	   8071	   146586 ns/op	  6821924 objects/sec	   8000 B/op	   1000 allocs/op
BenchmarkDB_FullIteration_All-8                         	 103074	    11917 ns/op	 83911284 objects/sec	    280 B/op	      8 allocs/op
BenchmarkDB_FullIteration_Get-8                         	  85909	    14108 ns/op	 70881358 objects/sec	    320 B/op	      9 allocs/op
BenchmarkDB_PropagationDelay-8                          	 414364	     2914 ns/op	       26.00 50th_µs	       30.00 90th_µs	       69.00 99th_µs	   1406 B/op	     28 allocs/op
PASS
ok  	github.com/cilium/statedb	36.180s

In summary:
- Half the number of allocations in a transaction of 100 inserts
- Almost double the throughput: 497k/s -> 873k/s
- Replacing an existing object: 216k/s -> 369k/s
- Propagation latency down from 41µs to 26µs for 50th percentile

Signed-off-by: Jussi Maki <[email protected]>
With the move to the part library we can drop all references to
go-immutable-radix.

Signed-off-by: Jussi Maki <[email protected]>
Have a proper leaf node instead of storing the leaf inside an empty
node4. This reduces memory usage and removes an allocation and
a pointer dereference.

For 1M dummy objects, memory usage is ~15% lower and insertion
is ~15% faster, with 30% fewer allocations.

The leaf node is still wasting some space as it shares the header
and the 'leaf' and 'prefix' fields could be removed.

Before:

Benchmark_Insert_RootOnlyWatch-8            4873            213619 ns/op           4681237 objects/sec    148882 B/op       3076 allocs/op
Benchmark_Insert-8                          4107            267379 ns/op           3740016 objects/sec    249387 B/op       4114 allocs/op

Inserting batch 1000/1000 ...
Waiting for reconciliation to finish ...

1000000 objects reconciled in 3.10 seconds (batch size 1000)
Throughput 322570.39 objects per second
Allocated 8012221 objects, 471726kB bytes, 527128kB bytes still in use

After:

Benchmark_Insert_RootOnlyWatch-8            5898            187664 ns/op           5328686 objects/sec    116722 B/op       2071 allocs/op
Benchmark_Insert-8                          4978            257483 ns/op           3883746 objects/sec    217225 B/op       3109 allocs/op

benchmark % go run . -objects 1000000
Inserting batch 1000/1000 ...
Waiting for reconciliation to finish ...

1000000 objects reconciled in 2.73 seconds (batch size 1000)
Throughput 365990.20 objects per second
Allocated 6011516 objects, 409163kB bytes, 484024kB bytes still in use

Signed-off-by: Jussi Maki <[email protected]>
To avoid having the useless 'leaf' field in the leaf struct,
move it out of the header and into each individual node struct.

We still have the 'prefix' field in the leaf struct; it could in
principle be removed by special-casing, e.g. instead of always
looking at the prefix one would first check whether the node is a
leaf and then do a full key comparison instead. Leaving that for a
later commit.

Signed-off-by: Jussi Maki <[email protected]>
Additionally move some exported functions to the top of the file to make
it easier for readers.

Embed Tree[T] into Txn[T] to avoid repeating the same fields (see the sketch below).

Signed-off-by: Jussi Maki <[email protected]>
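Roughly (sketch; the mutated and deleteParents fields are taken from the review threads above, with their exact types assumed):

type Tree[T any] struct {
	opts *options
	root *node[T]
	size int
}

type Txn[T any] struct {
	Tree[T] // embedded: opts, root and size, without repeating the fields

	// Transaction-only state:
	mutated       map[*node[T]]struct{} // nodes already cloned in this txn (type assumed)
	deleteParents []deleteParent[T]     // scratch slice for deletions
}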
lowerbound() was sufficiently large, so just move the closure up
to its own top-level function.
The leaf's watch channel wasn't closed on delete.
Fix this and add a test case to check that all watch channels
of deleted nodes are closed.
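A sketch of such a check (signatures assumed: Get returning the leaf's watch channel alongside the value, Delete returning the new tree):

_, watch, _ := tree.Get([]byte("a")) // capture the leaf's watch channel
_, _, tree = tree.Delete([]byte("a"))
select {
case <-watch: // a closed channel is immediately receivable: deletion was signalled
default:
	t.Fatal("watch channel of deleted leaf was not closed")
}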
@pippolo84 (Member) left a comment

Focused my review on part and it looks good to me. 🚀
I'll take another look at the benchmarks and tests later, but I don't need to block this further.

@joamaki joamaki merged commit c7b7b15 into main Apr 17, 2024
1 check passed
@joamaki joamaki deleted the pr/joamaki/part branch April 17, 2024 06:12