refactor: eliminate txn overhead when not using param variables #120

audunhalland · 2025-02-19T12:07:44Z

After #119 was merged I had this idea on how to completely eliminate the memory usage overhead when not using statement outputs.

We now already branch on column_count > 0, it's clear that it's only those kinds of "observable" statements that need to be cached. So I changed the data structure from Vec to HashTable<usize, ObservableStatement>. That also means that the Cow<'static, str> that holds the input SQL does not need to have its lifetime extended when it's not returning any columns!

sebadob · 2025-02-19T12:15:17Z

So it's basically trading a bit of memory for a bit of compute because of hashing. I am not sure if this makes much of a difference memory wise. It will probably reduce performance a little bit, because we don't know the Vec size in advance and a starting with an empty one instead of with_capacity(). I was already thinking about creating the Vec for the transactions once outside the loop, allocate memory once and always clear it after a transaction. This would probably use a few bytes more memory for increased speed.

Have you dont testing here? How much compute and syscalls (increasing Vec size dynamically) you trade in for gaining a bit of memory?

edit:

When we start optimizing performance, it makes a lot of sense to introduce smallvec in lots of places.

audunhalland · 2025-02-19T12:37:32Z

The thinking is "don't pay for what you don't use". I'm thinking that in a transaction consisting of n statements, only a fraction of the statements will realistically return any columns. A majority will likely be "leaf statements", like in the example that inserts two parents and three children. So I think it makes sense to pay the "cost" of a dynamically growing HashTable (likely negligible in the big picture) only when using outputs and referencing. The memory usage and reallocations of the HashTable will be made up for by the SQL strings that can be discarded earlier (my understanding is that these are Cow::Owned on hiqlite nodes where the transaction did not originate)

I've not done any testing or perf measurement, it's only based on intuition of cost. If the HashTable is considered expensive because of the default hashing algorithm, there's always fnv which I tend to use for numeric keys when I'm in control of the key space.

sebadob · 2025-02-19T13:03:58Z

my understanding is that these are Cow::Owned on hiqlite nodes where the transaction did not originate

Right.

Usually Vecs are faster for smaller sets and maybe fnv is a good option here, since it should be a drop-in replacement, right? I never used it before, but it looks like that from the docs.

Another approach would be to just use a Vec<(usize, ObservableStatement)> and just iterate until you find the key, which is probably a lot faster since there will most likely be very few statements anyway. But all of this will have a small impact only, I guess. A more in depth performance optimization can be done at a later point as well.
I also need to clean up some code like the testing stuff for redb and such.

So, I could merge it like that, if you like, and we could look into perf opt later.

sebadob · 2025-02-20T08:40:33Z

Thank you very much!

I am pretty critical about performance in the writer because it will always be the bottleneck of the whole system.

refactor: eliminate txn overhead when not using param variables

88aa9d5

audunhalland force-pushed the eliminate-txn-overhead branch from cb94d1f to 88aa9d5 Compare February 19, 2025 23:14

sebadob merged commit 61fab1a into sebadob:main Feb 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: eliminate txn overhead when not using param variables #120

refactor: eliminate txn overhead when not using param variables #120

audunhalland commented Feb 19, 2025

sebadob commented Feb 19, 2025 •

edited

Loading

audunhalland commented Feb 19, 2025

sebadob commented Feb 19, 2025 •

edited

Loading

sebadob commented Feb 20, 2025

refactor: eliminate txn overhead when not using param variables #120

refactor: eliminate txn overhead when not using param variables #120

Conversation

audunhalland commented Feb 19, 2025

sebadob commented Feb 19, 2025 • edited Loading

audunhalland commented Feb 19, 2025

sebadob commented Feb 19, 2025 • edited Loading

sebadob commented Feb 20, 2025

sebadob commented Feb 19, 2025 •

edited

Loading

sebadob commented Feb 19, 2025 •

edited

Loading