Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: eliminate txn overhead when not using param variables #120

Merged
merged 1 commit into from
Feb 20, 2025

Conversation

audunhalland
Copy link
Contributor

After #119 was merged I had this idea on how to completely eliminate the memory usage overhead when not using statement outputs.

We now already branch on column_count > 0, it's clear that it's only those kinds of "observable" statements that need to be cached. So I changed the data structure from Vec to HashTable<usize, ObservableStatement>. That also means that the Cow<'static, str> that holds the input SQL does not need to have its lifetime extended when it's not returning any columns!

@sebadob
Copy link
Owner

sebadob commented Feb 19, 2025

So it's basically trading a bit of memory for a bit of compute because of hashing. I am not sure if this makes much of a difference memory wise. It will probably reduce performance a little bit, because we don't know the Vec size in advance and a starting with an empty one instead of with_capacity(). I was already thinking about creating the Vec for the transactions once outside the loop, allocate memory once and always clear it after a transaction. This would probably use a few bytes more memory for increased speed.

Have you dont testing here? How much compute and syscalls (increasing Vec size dynamically) you trade in for gaining a bit of memory?

edit:

When we start optimizing performance, it makes a lot of sense to introduce smallvec in lots of places.

@audunhalland
Copy link
Contributor Author

The thinking is "don't pay for what you don't use". I'm thinking that in a transaction consisting of n statements, only a fraction of the statements will realistically return any columns. A majority will likely be "leaf statements", like in the example that inserts two parents and three children. So I think it makes sense to pay the "cost" of a dynamically growing HashTable (likely negligible in the big picture) only when using outputs and referencing. The memory usage and reallocations of the HashTable will be made up for by the SQL strings that can be discarded earlier (my understanding is that these are Cow::Owned on hiqlite nodes where the transaction did not originate)

I've not done any testing or perf measurement, it's only based on intuition of cost. If the HashTable is considered expensive because of the default hashing algorithm, there's always fnv which I tend to use for numeric keys when I'm in control of the key space.

@sebadob
Copy link
Owner

sebadob commented Feb 19, 2025

my understanding is that these are Cow::Owned on hiqlite nodes where the transaction did not originate

Right.

Usually Vecs are faster for smaller sets and maybe fnv is a good option here, since it should be a drop-in replacement, right? I never used it before, but it looks like that from the docs.

Another approach would be to just use a Vec<(usize, ObservableStatement)> and just iterate until you find the key, which is probably a lot faster since there will most likely be very few statements anyway. But all of this will have a small impact only, I guess. A more in depth performance optimization can be done at a later point as well.
I also need to clean up some code like the testing stuff for redb and such.

So, I could merge it like that, if you like, and we could look into perf opt later.

@audunhalland audunhalland force-pushed the eliminate-txn-overhead branch from cb94d1f to 88aa9d5 Compare February 19, 2025 23:14
@sebadob
Copy link
Owner

sebadob commented Feb 20, 2025

Thank you very much!

I am pretty critical about performance in the writer because it will always be the bottleneck of the whole system.

@sebadob sebadob merged commit 61fab1a into sebadob:main Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants