Optimize indexer package #219
So I'm thinking an approach something like the following might work. The indexers' […]. There should be a […].
If we're currently less than […], then the index process also has multiple workers that read from a buffered channel.
Then finally, on start it will need to figure out which blocks, if any, have not been indexed and index them. I believe currently it just checks if […]. Also in this architecture we'd need to consider how […]. Maybe the goroutine reading jobs off the queue checks if the job is a disconnect; if so, it waits until all connects finish and then runs the disconnects serially.
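For illustration, here's a rough Go sketch of that queue-and-workers idea, with a hypothetical indexJob type and Indexer interface standing in for whatever the real indexer types end up looking like: connect jobs are consumed in parallel by a bounded set of workers, while a disconnect job first waits for every in-flight connect and is then applied on its own.

```go
package indexqueue

import "sync"

// Block stands in for the real block type (e.g. btcutil.Block); illustrative only.
type Block interface{}

// indexJob is a hypothetical unit of work handed to the index workers.
type indexJob struct {
	block      Block
	disconnect bool // true when the block is being detached during a reorg
}

// Indexer is a stand-in for the methods an index would expose here.
type Indexer interface {
	ConnectBlock(block Block) error
	DisconnectBlock(block Block) error
}

// runIndexWorkers drains jobs from a buffered channel. Connects run in
// parallel, limited to numWorkers at a time; a disconnect waits for all
// outstanding connects and then runs serially, preserving reorg ordering.
func runIndexWorkers(jobs <-chan indexJob, idx Indexer, numWorkers int) error {
	var (
		wg       sync.WaitGroup
		errOnce  sync.Once
		firstErr error
	)
	setErr := func(err error) {
		if err != nil {
			errOnce.Do(func() { firstErr = err })
		}
	}

	sem := make(chan struct{}, numWorkers) // bounds concurrent connects
	for job := range jobs {
		if job.disconnect {
			wg.Wait() // let every in-flight connect finish first
			setErr(idx.DisconnectBlock(job.block))
			continue
		}

		wg.Add(1)
		sem <- struct{}{}
		go func(j indexJob) {
			defer wg.Done()
			defer func() { <-sem }()
			setErr(idx.ConnectBlock(j.block))
		}(job)
	}
	wg.Wait()
	return firstErr
}
```

On startup, the same channel could simply be pre-filled with connect jobs for whatever blocks the catch-up scan finds missing.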
Maybe adding the indexed blocks' hashes to a Bloom filter can help make this check fast while using very little memory.
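As a rough sketch of that suggestion (the filter type and sizing below are hypothetical, nothing from the codebase), a Bloom filter keyed by block hash can answer "definitely not indexed yet" with a few bit probes, and only a reported "probably indexed" would need the slower definitive check against the index itself:

```go
package indexbloom

import "encoding/binary"

// blockFilter is a simple Bloom filter sized at construction time.
// m is the number of bits, k the number of probes per hash.
type blockFilter struct {
	bits []uint64
	m    uint64
	k    int
}

func newBlockFilter(mBits uint64, k int) *blockFilter {
	return &blockFilter{
		bits: make([]uint64, (mBits+63)/64),
		m:    mBits,
		k:    k,
	}
}

// probes derives k bit positions from a 32-byte block hash via double
// hashing over two of its words. Block hashes are already uniformly
// distributed, so no extra hash function is needed.
func (f *blockFilter) probes(hash [32]byte) []uint64 {
	h1 := binary.LittleEndian.Uint64(hash[0:8])
	h2 := binary.LittleEndian.Uint64(hash[8:16])
	h2 |= 1 // keep the stride odd so probes don't collapse
	out := make([]uint64, f.k)
	for i := 0; i < f.k; i++ {
		out[i] = (h1 + uint64(i)*h2) % f.m
	}
	return out
}

// Add marks a block hash as indexed.
func (f *blockFilter) Add(hash [32]byte) {
	for _, p := range f.probes(hash) {
		f.bits[p/64] |= 1 << (p % 64)
	}
}

// MaybeIndexed reports false only if the hash was definitely never added.
func (f *blockFilter) MaybeIndexed(hash [32]byte) bool {
	for _, p := range f.probes(hash) {
		if f.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}
```

At today's chain height (under a million blocks), a filter with roughly 10 bits per block is on the order of 1 MB.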
The indexer package is extremely inefficient and is responsible for most of the time spent syncing the chain.
Benchmarks show that around 90-95% of the time syncing the chain is due to two processes: loading the utxos from disk to validate them, and running the indexer.
Of those two, the indexer takes about twice as long as loading the utxos. And the utxo cost can potentially be reduced dramatically by setting --utxocachemaxsize large enough to hold the entire utxo set in memory. This would leave about 90% of the sync time exclusively due to the indexer (which you obviously wouldn't pay if you turn off all indexes).
Refactoring this will take some clever engineering. Right now blocks are passed into the indexer as they are processed (and in the same db transaction). This guarantees that the indexer tracks the tip of the chain and is always caught up with the best chain.
One possible optimization is to give the indexer access to the block index and use it in separate goroutines to load blocks and index them. However, the challenge is that this might be even slower if it needs to load blocks from disk rather than having them passed in from memory. But the separate-goroutine approach may allow each indexer to be run in parallel rather than serially.
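To make the parallel part concrete, here's a minimal sketch (the Indexer interface and helper below are illustrative, not the package's actual API) where every enabled index processes the same in-memory block concurrently rather than one after another:

```go
package parallelindex

import "sync"

// Block stands in for the real block type; illustrative only.
type Block interface{}

// Indexer is a stand-in for the per-index interface.
type Indexer interface {
	ConnectBlock(block Block) error
}

// connectBlockParallel fans the same block out to every index at once and
// waits for all of them, returning the first error encountered.
func connectBlockParallel(indexes []Indexer, block Block) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, idx := range indexes {
		wg.Add(1)
		go func(ix Indexer) {
			defer wg.Done()
			if err := ix.ConnectBlock(block); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(idx)
	}
	wg.Wait()
	return firstErr
}
```

The catch, as noted above, is that the indexes currently share the block's db transaction, so running them concurrently would likely mean per-index transactions (or a backend that tolerates concurrent writers), and that's where the clever engineering comes in.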
If you're a strong Go dev who would like to tackle this optimization, it would be greatly appreciated.