-
Notifications
You must be signed in to change notification settings - Fork 1.9k
perf(trie): proofmanager optimisation WIP #18829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d3e816b
to
ed56919
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements worker pooling optimization for the ProofTaskManager to reduce overhead from task creation and simplify channel design. The optimization introduces separate worker pools for storage and account proofs, using long-lived database transactions instead of creating new ones for each request.
- Adds storage and account worker pools with persistent database transactions
- Refactors channel architecture from mpsc to crossbeam channels for better performance
- Introduces new metrics for tracking worker pool efficiency and queue depths
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
crates/trie/parallel/src/stats.rs | Adds new metrics for tracking storage proof immediate vs blocked operations |
crates/trie/parallel/src/proof_task_metrics.rs | Extends metrics with queue depth gauges and wait time histograms |
crates/trie/parallel/src/proof_task.rs | Core refactor implementing worker pools, new job types, and crossbeam channels |
crates/trie/parallel/src/proof.rs | Updates proof building to use new channel types and on-demand storage fetching |
crates/trie/parallel/Cargo.toml | Adds crossbeam-channel dependency |
crates/engine/tree/src/tree/payload_validator.rs | Propagates error from ProofTaskManager::new |
crates/engine/tree/src/tree/payload_processor/multiproof.rs | Updates to use dual manager architecture |
crates/engine/tree/src/tree/payload_processor/mod.rs | Creates separate storage and account proof managers |
crates/engine/tree/benches/state_root_task.rs | Updates benchmark to handle new error handling |
crates/engine/primitives/src/config.rs | Adds configuration for storage and account worker counts |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
do we even need this type and queuing in general if we could wire if we use multi receiver (1 per worker) then there’s no need to do manual queueing unless we can optimize chunking or some other preprocessing /// Sender to the account proof task manager.
account_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
/// Sender to the storage proof task manager.
storage_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
in pub struct MultiproofManager<Factory: DatabaseProviderFactory> { to /// Channel for dispatching storage proof work to workers
storage_work_tx: crossbeam_channel::Sender<ProofJob>,
/// Channel for dispatching account multiproof work to workers
account_work_tx: crossbeam_channel::Sender<AccountProofJob<FactoryTx<Factory>>>,
from pub struct ProofTaskManager<Factory: DatabaseProviderFactory> {
|
927ad6a
to
9b121c0
Compare
- add storage proof workers configuration and integrate into proof task manager
- Introduced `AccountMultiproofInput` structure to handle account multiproof tasks. - Added `execute_account_multiproof_worker` function to manage multiproof execution. - Updated `ProofTaskManager` to support account multiproof workers and task dispatching. - Refactored proof building logic to utilize pre-computed storage proofs for account multiproof generation.
- Added `account_proof_task_handle` to `MultiproofManager` for managing account multiproof tasks. - Updated multiproof calculation logic to queue tasks to the account proof manager. - Improved error handling for account manager task failures. - Refactored test setup to create separate storage and account proof managers.
- wait just in time - Replaced standard mpsc channels with crossbeam channels for improved performance and flexibility in proof task management. - Updated `execute_account_multiproof_worker` to handle storage proof requests on-demand, enhancing efficiency during trie traversal. - Introduced standardized error handling for closed storage managers. - Refactored related structures and functions to accommodate the new channel types and improve overall code clarity.
- Updated `build_account_multiproof_with_storage` to accept storage proof receivers, allowing for lazy fetching of storage proofs during trie traversal. - Removed pre-computed storage proof handling in favor of real-time proof retrieval, enhancing performance and efficiency. - Introduced standardized error handling for closed storage proof channels. - Adjusted related functions and structures to support the new fetching mechanism.
- Added metrics for tracking the depth of storage and account queues. - Implemented recording of wait times for storage and account jobs in the queue. - Updated `ProofTaskMetrics` and `ProofTaskManager` to support new metrics functionality. - Enhanced job structures to include timestamps for enqueued jobs, facilitating wait time calculations.
- Enhanced the storage proof fetching mechanism to include non-blocking receive attempts, improving efficiency during trie traversal. - Implemented error handling for various receive scenarios, ensuring robustness in proof retrieval. - Added tests for streaming fetcher with mixed storage proofs and ordering independence, validating the correctness of the parallel proof task execution.
…nd ordering independence - Implemented two new tests to validate the behavior of parallel proof handling with mixed storage targets and varying storage sizes. - Ensured that the parallel proof results are consistent regardless of the order of completion, enhancing the robustness of the proof task execution. - Updated the test setup to create diverse states for comprehensive coverage of edge cases in proof computation.
…elTrieTracker - Introduced metrics for tracking immediate and blocked storage proofs in `ParallelTrieStats`. - Added methods to increment and retrieve these metrics, enhancing performance monitoring. - Updated `ParallelTrieTracker` to support the new storage proof metrics, ensuring comprehensive statistics collection during trie operations.
…proof managers return type - Added error handling for spawn failures in the state root benchmark to ensure robustness. - Updated the return type of `create_proof_managers` to a type alias for improved clarity and maintainability.
- Updated `ProofTaskManager` to allow separate configuration for storage and account workers, enhancing flexibility in task execution. - Added tests to validate transaction reuse across multiple proofs and ensure robust handling of concurrent storage proofs without deadlocks. - Implemented checks for expected worker counts and transaction management, improving performance monitoring and reliability in proof tasks.
- Add type alias for complex proof managers return type - Fix needless borrow in proof manager creation - Add backticks to documentation - Fix explicit_iter_loop warnings in tests - Fix spawn result handling in benchmarks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
… tasks - Updated `execute_account_multiproof_worker` to return errors directly instead of using a result sender, improving error propagation. - Wrapped proof computation in panic recovery to prevent worker failures from causing zombie states. - Clamped worker counts to a minimum of 1 to avoid deadlocks, ensuring robust task execution. - Removed unused metrics from `ParallelTrieTracker`, streamlining the structure and focusing on essential statistics. - Adjusted tests to reflect changes in worker count configurations, enhancing test reliability.
- Updated `execute_account_multiproof_worker` to directly return results, improving error handling and simplifying the function signature. - Wrapped account multiproof execution in panic recovery to prevent worker failures from causing disruptions. - Enhanced storage proof fetching with non-blocking receive attempts, improving efficiency and responsiveness. - Introduced a new method `decoded_multiproof_with_stats` to return both multiproof results and performance statistics, aiding in performance monitoring. - Added tests to validate the new metrics tracking for storage proofs, ensuring comprehensive coverage and reliability in proof task execution.
- Reorganized the logic for queuing storage proof requests during trie traversal to ensure all accounts in the extended prefix set are processed, even those with no storage changes. - Removed redundant code for storage proof request queuing, enhancing clarity and maintainability. - Updated the handling of destroyed accounts to avoid unnecessary cloning, improving performance and memory efficiency.
- No storage proof receiver found
- accounts NOT in targets are being encountered
- config.frozen_prefix_sets is frozen ONCE at config creation time (beginning of block execution), but it should be updated with each transaction's state changes
- The config should NOT cache frozen_prefix_sets
- if config.prefix_sets stays at 0
… NOT in frozen_prefix_sets.account_prefix_set - the refactor only queued storage proofs for the account prefix set. When the trie walker hit an address that existed only in the storage prefix set, there was no queued receiver and it fell back to a slow synchronous proof. Queuing storage proofs for the union of account and storage prefix sets (like main) fixes the regression.
- The refactor removed the cache usage from commit 8effbf2, causing every sibling leaf encountered during account trie walking to trigger a full storage trie walk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
pub struct ProofTaskManager<Factory: DatabaseProviderFactory> { | ||
/// Max number of database transactions to create | ||
max_concurrency: usize, | ||
/// Number of database transactions created | ||
total_transactions: usize, | ||
/// Consistent view provider used for creating transactions on-demand | ||
view: ConsistentDbView<Factory>, | ||
/// Proof task context shared across all proof tasks | ||
task_ctx: ProofTaskCtx, | ||
/// Proof tasks pending execution | ||
pending_tasks: VecDeque<ProofTaskKind>, | ||
/// The underlying handle from which to spawn proof tasks | ||
executor: Handle, | ||
/// The proof task transactions, containing owned cursor factories that are reused for proof | ||
/// calculation. | ||
proof_task_txs: Vec<ProofTaskTx<FactoryTx<Factory>>>, | ||
/// A receiver for new proof tasks. | ||
proof_task_rx: Receiver<ProofTaskMessage<FactoryTx<Factory>>>, | ||
/// A sender for sending back transactions. | ||
tx_sender: Sender<ProofTaskMessage<FactoryTx<Factory>>>, | ||
/// The number of active handles. | ||
/// | ||
/// Incremented in [`ProofTaskManagerHandle::new`] and decremented in | ||
/// [`ProofTaskManagerHandle::drop`]. | ||
active_handles: Arc<AtomicUsize>, | ||
/// Metrics tracking blinded node fetches. | ||
#[cfg(feature = "metrics")] | ||
metrics: ProofTaskMetrics, | ||
/// Error when storage manager is closed. | ||
#[inline] | ||
fn storage_manager_closed_error() -> ParallelStateRootError { | ||
ParallelStateRootError::Other("storage manager closed".into()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing this uncessary abstraction completely
Note: this is a POC and will be broken down into multiple sub prs with this as the main reference, with the reason being this is too breaking and consists of many complex changes. |
TrieInput::from_state(hashed_state), | ||
&TreeConfig::default(), | ||
) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's expect
here
trie: self.trie.finish(), | ||
precomputed_storage_roots: self.precomputed_storage_roots, | ||
missed_leaves: self.missed_leaves, | ||
// TODO: Remove this after testing. This is to understand the efficiency of the worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs removing?
debug!( | ||
target: "engine::root", | ||
"MultiProofConfig::new_from_input: INITIAL prefix_sets account_len={} storage_len={}", | ||
prefix_sets.account_prefix_set.len(), | ||
prefix_sets.storage_prefix_sets.len(), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug!( | |
target: "engine::root", | |
"MultiProofConfig::new_from_input: INITIAL prefix_sets account_len={} storage_len={}", | |
prefix_sets.account_prefix_set.len(), | |
prefix_sets.storage_prefix_sets.len(), | |
); | |
debug!( | |
target: "engine::root", | |
account_prefix_sets_len = ?prefix_sets.account_prefix_set.len(), | |
storage_prefix_sets_len = ?prefix_sets.storage_prefix_sets.len(), | |
"MultiProofConfig::new_from_input", | |
); |
if sequence >= self.next_to_deliver { | ||
self.pending_proofs.insert(sequence, update); | ||
} else { | ||
debug!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: might be better as a trace
if !self.pending_proofs.is_empty() { | ||
let pending_sequences: Vec<u64> = | ||
self.pending_proofs.keys().take(10).copied().collect(); | ||
debug!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be some leftover debugging?
let old_next = self.next_to_deliver; | ||
self.next_to_deliver += consecutive_proofs.len() as u64; | ||
|
||
debug!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More leftover debugging
consistent_view: ConsistentDbView<Factory>, | ||
mut input: TrieInput, | ||
) -> (TrieInput, Self) { | ||
let prefix_sets = Arc::new(std::mem::take(&mut input.prefix_sets)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could freeze the prefix set at this point, rather than cloning/freezing for every multiproof?
account_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>, | ||
storage_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>, | ||
max_concurrent: usize, | ||
missed_leaves_storage_roots: Arc<DashMap<B256, B256>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this get used outside of the manager?
} | ||
|
||
// Timeout detection: if no progress for 10 seconds, dump diagnostic info | ||
if updates_finished && last_progress_time.elapsed().as_secs() > 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More debugging to be removed?
storage_targets, | ||
?source, | ||
"Starting multiproof calculation", | ||
self.executor.spawn_blocking(move || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think after this round of changes is done we should take another pass and see if we can factor out this spawn_blocking.
POC done |
core changes:
impact:
closes: