Skip to content

Conversation

yongkangc
Copy link
Member

@yongkangc yongkangc commented Oct 2, 2025

@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Oct 2, 2025
@yongkangc yongkangc force-pushed the yk/refactor_storage_multiproof branch from d3e816b to ed56919 Compare October 2, 2025 04:59
@yongkangc yongkangc self-assigned this Oct 2, 2025
@yongkangc yongkangc changed the title wip: proofmanager optimisation perf(trie): proofmanager optimisation WIP Oct 2, 2025
@yongkangc yongkangc requested a review from Copilot October 2, 2025 10:23
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements worker pooling optimization for the ProofTaskManager to reduce overhead from task creation and simplify channel design. The optimization introduces separate worker pools for storage and account proofs, using long-lived database transactions instead of creating new ones for each request.

  • Adds storage and account worker pools with persistent database transactions
  • Refactors channel architecture from mpsc to crossbeam channels for better performance
  • Introduces new metrics for tracking worker pool efficiency and queue depths

Reviewed Changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
crates/trie/parallel/src/stats.rs Adds new metrics for tracking storage proof immediate vs blocked operations
crates/trie/parallel/src/proof_task_metrics.rs Extends metrics with queue depth gauges and wait time histograms
crates/trie/parallel/src/proof_task.rs Core refactor implementing worker pools, new job types, and crossbeam channels
crates/trie/parallel/src/proof.rs Updates proof building to use new channel types and on-demand storage fetching
crates/trie/parallel/Cargo.toml Adds crossbeam-channel dependency
crates/engine/tree/src/tree/payload_validator.rs Propagates error from ProofTaskManager::new
crates/engine/tree/src/tree/payload_processor/multiproof.rs Updates to use dual manager architecture
crates/engine/tree/src/tree/payload_processor/mod.rs Creates separate storage and account proof managers
crates/engine/tree/benches/state_root_task.rs Updates benchmark to handle new error handling
crates/engine/primitives/src/config.rs Adds configuration for storage and account worker counts

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@yongkangc
Copy link
Member Author

yongkangc commented Oct 3, 2025

do we even need this type and queuing in general if we could wire
multiproof — StorageProof/AccProof --> Workers
without going through the additional manager

if we use multi receiver (1 per worker) then there’s no need to do manual queueing unless we can optimize chunking or some other preprocessing
so to me the ProofTaskManager intermediary seems redundant now
if we swap out

/// Sender to the account proof task manager.
account_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
/// Sender to the storage proof task manager.
storage_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
in pub struct MultiproofManager<Factory: DatabaseProviderFactory> {

to

    /// Channel for dispatching storage proof work to workers
    storage_work_tx: crossbeam_channel::Sender<ProofJob>,
    /// Channel for dispatching account multiproof work to workers
    account_work_tx: crossbeam_channel::Sender<AccountProofJob<FactoryTx<Factory>>>,
from pub struct ProofTaskManager<Factory: DatabaseProviderFactory> {
  • spawn multiproof manager + wire it with the proof workers
  • multi proof manager then dispatches requests (storage,account) directly to the workers
    no prooftaskmanager intermediary

@yongkangc yongkangc force-pushed the yk/refactor_storage_multiproof branch 2 times, most recently from 927ad6a to 9b121c0 Compare October 4, 2025 13:19
yongkangc and others added 19 commits October 4, 2025 13:21
- add storage proof workers configuration and integrate into proof task manager
- Introduced `AccountMultiproofInput` structure to handle account multiproof tasks.
- Added `execute_account_multiproof_worker` function to manage multiproof execution.
- Updated `ProofTaskManager` to support account multiproof workers and task dispatching.
- Refactored proof building logic to utilize pre-computed storage proofs for account multiproof generation.
- Added `account_proof_task_handle` to `MultiproofManager` for managing account multiproof tasks.
- Updated multiproof calculation logic to queue tasks to the account proof manager.
- Improved error handling for account manager task failures.
- Refactored test setup to create separate storage and account proof managers.
- wait just in time
- Replaced standard mpsc channels with crossbeam channels for improved performance and flexibility in proof task management.
- Updated `execute_account_multiproof_worker` to handle storage proof requests on-demand, enhancing efficiency during trie traversal.
- Introduced standardized error handling for closed storage managers.
- Refactored related structures and functions to accommodate the new channel types and improve overall code clarity.
- Updated `build_account_multiproof_with_storage` to accept storage proof receivers, allowing for lazy fetching of storage proofs during trie traversal.
- Removed pre-computed storage proof handling in favor of real-time proof retrieval, enhancing performance and efficiency.
- Introduced standardized error handling for closed storage proof channels.
- Adjusted related functions and structures to support the new fetching mechanism.
- Added metrics for tracking the depth of storage and account queues.
- Implemented recording of wait times for storage and account jobs in the queue.
- Updated `ProofTaskMetrics` and `ProofTaskManager` to support new metrics functionality.
- Enhanced job structures to include timestamps for enqueued jobs, facilitating wait time calculations.
- Enhanced the storage proof fetching mechanism to include non-blocking receive attempts, improving efficiency during trie traversal.
- Implemented error handling for various receive scenarios, ensuring robustness in proof retrieval.
- Added tests for streaming fetcher with mixed storage proofs and ordering independence, validating the correctness of the parallel proof task execution.
…nd ordering independence

- Implemented two new tests to validate the behavior of parallel proof handling with mixed storage targets and varying storage sizes.
- Ensured that the parallel proof results are consistent regardless of the order of completion, enhancing the robustness of the proof task execution.
- Updated the test setup to create diverse states for comprehensive coverage of edge cases in proof computation.
…elTrieTracker

- Introduced metrics for tracking immediate and blocked storage proofs in `ParallelTrieStats`.
- Added methods to increment and retrieve these metrics, enhancing performance monitoring.
- Updated `ParallelTrieTracker` to support the new storage proof metrics, ensuring comprehensive statistics collection during trie operations.
…proof managers return type

- Added error handling for spawn failures in the state root benchmark to ensure robustness.
- Updated the return type of `create_proof_managers` to a type alias for improved clarity and maintainability.
- Updated `ProofTaskManager` to allow separate configuration for storage and account workers, enhancing flexibility in task execution.
- Added tests to validate transaction reuse across multiple proofs and ensure robust handling of concurrent storage proofs without deadlocks.
- Implemented checks for expected worker counts and transaction management, improving performance monitoring and reliability in proof tasks.
- Add type alias for complex proof managers return type
- Fix needless borrow in proof manager creation
- Add backticks to documentation
- Fix explicit_iter_loop warnings in tests
- Fix spawn result handling in benchmarks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
… tasks

- Updated `execute_account_multiproof_worker` to return errors directly instead of using a result sender, improving error propagation.
- Wrapped proof computation in panic recovery to prevent worker failures from causing zombie states.
- Clamped worker counts to a minimum of 1 to avoid deadlocks, ensuring robust task execution.
- Removed unused metrics from `ParallelTrieTracker`, streamlining the structure and focusing on essential statistics.
- Adjusted tests to reflect changes in worker count configurations, enhancing test reliability.
- Updated `execute_account_multiproof_worker` to directly return results, improving error handling and simplifying the function signature.
- Wrapped account multiproof execution in panic recovery to prevent worker failures from causing disruptions.
- Enhanced storage proof fetching with non-blocking receive attempts, improving efficiency and responsiveness.
- Introduced a new method `decoded_multiproof_with_stats` to return both multiproof results and performance statistics, aiding in performance monitoring.
- Added tests to validate the new metrics tracking for storage proofs, ensuring comprehensive coverage and reliability in proof task execution.
- Reorganized the logic for queuing storage proof requests during trie traversal to ensure all accounts in the extended prefix set are processed, even those with no storage changes.
- Removed redundant code for storage proof request queuing, enhancing clarity and maintainability.
- Updated the handling of destroyed accounts to avoid unnecessary cloning, improving performance and memory efficiency.
- No storage proof receiver found
- accounts NOT in targets are being encountered
- config.frozen_prefix_sets is frozen ONCE at config creation time (beginning of block execution), but it should be updated with each transaction's state changes
- The config should NOT cache frozen_prefix_sets
- if config.prefix_sets stays at 0
… NOT in

  frozen_prefix_sets.account_prefix_set

- the refactor only queued storage proofs for the account prefix set. When the trie
  walker hit an address that existed only in the storage prefix set, there was no queued
  receiver and it fell back to a slow synchronous proof. Queuing storage proofs for the union
  of account and storage prefix sets (like main) fixes the regression.
- The refactor removed the cache usage from commit 8effbf2, causing every sibling leaf encountered during account trie walking to trigger a full storage trie walk.
@yongkangc yongkangc requested a review from Copilot October 6, 2025 09:31
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines -54 to 82
pub struct ProofTaskManager<Factory: DatabaseProviderFactory> {
/// Max number of database transactions to create
max_concurrency: usize,
/// Number of database transactions created
total_transactions: usize,
/// Consistent view provider used for creating transactions on-demand
view: ConsistentDbView<Factory>,
/// Proof task context shared across all proof tasks
task_ctx: ProofTaskCtx,
/// Proof tasks pending execution
pending_tasks: VecDeque<ProofTaskKind>,
/// The underlying handle from which to spawn proof tasks
executor: Handle,
/// The proof task transactions, containing owned cursor factories that are reused for proof
/// calculation.
proof_task_txs: Vec<ProofTaskTx<FactoryTx<Factory>>>,
/// A receiver for new proof tasks.
proof_task_rx: Receiver<ProofTaskMessage<FactoryTx<Factory>>>,
/// A sender for sending back transactions.
tx_sender: Sender<ProofTaskMessage<FactoryTx<Factory>>>,
/// The number of active handles.
///
/// Incremented in [`ProofTaskManagerHandle::new`] and decremented in
/// [`ProofTaskManagerHandle::drop`].
active_handles: Arc<AtomicUsize>,
/// Metrics tracking blinded node fetches.
#[cfg(feature = "metrics")]
metrics: ProofTaskMetrics,
/// Error when storage manager is closed.
#[inline]
fn storage_manager_closed_error() -> ParallelStateRootError {
ParallelStateRootError::Other("storage manager closed".into())
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this uncessary abstraction completely

@yongkangc
Copy link
Member Author

Note: this is a POC and will be broken down into multiple sub prs with this as the main reference, with the reason being this is too breaking and consists of many complex changes.

TrieInput::from_state(hashed_state),
&TreeConfig::default(),
)
.unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's expect here

trie: self.trie.finish(),
precomputed_storage_roots: self.precomputed_storage_roots,
missed_leaves: self.missed_leaves,
// TODO: Remove this after testing. This is to understand the efficiency of the worker
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs removing?

Comment on lines +95 to +100
debug!(
target: "engine::root",
"MultiProofConfig::new_from_input: INITIAL prefix_sets account_len={} storage_len={}",
prefix_sets.account_prefix_set.len(),
prefix_sets.storage_prefix_sets.len(),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug!(
target: "engine::root",
"MultiProofConfig::new_from_input: INITIAL prefix_sets account_len={} storage_len={}",
prefix_sets.account_prefix_set.len(),
prefix_sets.storage_prefix_sets.len(),
);
debug!(
target: "engine::root",
account_prefix_sets_len = ?prefix_sets.account_prefix_set.len(),
storage_prefix_sets_len = ?prefix_sets.storage_prefix_sets.len(),
"MultiProofConfig::new_from_input",
);

if sequence >= self.next_to_deliver {
self.pending_proofs.insert(sequence, update);
} else {
debug!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be better as a trace

if !self.pending_proofs.is_empty() {
let pending_sequences: Vec<u64> =
self.pending_proofs.keys().take(10).copied().collect();
debug!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be some leftover debugging?

let old_next = self.next_to_deliver;
self.next_to_deliver += consecutive_proofs.len() as u64;

debug!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More leftover debugging

consistent_view: ConsistentDbView<Factory>,
mut input: TrieInput,
) -> (TrieInput, Self) {
let prefix_sets = Arc::new(std::mem::take(&mut input.prefix_sets));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could freeze the prefix set at this point, rather than cloning/freezing for every multiproof?

account_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
storage_proof_task_handle: ProofTaskManagerHandle<FactoryTx<Factory>>,
max_concurrent: usize,
missed_leaves_storage_roots: Arc<DashMap<B256, B256>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this get used outside of the manager?

}

// Timeout detection: if no progress for 10 seconds, dump diagnostic info
if updates_finished && last_progress_time.elapsed().as_secs() > 10 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More debugging to be removed?

storage_targets,
?source,
"Starting multiproof calculation",
self.executor.spawn_blocking(move || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think after this round of changes is done we should take another pass and see if we can factor out this spawn_blocking.

@yongkangc
Copy link
Member Author

POC done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants