perf: bypass precompile cache (0% hit rate in all workloads)#904
Open
vbuilder69420 wants to merge 2 commits intoflashbots:developfrom
Open
perf: bypass precompile cache (0% hit rate in all workloads)#904vbuilder69420 wants to merge 2 commits intoflashbots:developfrom
vbuilder69420 wants to merge 2 commits intoflashbots:developfrom
Conversation
…klist check Remove the AccessListInspector entirely from RBuilderEVMInspector. Replace the per-opcode blocklist tracking with a post-execution check against ResultAndState.state (EvmState = HashMap<Address, Account>), which already contains every address touched during EVM execution. The AccessListInspector called step() on every EVM opcode to build an access list, solely used to check addresses against the blocklist. Profiling showed this inspector overhead consumed ~52% of CPU time. The EVM execution result already contains the same information in its state diff, making the inspector entirely redundant. Changes: - order_commit.rs: Use create_evm() (NoOpInspector) when no used_state_tracer is needed. Check blocklist via res.state.keys() after execution instead of via access list. - evm_inspector.rs: Remove AccessListInspector from RBuilderEVMInspector. The inspector now only wraps the optional UsedStateEVMInspector (used by parallel builder / EVM caching). This optimization works regardless of whether a blocklist is configured. Benchmark (builder-lab, 100 TPS, seed=42, 60s profiling window): | Metric | Before | After | Change | |---------------------|----------|----------|--------| | Block fill p50 | 96.8ms | 58.9ms | -39% | | Block fill p95 | 129.2ms | 87.1ms | -33% | | E2E latency p50 | 98ms | 61ms | -38% | | E2E latency p95 | 134ms | 92ms | -31% | | Blocks submitted | 255 | 342 | +34% | | Txs included | 17,882 | 23,449 | +31% | Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip the precompile cache lookup and insertion entirely. Profiling shows the cache has 0 hits across 177,614 calls — the cache key includes gas_limit which varies per call even for identical precompile inputs, preventing any cache hits. Each cache miss incurs: - Mutex lock/unlock (parking_lot::Mutex) - Bytes clone of the precompile input for the cache key - HashMap lookup - A second mutex lock + LruCache insert on miss This overhead is pure waste with 0% hit rate. Benchmark (builder-lab, 100 TPS, 60s profiling, stacked on AccessListInspector removal): | Metric | Before | After | Change | |---------------------|----------|----------|--------| | Block fill p50 | 57.2ms | 53.3ms | -6.8% | | Block fill p95 | 94.9ms | 80.7ms | -15.0% | | E2E latency p50 | 59ms | 55ms | -6.8% | | E2E latency p95 | 101ms | 84ms | -16.8% | Note: the precompile cache could be made effective by removing gas_limit from the cache key (precompile results don't depend on the gas limit passed to them — they either succeed within their gas budget or fail). This PR takes the simpler approach of bypassing it entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Updated benchmark: 500 TPS saturated blocksRan at 500 TPS to fill blocks to ~73% gas capacity (44M/60M avg). This surfaces the full cost of the cache overhead: 1.16 million mutex lock/unlock + Bytes clone cycles with zero benefit.
The |
Author
High-load benchmark: fill-block + 1000 TPS transfersDual contender setup:
Nearly half the blocks are >50M gas (83% full). 2.4M wasted mutex lock/unlock + Bytes clone cycles. |
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bypass the
PrecompileCacheentirely — profiling shows 0 hits across all tested workloads.Root Cause
The cache key is
(SpecId, Bytes, u64)where theu64isinputs.gas_limit— the remaining gas at the time of the precompile call. Since remaining gas varies per call even for identical precompile inputs, the cache never hits.Evidence
Tested with two different contender workloads:
Simple transfers (100 TPS):
Complex DeFi (AMM swaps, lending, oracle updates, liquidations — 20 TPS):
0% hit rate in both cases. Each miss incurs: mutex lock +
Bytesclone + HashMap lookup + second mutex lock + LruCache insert.Benchmark (simple transfers, 100 TPS)
Suggested fix (alternative to this PR)
Remove
gas_limitfrom the cache key — precompile results don't depend on the gas limit passed to them. This would make the cache actually work. This PR takes the simpler approach of bypassing it.Test plan
🤖 Generated with Claude Code