
perf: bypass precompile cache (0% hit rate in all workloads)#904

Open
vbuilder69420 wants to merge 2 commits into flashbots:develop from vbuilder69420:perf/bypass-precompile-cache

Conversation

@vbuilder69420

Summary

Bypass the PrecompileCache entirely — profiling shows 0 hits across all tested workloads.

Root Cause

The cache key is (SpecId, Bytes, u64) where the u64 is inputs.gas_limit — the remaining gas at the time of the precompile call. Since remaining gas varies per call even for identical precompile inputs, the cache never hits.

Evidence

Tested with two different contender workloads:

Simple transfers (100 TPS):

simulation_precompile_cache_hits 0
simulation_precompile_cache_misses 177614

Complex DeFi (AMM swaps, lending, oracle updates, liquidations — 20 TPS):

simulation_precompile_cache_hits 0
simulation_precompile_cache_misses 25573

0% hit rate in both cases. Each miss incurs: mutex lock + Bytes clone + HashMap lookup + second mutex lock + LruCache insert.

Benchmark (simple transfers, 100 TPS)

| Metric                 | Before | After  | Change |
|------------------------|--------|--------|--------|
| Block fill time (p50)  | 57.2ms | 53.3ms | -6.8%  |
| Block fill time (p95)  | 94.9ms | 80.7ms | -15.0% |
| E2E latency (p95)      | 101ms  | 84ms   | -16.8% |

Suggested fix (alternative to this PR)

Remove gas_limit from the cache key — precompile results don't depend on the gas limit passed to them. This would make the cache actually work. This PR takes the simpler approach of bypassing it.

Test plan

  • Verified 0% hit rate with simple transfers
  • Verified 0% hit rate with complex DeFi workload
  • Blocks built and submitted correctly
  • Integration tests pass

🤖 Generated with Claude Code

vbuilder69420 and others added 2 commits March 21, 2026 22:55
…klist check

Remove the AccessListInspector entirely from RBuilderEVMInspector.
Replace the per-opcode blocklist tracking with a post-execution check
against ResultAndState.state (EvmState = HashMap<Address, Account>),
which already contains every address touched during EVM execution.

The AccessListInspector called step() on every EVM opcode to build an
access list, solely used to check addresses against the blocklist.
Profiling showed this inspector overhead consumed ~52% of CPU time.
The EVM execution result already contains the same information in its
state diff, making the inspector entirely redundant.

Changes:
- order_commit.rs: Use create_evm() (NoOpInspector) when no
  used_state_tracer is needed. Check blocklist via res.state.keys()
  after execution instead of via access list.
- evm_inspector.rs: Remove AccessListInspector from
  RBuilderEVMInspector. The inspector now only wraps the optional
  UsedStateEVMInspector (used by parallel builder / EVM caching).

This optimization works regardless of whether a blocklist is configured.

Benchmark (builder-lab, 100 TPS, seed=42, 60s profiling window):

| Metric              | Before   | After    | Change |
|---------------------|----------|----------|--------|
| Block fill p50      | 96.8ms   | 58.9ms   | -39%   |
| Block fill p95      | 129.2ms  | 87.1ms   | -33%   |
| E2E latency p50     | 98ms     | 61ms     | -38%   |
| E2E latency p95     | 134ms    | 92ms     | -31%   |
| Blocks submitted    | 255      | 342      | +34%   |
| Txs included        | 17,882   | 23,449   | +31%   |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip the precompile cache lookup and insertion entirely. Profiling shows
the cache has 0 hits across 177,614 calls — the cache key includes
gas_limit which varies per call even for identical precompile inputs,
preventing any cache hits.

Each cache miss incurs:
- Mutex lock/unlock (parking_lot::Mutex)
- Bytes clone of the precompile input for the cache key
- HashMap lookup
- A second mutex lock + LruCache insert on miss

This overhead is pure waste with 0% hit rate.

Benchmark (builder-lab, 100 TPS, 60s profiling, stacked on AccessListInspector removal):

| Metric              | Before   | After    | Change |
|---------------------|----------|----------|--------|
| Block fill p50      | 57.2ms   | 53.3ms   | -6.8%  |
| Block fill p95      | 94.9ms   | 80.7ms   | -15.0% |
| E2E latency p50     | 59ms     | 55ms     | -6.8%  |
| E2E latency p95     | 101ms    | 84ms     | -16.8% |

Note: the precompile cache could be made effective by removing gas_limit
from the cache key (precompile results don't depend on the gas limit
passed to them — they either succeed within their gas budget or fail).
This PR takes the simpler approach of bypassing it entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vbuilder69420
Author

Updated benchmark: 500 TPS saturated blocks

Ran at 500 TPS to fill blocks to ~73% gas capacity (44M/60M avg). This surfaces the full cost of the cache overhead:

simulation_precompile_cache_hits 0
simulation_precompile_cache_misses 1,161,242

1.16 million mutex lock/unlock + Bytes clone cycles with zero benefit.

| Metric                  | Value          |
|-------------------------|----------------|
| Blocks built            | 1,260          |
| Blocks submitted        | 1,219          |
| Block fill p50          | 48.6ms         |
| Block fill p95          | 63.2ms         |
| Gas/block avg           | 44M (73% full) |
| Txs included            | 448,198        |
| Max txs/block           | 501            |
| Precompile cache hits   | 0              |
| Precompile cache misses | 1,161,242      |

The gas_limit in the cache key is the fundamental issue — it's the remaining gas at call time, which varies per call even for identical precompile inputs.

@vbuilder69420
Author

High-load benchmark: fill-block + 1000 TPS transfers

Dual contender setup: fill-block (gas-heavy txs) + 1000 TPS simple transfers, running simultaneously:

| Metric                  | Value          |
|-------------------------|----------------|
| Blocks built            | 1,902          |
| Blocks submitted        | 1,852          |
| Block fill p50/p95      | 39.5ms / 62.1ms |
| Gas/block avg           | 38M            |
| Blocks >50M gas         | 850 (46%)      |
| Txs included            | 1,205,295      |
| Max txs/block           | 1,001          |
| Precompile cache hits   | 0              |
| Precompile cache misses | 2,397,148      |

Nearly half the blocks are >50M gas (83% full). 2.4M wasted mutex lock/unlock + Bytes clone cycles.
