feat(fill, execute): track execution & setup testing phase #2157

LouisTsai-Csie · 2025-09-16T09:04:08Z

🗒️ Description

Background for reviewers:

In some benchmark tests, there are not only benchmark transactions but also setup transactions that create the scenario for the benchmark target. For example, in the SSTORE benchmark we first need a setup transaction that deploys as many contracts as possible, and then a separate transaction to perform the storage updates. This feature helps gas-limit testing distinguish between the setup and execution phases.

This PR extends transaction metadata, which labeling phases such as setup, testing, and cleanup. Currently, operations like deploy_contract or fund_eoa are tagged as setup phase, while transactions/blocks included in state_test or blockchain_test are tagged as testing. However, the current labeling is not always precise enough.

Take test_worst_blockhash as an example. The first 256 blocks should be classified as setup, while only the final block should count as the testing phase. Without this distinction, we might mistakenly include the setup blocks in benchmark accounting.

def test_worst_blockhash(
    blockchain_test: BlockchainTestFiller,
    pre: Alloc,
    gas_benchmark_value: int,
):
    """Test running a block with as many blockhash accesses to the oldest allowed block as possible."""
    # Create 256 dummy blocks to fill the blockhash window.
    blocks = [Block()] * 256

    # Always ask for the oldest allowed BLOCKHASH block.
    execution_code = Op.PUSH1(1) + While(
        body=Op.POP(Op.BLOCKHASH(Op.DUP1)),
    )
    execution_code_address = pre.deploy_contract(code=execution_code)
    op_tx = Transaction(
        to=execution_code_address,
        gas_limit=gas_benchmark_value,
        sender=pre.fund_eoa(),
    )
    blocks.append(Block(txs=[op_tx]))

    blockchain_test(
        pre=pre,
        post={},
        blocks=blocks,
    )

The design introduces a test_phase attribute at both the block and transaction level (currently supporting execution and setup phases). This attribute is used during execution (see transaction_post.py for details), and the metadata is updated accordingly.

🔗 Related Issues or PRs

This is the follow-up PR for #1945, more description could be found in issue #2137.
Related discussion: 1, 2

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

LouisTsai-Csie · 2025-09-17T09:51:18Z

Note: I have not yet tested against execute command. I will provide some result later

fselmo

Hey @LouisTsai-Csie. I don't have much to add on the implementation. This looks great. I'm going to test this early next week but just had some minor observations at first pass and left as comments.

I wanted to just get a bit of clarity on the scope on how the metadata of each phase will be used. I think we're capturing the data in a good way and I will continue my review next week, but I'd like to understand all the ways we currently plan to use this kind of metadata to get a better understanding for the design.

I think we mentioned being able to call execute on a network with a setup phase first, and later call it with the execution phase. Is there any other case I'm missing so I can understand the end goal on capturing this split? Thanks!

cc: @marioevz

src/ethereum_test_types/tests/test_phase_manager.py

src/ethereum_test_specs/blockchain.py

LouisTsai-Csie · 2025-10-06T15:03:07Z

Yes, we mentioned in bloatnet / stateful testing scenario, we want to only run the setup phase only, which will configure all the state to prepare for the later benchmarking purpose, as discussed in issue #2112.

I update the metadata also in the PR based on the following reason, you could see a shorter version in PR description, please let me know if this make sense or not:

The current architecture split the benchmark cases into several distinct phases:

Setup Phase: This phase includes operations such as deploying contracts via deploy_contract, funding accounts through fund_eoa, and initial state related setup tasks. These actions are part of the initial env preparation and should not be considered part of the actual benchmarking target.
Testing Phase: This is the main benchmarking phase, designed to measure the worst-case performance for specific opcodes/precompiles/scenarios. It is what we include in the benchmark tests and typically corresponds to the last block in a benchmark test.
Cleanup Phase: After the execute mode completes, tokens are refunded from the EOA to the sender account.

However, there is an issue with the current design. The problem lies within the Testing Phase, currently, all transactions and blocks passed from the BlockchainTest or StateTest are treated as part of the benchmark target:

for block in self.blocks:  # Benchmark target currently includes every block passed from BlockchainTest/StateTest
    signed_txs = []
    for tx_index, tx in enumerate(block):
        # Add metadata
        tx = tx.with_signature_and_sender()
        to_address = tx.to
        label = to_address.label if isinstance(to_address, Address) else None
        tx.metadata = TransactionTestMetadata(
            test_id=request.node.nodeid,
            phase="testing",  # Marking the benchmark phase
            target=label,
            tx_index=tx_index,
        )
        signed_txs.append(tx)
        ...

This approach becomes problematic because even within a benchmark test, some blocks are still part of the setup process.
Take the test_worst_blockhash example below:

def test_worst_blockhash(
    blockchain_test: BlockchainTestFiller,
    pre: Alloc,
    gas_benchmark_value: int,
):
    """
    Test running a block with as many blockhash accesses to the oldest allowed block as possible.
    """
    blocks = [Block()] * 256

    execution_code = Op.PUSH1(1) + While(
        body=Op.POP(Op.BLOCKHASH(Op.DUP1)),
    )
    execution_code_address = pre.deploy_contract(code=execution_code)
    op_tx = Transaction(
        to=execution_code_address,
        gas_limit=gas_benchmark_value,
        sender=pre.fund_eoa(),
    )
    blocks.append(Block(txs=[op_tx]))

    blockchain_test(
        pre=pre,
        post={},
        blocks=blocks,
    )

In this test, 257 blocks are passed to execute, but only the last block is the actual benchmarking target.

Therefore, we need to further subdivide the Testing Phase into smaller units:

The earlier part of the Testing Phase is still performing setup operations (which should be labeled as Setup).
The actual benchmark measurement should be explicitly labeled as the Execution Phase.

Currently, we don’t separate different phases. Instead, we simply inform the Nethermind team and others to “measure only the last block for benchmarking.” This requires the test implementer and these teams to be aware of that restriction.

In test_worst_bytecode_single_opcode, there are two transactions: contracts_deployment_tx(setup phase) and opcode_tx(execution phase). These should not be included in the same block, as doing so would pollute the benchmark results.

With the introduction of the phase manager, engineers will be able to explicitly define distinct phases within the benchmark process, improving clarity and control over what is actually measured. In the current design, even if two tx with different phase is inside a single block, we could still classify each one into their corresponding phase.

LouisTsai-Csie · 2025-10-06T15:12:16Z

I am not sure if my understanding and implementation is correct. Please let me know if there is anything is unclear and needs further revision or discussion!

fselmo · 2025-10-06T15:14:11Z

With the introduction of the phase manager, engineers will be able to explicitly define distinct phases within the benchmark process, improving clarity and control over what is actually measured. In the current design, even if two tx with different phase is inside a single block, we could still classify each one into their corresponding phase.

Apologies for not being more clear. I completely understand the need for splitting these contexts up. It's clear from this PR that the separation of concerns is working well. I wanted a bit more context on how it will be used. Defining and splitting them up is one thing, but then using this in practice is another.

I think the issue you linked has some good information for me here:

We could use a two-phase approach (similar to how we fill engine-x tests) where the first phase runs all benchmark tests but in setup mode so it only deploys the contracts, and then the second phase actually runs the tests.

The setup phase could even be deterministic so we check all whether the contracts are already there before deploying any of them. This could be achieved by having a setup-seed so we can run the workload phase many times over and over with a single setup and just passing a --setup-seed flag or similar.

I was wondering the design here. Would we be able to call something like uv run execute --phase="setup" + uv run execute --phase="execution" or internally do we always run both or check the testing phase and if it is already set up we go straight to execution, etc? This is definitely out of scope for this PR but I was trying to get some clarity on the execution design so that I can be better informed here.

I think this looks good from my end and is ready to go imo 👍🏼

fselmo

lgtm! @marioevz I'll let you merge this in case there is something from the original issue we should cover here but the logic for phase-splitting is sound here imo 👍🏼

LouisTsai-Csie · 2025-10-06T15:24:51Z

Apologies for not being more clear. I completely understand the need for splitting these contexts up. It's clear from this PR that the separation of concerns is working well. I wanted a bit more context on how it will be used. Defining and splitting them up is one thing, but then using this in practice is another.

No worries! i found i understand the initial question wrong. And yes i think this could be the future goal, but might need a dedicate PR for this feature!

uv run execute --phase=setup + uv run execute --phase=execution

Thanks for your review and suggestion!

marioevz

I think this PR needs more iterations because I would lean towards not introduce breaking changes to ethereum_test_specs/blockchain at this point.

We can also defer this to be post weld, it's not critical to have it before we move.

marioevz · 2025-10-07T21:14:20Z

src/ethereum_test_types/tests/test_phase_manager.py

+def test_mixed_operations(num_setup_txs, num_setup_blocks, num_exec_txs, num_exec_blocks):
+    """Test mixed operations across phases."""
+    manager = TestPhaseManager()
+
+    # Add setup items
+    with manager.setup():
+        for i in range(num_setup_txs):
+            tx = Transaction(to=Address(0x1000 + i), value=i * 10, gas_limit=21000)
+            manager.add_transaction(tx)
+
+        for i in range(num_setup_blocks):
+            tx = Transaction(to=Address(0x2000 + i), value=i * 100, gas_limit=21000)
+            block = Block(txs=[tx])
+            manager.add_block(block)


I feel like the syntax is not quite there yet, because requiring test_phase_manager fixture whenever we want to tag a transaction seems unnecessary to me.

Maybe we could improve it by making TestPhaseManager a singleton that can be accessed from anywhere and provide the context when the transactions are being instantiated.

This is pseudocode:

txs = [] # Add setup items with TestPhaseManager.setup(): for i in range(num_setup_txs): tx = Transaction(to=Address(0x1000 + i), value=i * 10, gas_limit=21000) # tx.test_phase at this point is automatically set to TestPhase.SETUP (`default_factory` for the field could be a lambda function that fetches the current phase from the singleton) txs.append(tx)

with TestPhaseManager.setup()

This wouldn't be an instance call though, right? So setup and execution wouldn't live in the same manager. I like the approach though. I was thinking if it could somehow live in the benchmark_test itself that would be quite nice and intuitive.

with benchmark_test.setup(): ....

Just a thought. Not sure if it would be so straightforward but maybe.

With your other comment that makes sense to me now how this would work. We would only need to add context to the transactions in that case. What happens when we mine a bunch of blocks in a setup phase with no transactions? Just curious. Wouldn't we want to track setup and execution at the block level?

marioevz · 2025-10-07T21:17:15Z

src/ethereum_test_specs/blockchain.py

Iiuc, we are now sourcing blocks from two places which can be really confusing.

Besides, if we are targeting to only tag transactions so execute can decide whether to send them or not depending on the current phase, then the Block class does not need to be modified, and the only thing we need to add is the test_phase field in Transaction.

I see yeah, this is the kind of scope I was looking for. That makes sense.

jochem-brouwer · 2025-10-07T23:58:53Z

Hiya, could you show how this test_worst_blockhash test would be written using this new metadata? I'd assume we have to decorate the blocks with a marker? It would be great to see how this would look when writing the tests 😄 👍

(If this is already somewhere in this PR comments, please link me then I have "read over" that part. Or a link to the changed code is also fine! 😄 👍 )

LouisTsai-Csie self-assigned this Sep 16, 2025

LouisTsai-Csie added scope:fw Scope: Framework (evm|tools|forks|pytest) scope:fill Scope: fill command scope:execute Scope: Changes to the execute command labels Sep 16, 2025

LouisTsai-Csie mentioned this pull request Sep 16, 2025

feat(benchmark): add benchmark_test test type #1945

Merged

5 tasks

LouisTsai-Csie force-pushed the feat/add-phase-manager branch from ef9acbe to ec37e3a Compare September 17, 2025 09:23

danceratopz self-requested a review September 25, 2025 13:41

LouisTsai-Csie added 4 commits September 26, 2025 17:14

feat(tests): add phase manager to track testing phase

42a1e8c

refactor: update test phase manager instance model

2e8ee6a

fix: resolve linting issue

2dab80d

test: add case for TestPhaseManager functionality

9f85ffc

LouisTsai-Csie force-pushed the feat/add-phase-manager branch from ec37e3a to 9f85ffc Compare September 26, 2025 11:20

LouisTsai-Csie mentioned this pull request Sep 29, 2025

eest_tests sends blocks with too much gas NethermindEth/gas-benchmarks#60

Open

fselmo reviewed Oct 3, 2025

View reviewed changes

src/ethereum_test_types/tests/test_phase_manager.py Show resolved Hide resolved

src/ethereum_test_specs/blockchain.py Outdated Show resolved Hide resolved

tests: isolated phase manager behaviour

b7faeef

fselmo approved these changes Oct 6, 2025

View reviewed changes

marioevz requested changes Oct 7, 2025

View reviewed changes

fselmo self-requested a review October 7, 2025 21:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(fill, execute): track execution & setup testing phase #2157

feat(fill, execute): track execution & setup testing phase #2157

Uh oh!

LouisTsai-Csie commented Sep 16, 2025 •

edited

Loading

Uh oh!

LouisTsai-Csie commented Sep 17, 2025

Uh oh!

fselmo left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

LouisTsai-Csie commented Oct 6, 2025 •

edited

Loading

Uh oh!

LouisTsai-Csie commented Oct 6, 2025

Uh oh!

fselmo commented Oct 6, 2025 •

edited

Loading

Uh oh!

fselmo left a comment •

edited

Loading

Uh oh!

LouisTsai-Csie commented Oct 6, 2025

Uh oh!

marioevz left a comment

Uh oh!

marioevz Oct 7, 2025

Uh oh!

fselmo Oct 7, 2025

Uh oh!

fselmo Oct 7, 2025

Uh oh!

marioevz Oct 7, 2025

Uh oh!

fselmo Oct 7, 2025 •

edited

Loading

Uh oh!

jochem-brouwer commented Oct 7, 2025

Uh oh!

Uh oh!

feat(fill, execute): track execution & setup testing phase #2157

Are you sure you want to change the base?

feat(fill, execute): track execution & setup testing phase #2157

Uh oh!

Conversation

LouisTsai-Csie commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

Uh oh!

LouisTsai-Csie commented Sep 17, 2025

Uh oh!

fselmo left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

LouisTsai-Csie commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LouisTsai-Csie commented Oct 6, 2025

Uh oh!

fselmo commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fselmo left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie commented Oct 6, 2025

Uh oh!

marioevz left a comment

Choose a reason for hiding this comment

Uh oh!

marioevz Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

fselmo Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

fselmo Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

marioevz Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

fselmo Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jochem-brouwer commented Oct 7, 2025

Uh oh!

Uh oh!

LouisTsai-Csie commented Sep 16, 2025 •

edited

Loading

fselmo left a comment •

edited

Loading

LouisTsai-Csie commented Oct 6, 2025 •

edited

Loading

fselmo commented Oct 6, 2025 •

edited

Loading

fselmo left a comment •

edited

Loading

fselmo Oct 7, 2025 •

edited

Loading