-
Notifications
You must be signed in to change notification settings - Fork 180
feat(fill, execute): track execution & setup testing phase #2157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(fill, execute): track execution & setup testing phase #2157
Conversation
ef9acbe
to
ec37e3a
Compare
Note: I have not yet tested against |
ec37e3a
to
9f85ffc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @LouisTsai-Csie. I don't have much to add on the implementation. This looks great. I'm going to test this early next week but just had some minor observations at first pass and left as comments.
I wanted to just get a bit of clarity on the scope on how the metadata of each phase will be used. I think we're capturing the data in a good way and I will continue my review next week, but I'd like to understand all the ways we currently plan to use this kind of metadata to get a better understanding for the design.
I think we mentioned being able to call execute
on a network with a setup
phase first, and later call it with the execution
phase. Is there any other case I'm missing so I can understand the end goal on capturing this split? Thanks!
cc: @marioevz
Yes, we mentioned in bloatnet / stateful testing scenario, we want to only run the setup phase only, which will configure all the state to prepare for the later benchmarking purpose, as discussed in issue #2112. I update the metadata also in the PR based on the following reason, you could see a shorter version in PR description, please let me know if this make sense or not: The current architecture split the benchmark cases into several distinct phases:
However, there is an issue with the current design. The problem lies within the Testing Phase, currently, all transactions and blocks passed from the for block in self.blocks: # Benchmark target currently includes every block passed from BlockchainTest/StateTest
signed_txs = []
for tx_index, tx in enumerate(block):
# Add metadata
tx = tx.with_signature_and_sender()
to_address = tx.to
label = to_address.label if isinstance(to_address, Address) else None
tx.metadata = TransactionTestMetadata(
test_id=request.node.nodeid,
phase="testing", # Marking the benchmark phase
target=label,
tx_index=tx_index,
)
signed_txs.append(tx)
... This approach becomes problematic because even within a benchmark test, some blocks are still part of the setup process. def test_worst_blockhash(
blockchain_test: BlockchainTestFiller,
pre: Alloc,
gas_benchmark_value: int,
):
"""
Test running a block with as many blockhash accesses to the oldest allowed block as possible.
"""
blocks = [Block()] * 256
execution_code = Op.PUSH1(1) + While(
body=Op.POP(Op.BLOCKHASH(Op.DUP1)),
)
execution_code_address = pre.deploy_contract(code=execution_code)
op_tx = Transaction(
to=execution_code_address,
gas_limit=gas_benchmark_value,
sender=pre.fund_eoa(),
)
blocks.append(Block(txs=[op_tx]))
blockchain_test(
pre=pre,
post={},
blocks=blocks,
) In this test, 257 blocks are passed to execute, but only the last block is the actual benchmarking target. Therefore, we need to further subdivide the Testing Phase into smaller units:
Currently, we donβt separate different phases. Instead, we simply inform the Nethermind team and others to βmeasure only the last block for benchmarking.β This requires the test implementer and these teams to be aware of that restriction. In With the introduction of the phase manager, engineers will be able to explicitly define distinct phases within the benchmark process, improving clarity and control over what is actually measured. In the current design, even if two tx with different phase is inside a single block, we could still classify each one into their corresponding phase. |
I am not sure if my understanding and implementation is correct. Please let me know if there is anything is unclear and needs further revision or discussion! |
Apologies for not being more clear. I completely understand the need for splitting these contexts up. It's clear from this PR that the separation of concerns is working well. I wanted a bit more context on how it will be used. Defining and splitting them up is one thing, but then using this in practice is another. I think the issue you linked has some good information for me here:
I was wondering the design here. Would we be able to call something like I think this looks good from my end and is ready to go imo ππΌ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! @marioevz I'll let you merge this in case there is something from the original issue we should cover here but the logic for phase-splitting is sound here imo ππΌ
No worries! i found i understand the initial question wrong. And yes i think this could be the future goal, but might need a dedicate PR for this feature!
Thanks for your review and suggestion! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR needs more iterations because I would lean towards not introduce breaking changes to ethereum_test_specs/blockchain
at this point.
We can also defer this to be post weld, it's not critical to have it before we move.
def test_mixed_operations(num_setup_txs, num_setup_blocks, num_exec_txs, num_exec_blocks): | ||
"""Test mixed operations across phases.""" | ||
manager = TestPhaseManager() | ||
|
||
# Add setup items | ||
with manager.setup(): | ||
for i in range(num_setup_txs): | ||
tx = Transaction(to=Address(0x1000 + i), value=i * 10, gas_limit=21000) | ||
manager.add_transaction(tx) | ||
|
||
for i in range(num_setup_blocks): | ||
tx = Transaction(to=Address(0x2000 + i), value=i * 100, gas_limit=21000) | ||
block = Block(txs=[tx]) | ||
manager.add_block(block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the syntax is not quite there yet, because requiring test_phase_manager
fixture whenever we want to tag a transaction seems unnecessary to me.
Maybe we could improve it by making TestPhaseManager
a singleton that can be accessed from anywhere and provide the context when the transactions are being instantiated.
This is pseudocode:
txs = []
# Add setup items
with TestPhaseManager.setup():
for i in range(num_setup_txs):
tx = Transaction(to=Address(0x1000 + i), value=i * 10, gas_limit=21000)
# tx.test_phase at this point is automatically set to TestPhase.SETUP (`default_factory` for the field could be a lambda function that fetches the current phase from the singleton)
txs.append(tx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with TestPhaseManager.setup()
This wouldn't be an instance call though, right? So setup
and execution
wouldn't live in the same manager. I like the approach though. I was thinking if it could somehow live in the benchmark_test
itself that would be quite nice and intuitive.
with benchmark_test.setup():
....
Just a thought. Not sure if it would be so straightforward but maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With your other comment that makes sense to me now how this would work. We would only need to add context to the transactions in that case. What happens when we mine a bunch of blocks in a setup phase with no transactions? Just curious. Wouldn't we want to track setup
and execution
at the block level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iiuc, we are now sourcing blocks from two places which can be really confusing.
Besides, if we are targeting to only tag transactions so execute
can decide whether to send them or not depending on the current phase, then the Block
class does not need to be modified, and the only thing we need to add is the test_phase
field in Transaction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see yeah, this is the kind of scope I was looking for. That makes sense.
Hiya, could you show how this (If this is already somewhere in this PR comments, please link me then I have "read over" that part. Or a link to the changed code is also fine! π π ) |
ποΈ Description
Background for reviewers:
In some benchmark tests, there are not only benchmark transactions but also
setup
transactions that create the scenario for the benchmark target. For example, in theSSTORE
benchmark we first need asetup
transaction that deploys as many contracts as possible, and then a separate transaction to perform the storage updates. This feature helps gas-limit testing distinguish between thesetup
andexecution
phases.This PR extends transaction metadata, which labeling phases such as
setup
,testing
, andcleanup
. Currently, operations likedeploy_contract
orfund_eoa
are tagged assetup
phase, while transactions/blocks included instate_test
orblockchain_test
are tagged as testing. However, the current labeling is not always precise enough.Take
test_worst_blockhash
as an example. The first 256 blocks should be classified assetup
, while only the final block should count as the testing phase. Without this distinction, we might mistakenly include the setup blocks in benchmark accounting.The design introduces a
test_phase
attribute at both the block and transaction level (currently supportingexecution
andsetup
phases). This attribute is used during execution (seetransaction_post.py
for details), and the metadata is updated accordingly.π Related Issues or PRs
This is the follow-up PR for #1945, more description could be found in issue #2137.
Related discussion: 1, 2
β Checklist
tox
checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
type(scope):
.mkdocs serve
locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_from
marker.