Make EVM transaction submission non-blocking #654

m-Peter · 2024-11-12T15:07:28Z

Depends on: #586

Currently, whenever a transaction is submitted to the EVM Gateway, by calling the eth_sendRawTransaction JSON-RPC endpoint, under the hood we submit a Flow transaction to the Flow network, which wraps a call to EVM.run.

The eth_sendRawTransaction call is a blocking operation: it waits for the status of the submitted Flow transaction until it becomes sealed (see: https://github.com/onflow/flow-evm-gateway/blob/main/services/requester/pool.go#L71-L78). This is done with a retriable operation, using a Fibonacci back-off mechanism. Community devs have reported that it takes on average 20-25 seconds to respond.
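To make the effect of a Fibonacci back-off concrete, here is a minimal sketch of such a schedule. It is a generic illustration, not the gateway's actual retry code; the one-second base delay and the function name `fibonacciDelays` are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// fibonacciDelays returns the first n waits of a Fibonacci back-off
// that starts at base: base, base, 2*base, 3*base, 5*base, ...
// (hypothetical helper, not part of the gateway).
func fibonacciDelays(base time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	prev, curr := time.Duration(0), base
	for i := 0; i < n; i++ {
		delays = append(delays, curr)
		prev, curr = curr, prev+curr
	}
	return delays
}

func main() {
	var elapsed time.Duration
	for i, d := range fibonacciDelays(time.Second, 7) {
		elapsed += d
		fmt.Printf("check %d at T+%v (waited %v)\n", i+1, elapsed, d)
	}
}
```

Under these assumed numbers the checks land at T+1s, 2s, 4s, 7s, 12s, 20s and 33s, so a transaction that seals shortly after one check is only noticed at the next, much later one.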

The rationale behind this is the fact that only EVM transactions with a failed or successful status are included in EVM blocks (see: https://github.com/onflow/flow-go/blob/master/fvm/evm/stdlib/contract.cdc#L318-L329).

However, the eth_sendRawTransaction endpoint has to return the EVM tx hash, or an error message, in the case of invalid transactions. Examples of invalid transactions are cases of nonce mismatch, or insufficient balance. The EVM Gateway had no other way of getting back these validation errors, other than waiting for the Flow transaction to seal.

With the implementation of the local state index, we now have access to the nonce and balance and we can perform all validation checks, removing the need to submit transactions which are known to be invalid.
The list of possible validation errors can be found here: https://github.com/onflow/flow-go/blob/master/fvm/evm/types/codeFinder.go#L11-L54
The validation checks performed by Geth can be found here: https://github.com/onflow/go-ethereum/blob/master/core/state_transition.go#L374-L382.
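A minimal sketch of what such local checks could look like, assuming the local state index can return an account's nonce and balance. The types `AccountState` and `Tx`, the helper `wei`, and the error values are all hypothetical, and only two of the possible validation errors are covered.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// AccountState is a hypothetical snapshot read from the local state index.
type AccountState struct {
	Nonce   uint64
	Balance *big.Int
}

// Tx holds only the fields needed for the checks below.
type Tx struct {
	Nonce    uint64
	GasLimit uint64
	GasPrice *big.Int
	Value    *big.Int
}

var (
	ErrNonceTooLow       = errors.New("nonce too low")
	ErrInsufficientFunds = errors.New("insufficient funds for gas * price + value")
)

// wei is a small helper for building amounts in the example.
func wei(n int64) *big.Int { return big.NewInt(n) }

// validate rejects a transaction whose nonce was already used, or whose
// sender cannot cover gasLimit*gasPrice + value with the indexed balance.
func validate(tx Tx, acct AccountState) error {
	if tx.Nonce < acct.Nonce {
		return ErrNonceTooLow
	}
	cost := new(big.Int).Mul(new(big.Int).SetUint64(tx.GasLimit), tx.GasPrice)
	cost.Add(cost, tx.Value)
	if acct.Balance.Cmp(cost) < 0 {
		return ErrInsufficientFunds
	}
	return nil
}

func main() {
	acct := AccountState{Nonce: 5, Balance: wei(1_000_000)}
	txs := []Tx{
		{Nonce: 4, GasLimit: 21_000, GasPrice: wei(1), Value: wei(0)},
		{Nonce: 5, GasLimit: 21_000, GasPrice: wei(100), Value: wei(0)},
		{Nonce: 5, GasLimit: 21_000, GasPrice: wei(1), Value: wei(10)},
	}
	for _, tx := range txs {
		fmt.Printf("nonce=%d -> %v\n", tx.Nonce, validate(tx, acct))
	}
}
```

A transaction that fails here would get its error returned immediately, without a Flow transaction ever being submitted.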

If we move all the validation checks to the EVM Gateway side, we can remove the blocking nature of eth_sendRawTransaction, thus making it more efficient.

Discord thread: https://discord.com/channels/613813861610684416/1162086721471647874/1305607307157311519


bluesign · 2024-11-13T15:02:59Z

Fibonacci backoff here was the mistake I guess. Fibonacci backoff assumes uniform availability (chance of seal in R1 == Rn).

But the chance of seal in Flow actually starts from 0 (R1 = 0) and somehow increases over time (Rn+1 > Rn).

I think, as a temporary measure, starting the checks from T+d (where d = an educated guess of the seal time, 10 seconds for example) solves the problem.
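The suggestion above can be sketched as a polling loop that skips the window in which a seal is not expected and then checks at a fixed interval. This is only an illustration: `pollSealed`, its parameters, and the fake status source in `simulate` are hypothetical, and the durations are shortened so the example finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollSealed waits initialDelay before the first check, then calls isSealed
// every interval until it reports true, returns an error, or ctx is done.
func pollSealed(ctx context.Context, initialDelay, interval time.Duration, isSealed func() (bool, error)) error {
	timer := time.NewTimer(initialDelay)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-timer.C:
		}
		sealed, err := isSealed()
		if err != nil {
			return err
		}
		if sealed {
			return nil
		}
		// The timer channel was just drained, so Reset is safe here.
		timer.Reset(interval)
	}
}

// simulate runs pollSealed against a fake status source that reports
// "sealed" on the third check, and returns how many checks were made.
func simulate() (int, error) {
	checks := 0
	err := pollSealed(context.Background(), 20*time.Millisecond, 5*time.Millisecond, func() (bool, error) {
		checks++
		return checks == 3, nil
	})
	return checks, err
}

func main() {
	checks, err := simulate()
	fmt.Println("checks:", checks, "err:", err)
}
```

In a real setting initialDelay would be the educated guess d (10 seconds in the example above), and the interval would be chosen to bound how long a sealed transaction can go unnoticed.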

m-Peter · 2024-11-13T15:31:49Z

> Fibonacci backoff here was the mistake I guess. Fibonacci backoff assumes uniform availability (chance of seal in R1 == Rn).
>
> But the chance of seal in Flow actually starts from 0 (R1 = 0) and somehow increases over time (Rn+1 > Rn).
>
> I think, as a temporary measure, starting the checks from T+d (where d = an educated guess of the seal time, 10 seconds for example) solves the problem.

Yeah, it seems that the Fibonacci backoff is causing the "large" gaps between retries. It could be worth the effort to try out your suggestion as a temporary measure, before removing the need for awaiting the transaction result altogether. Thanks for the suggestion 🙏

dete · 2024-11-13T18:58:47Z

> With the implementation of the local state index, we now have access to the nonce and balance and we can perform all validation checks

Be careful with this! The AN will always be looking at state which is 5-15 seconds behind the state in which the transaction will be executed. For example, I could submit a tx with nonce 5 to one GW, then two seconds later, submit a tx with nonce 6 to another GW. This is highly likely (but not guaranteed!) to succeed if allowed to happen. Less likely (but also possible) a tx that looks like it doesn't have enough balance to run might be waiting for a deposit that is currently in flight.

What's odd is that I don't see how this is any different between Ethereum and Flow. On Ethereum, the node I'm talking to also has a delayed view of the state, and can't be sure there isn't a transaction somewhere in the mempool that might make a valid tx invalid, or vice versa. The latency on Flow and the block time on Ethereum are also similar, so it should feel "pretty normal" to an Ethereum dev. There might be more to dig into here... 🤔

m-Peter · 2024-11-14T12:41:50Z

> Be careful with this! The AN will always be looking at state which is 5-15 seconds behind the state in which the transaction will be executed. For example, I could submit a tx with nonce 5 to one GW, then two seconds later, submit a tx with nonce 6 to another GW. This is highly likely (but not guaranteed!) to succeed if allowed to happen. Less likely (but also possible) a tx that looks like it doesn't have enough balance to run might be waiting for a deposit that is currently in flight.

You have a good point here 👌. I'll also add the case where some tool out there simply bypasses the GW altogether, by using Cadence transactions which wrap EVM.run(tx: hexEncodedTx, ...). This means that every GW instance relies entirely on the AN to get the complete EVM state. It can't just rely on the transactions that go directly through the GW instances.

> What's odd is that I don't see how this is any different between Ethereum and Flow. On Ethereum, the node I'm talking to also has a delayed view of the state, and can't be sure there isn't a transaction somewhere in the mempool that might make a valid tx invalid, or vice versa. The latency on Flow and the block time on Ethereum are also similar, so it should feel "pretty normal" to an Ethereum dev. There might be more to dig into here... 🤔

I believe that one significant difference here is the fact that Geth nodes communicate with each other, so all the pending transactions are present in the txpool of each node. That's why they are able to perform state validation checks by using the txpool's internal state. See: https://github.com/onflow/go-ethereum/blob/master/core/txpool/validation.go#L172-L202. I am unsure as to what other Ethereum nodes do in this case.
But anyway, in the case of the Flow EVM GW, even if all nodes were to communicate with each other and create a txpool, the EVM transactions that are executed by Cadence transactions would be left out of that txpool. And this can cause issues in certain cases.

m-Peter · 2024-12-03T14:38:18Z

Anatomy of transaction submission for Geth:

Step 1: eth_sendRawTransaction

SendRawTransaction will add the signed transaction to the transaction pool. The sender is responsible for signing the transaction and using the correct nonce.

Step 2: SubmitTransaction

SubmitTransaction is a helper function that submits tx to txPool and logs a message. The only validations performed in this function concern the transaction fee cap and whether only EIP-155 transactions are allowed.

Step 3: SendTx

Simply calls Add.

Step 4: TxPool.Add

Add enqueues a batch of transactions into the pool if they are valid. Due to the large transaction churn, add may postpone fully integrating the tx to a later point to batch multiple ones together.

Step 5: LegacyPool.Add

Add enqueues a batch of transactions into the pool if they are valid. Depending on the local flag, full pricing constraints will or will not be applied. If sync is set, the method will block until all internal maintenance related to the add is finished. Only use this during tests for determinism!

Step 5.1: pool.validateTxBasics

validateTxBasics checks whether a transaction is valid according to the consensus rules, but does not check state-dependent validation such as sufficient balance. This check is meant as an early check which only needs to be performed once, and does not require the pool mutex to be held.

Step 5.2: txpool.ValidateTransaction

ValidateTransaction is a helper method to check whether a transaction is valid according to the consensus rules, but does not check state-dependent validation (balance, nonce, etc). This check is public to allow different transaction pools to check the basic rules without duplicating code and running the risk of missed updates.

Step 5.3: pool.addTxsLocked

addTxsLocked attempts to queue a batch of transactions if they are valid. The transaction pool lock must be held.

Step 5.4: pool.add

add validates a transaction and inserts it into the non-executable queue for later pending promotion and execution. If the transaction is a replacement for an already pending or queued one, it overwrites the previous transaction if its price is higher. If a newly added transaction is marked as local, its sending account will be added to the allowlist, preventing any associated transaction from being dropped out of the pool due to pricing constraints.

Step 5.5: pool.validateTx

validateTx checks whether a transaction is valid according to the consensus rules and adheres to some heuristic limits of the local node (price and size).

Step 5.6: txpool.ValidateTransactionWithState

ValidateTransactionWithState is a helper method to check whether a transaction is valid according to the pool's internal state checks (balance, nonce, gaps). This check is public to allow different transaction pools to check the stateful rules without duplicating code and running the risk of missed updates.
Note: This method does not check for nonce too high, as the LegacyPool implementation allows for arbitrary arrival order:

https://github.com/onflow/go-ethereum/blob/master/core/txpool/legacypool/legacypool.go#L631
https://github.com/onflow/go-ethereum/blob/master/core/txpool/validation.go#L177-L181
https://github.com/onflow/go-ethereum/blob/master/core/txpool/validation.go#L213-L215
However, if for some reason the nonce is indeed too high, the transaction will fail when it gets executed by the EVM.
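The note about arbitrary arrival order can be illustrated with a small, stateless sketch: only a nonce that is too low is rejected, while a nonce that is too high is kept for later. The `Placement` type and the `place` function are hypothetical simplifications, not Geth's code.

```go
package main

import "fmt"

// Placement is where a hypothetical pool would put an incoming transaction.
type Placement int

const (
	Rejected Placement = iota // nonce already used, can never execute
	Pending                   // next expected nonce, executable now
	Queued                    // nonce gap, kept until earlier nonces arrive
)

func (p Placement) String() string {
	return [...]string{"rejected", "pending", "queued"}[p]
}

// place compares a transaction nonce with the sender's next expected nonce.
// Only a nonce that is too low is rejected; a nonce that is too high is kept.
func place(txNonce, nextNonce uint64) Placement {
	switch {
	case txNonce < nextNonce:
		return Rejected
	case txNonce == nextNonce:
		return Pending
	default:
		return Queued
	}
}

func main() {
	// Nonce 6 arrives before nonce 5: it is queued, not rejected.
	for _, n := range []uint64{6, 5, 4} {
		fmt.Printf("nonce %d -> %s\n", n, place(n, 5))
	}
}
```

Without a pool to hold the queued case, a gateway that submits every transaction right away has no place to park a too-high nonce, which is why such a transaction fails at execution time instead.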

If during Step 5.* there's any validation error, the submitted transaction will not be included at all in the LegacyPool. After that, eth_sendRawTransaction will return an empty hash (common.Hash{}), and the corresponding validation error.

That's what the EVM Gateway also does, but in a blocking manner: it waits for the Flow transaction to be sealed, so that it can extract any validation errors, mostly related to state (nonce, balance, etc.).

One significant difference here is that the EVM Gateway does not have a LegacyPool implementation. Every incoming transaction will be submitted to the Flow network. For Geth though, a submitted transaction that entered the LegacyPool might not be executed, and it can also be evicted entirely from the pool. This can be due to pricing/slot constraints and the dynamic fee nature of the EVM. What's more, a user that has a pending transaction in the LegacyPool can replace it by submitting a new transaction with the same nonce and a higher gas price.
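The replacement rule described above can be sketched with a toy pool keyed by sender and nonce. Everything here (`Pool`, `PoolTx`, `ErrUnderpriced`) is hypothetical; the sketch only requires a strictly higher gas price, whereas Geth's pool also enforces a minimum price bump.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrUnderpriced = errors.New("replacement transaction underpriced")

// PoolTx is a simplified pending transaction.
type PoolTx struct {
	Sender   string
	Nonce    uint64
	GasPrice uint64 // wei; a plain integer keeps the sketch short
	Hash     string
}

// slot identifies the single place a (sender, nonce) pair can occupy.
type slot struct {
	sender string
	nonce  uint64
}

type Pool struct {
	txs map[slot]PoolTx
}

func NewPool() *Pool { return &Pool{txs: make(map[slot]PoolTx)} }

// Add inserts tx, or replaces the transaction occupying the same
// (sender, nonce) slot when tx offers a strictly higher gas price.
func (p *Pool) Add(tx PoolTx) error {
	s := slot{sender: tx.Sender, nonce: tx.Nonce}
	if old, ok := p.txs[s]; ok && tx.GasPrice <= old.GasPrice {
		return ErrUnderpriced
	}
	p.txs[s] = tx
	return nil
}

func main() {
	p := NewPool()
	fmt.Println(p.Add(PoolTx{Sender: "0xabc", Nonce: 5, GasPrice: 10, Hash: "a"}))
	fmt.Println(p.Add(PoolTx{Sender: "0xabc", Nonce: 5, GasPrice: 10, Hash: "b"}))
	fmt.Println(p.Add(PoolTx{Sender: "0xabc", Nonce: 5, GasPrice: 12, Hash: "c"}))
}
```

Since the gateway forwards each transaction to the Flow network immediately, there is no window in which such a replacement could take effect.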

Another notable difference is how the transaction-related JSON-RPC endpoints behave, for transactions that are still in the LegacyPool.

Logic for eth_getTransactionByHash:

1. Try to return an already finalized transaction, calling api.b.GetTransaction(ctx, hash).
2. If no finalized transaction is found, try to retrieve it from the pool, assuming that it passed all validation checks and was included in the pool. Return the pending transaction as the response.
3. If it is not in the pool either, and the first call to get the finalized transaction returned an error, return a NewTxIndexingError() to denote that the transaction is not fully indexed.
4. Otherwise, return a nil, nil response, to denote that the transaction does not exist from the perspective of the node.
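The lookup order for eth_getTransactionByHash can be sketched as follows. The `Node` type with its in-memory maps, the `indexing` flag, and `ErrTxIndexing` are hypothetical stand-ins for the backend, the pool, and NewTxIndexingError(); this is not Geth's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTxIndexing stands in for the error produced by NewTxIndexingError().
var ErrTxIndexing = errors.New("transaction indexing is in progress")

type Tx struct {
	Hash    string
	Pending bool
}

// Node is a toy backend: finalized transactions, a pool of pending ones,
// and a flag telling whether background indexing is still running.
type Node struct {
	finalized map[string]*Tx
	pool      map[string]*Tx
	indexing  bool
}

// lookupFinalized returns an error only when the transaction is missing
// while indexing is still in progress.
func (n *Node) lookupFinalized(hash string) (*Tx, error) {
	if tx, ok := n.finalized[hash]; ok {
		return tx, nil
	}
	if n.indexing {
		return nil, ErrTxIndexing
	}
	return nil, nil
}

// GetTransactionByHash checks finalized data first, then the pool,
// then reports either "still indexing" or "unknown" (nil, nil).
func (n *Node) GetTransactionByHash(hash string) (*Tx, error) {
	tx, err := n.lookupFinalized(hash)
	if tx != nil {
		return tx, nil
	}
	if pending, ok := n.pool[hash]; ok {
		return pending, nil
	}
	if err != nil {
		return nil, ErrTxIndexing
	}
	return nil, nil
}

func main() {
	n := &Node{
		finalized: map[string]*Tx{"0x01": {Hash: "0x01"}},
		pool:      map[string]*Tx{"0x02": {Hash: "0x02", Pending: true}},
		indexing:  true,
	}
	for _, h := range []string{"0x01", "0x02", "0x03"} {
		tx, err := n.GetTransactionByHash(h)
		fmt.Println(h, "found:", tx != nil, "err:", err)
	}
}
```

The pool lookup in the middle is the part the EVM Gateway currently has no equivalent for, since it keeps no pool of pending transactions.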

Logic for eth_getTransactionReceipt:

1. Try to find an already finalized transaction, calling api.b.GetTransaction(ctx, hash).
2. This call returns an error if the transaction is not found while background transaction indexing is still in progress. The error signals explicitly that the transaction might become reachable shortly. In this case, the JSON-RPC endpoint returns NewTxIndexingError() to denote that the transaction is not fully indexed.
3. If the transaction is not found and background transaction indexing has already finished, a nil, nil response is returned: the transaction does not exist from the perspective of the node.
4. Otherwise, fetch the receipt for the given transaction hash, and return its marshalled response.

For the EVM Gateway, submitted transactions that passed validation are not searchable by their transaction hash via the above endpoints until the corresponding EVM.TransactionExecuted event is emitted by the Flow network and ingested by the EVM Gateway. We could, however, modify our pool implementation to provide this functionality. Related issue: #544

j1010001 · 2024-12-04T17:29:37Z

We just had a sync on how to approach this. We want to ensure we don't run into sequence number conflicts, because that would fail the EVM Tx execution without users being able to tell by looking at the EVM Tx hash. We will increase the number of keys the EVM GW is using for Tx submission and implement #118 to ensure a key is not reused until the Tx using it has been executed.
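One way to picture the key handling described above is a pool that reserves a signing key per Flow transaction and frees it only once that transaction has executed. This is a sketch under assumptions, not the design of #118: `KeyPool`, `Reserve`, and `Release` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrNoKeyAvailable = errors.New("no signing key available")

// KeyPool hands out account key indexes and keeps each one reserved
// until the transaction that used it is reported as executed.
type KeyPool struct {
	mu    sync.Mutex
	free  []uint32
	inUse map[string]uint32 // Flow transaction ID -> key index
}

// NewKeyPool creates a pool with key indexes 0..n-1, all free.
func NewKeyPool(n uint32) *KeyPool {
	kp := &KeyPool{inUse: make(map[string]uint32)}
	for i := uint32(0); i < n; i++ {
		kp.free = append(kp.free, i)
	}
	return kp
}

// Reserve takes a free key for txID, or fails if all keys are busy.
func (kp *KeyPool) Reserve(txID string) (uint32, error) {
	kp.mu.Lock()
	defer kp.mu.Unlock()
	if len(kp.free) == 0 {
		return 0, ErrNoKeyAvailable
	}
	key := kp.free[0]
	kp.free = kp.free[1:]
	kp.inUse[txID] = key
	return key, nil
}

// Release frees the key used by txID once that transaction has executed.
func (kp *KeyPool) Release(txID string) {
	kp.mu.Lock()
	defer kp.mu.Unlock()
	if key, ok := kp.inUse[txID]; ok {
		delete(kp.inUse, txID)
		kp.free = append(kp.free, key)
	}
}

func main() {
	kp := NewKeyPool(2)
	k1, _ := kp.Reserve("tx-1")
	k2, _ := kp.Reserve("tx-2")
	_, err := kp.Reserve("tx-3") // both keys are busy
	fmt.Println(k1, k2, err)

	kp.Release("tx-1") // tx-1 executed, its key becomes available again
	k3, err := kp.Reserve("tx-3")
	fmt.Println(k3, err)
}
```

With more keys in the pool, more transactions can be in flight at once before Reserve has to fail or wait.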

m-Peter added Improvement Performance labels Nov 12, 2024

m-Peter added this to the Flow-EVM-M2 milestone Nov 12, 2024

m-Peter self-assigned this Nov 12, 2024

This was referenced Nov 25, 2024

Use a constant backoff retry strategy for retrieving the Flow transaction result #672

Merged

eth_sendRawTransaction method slow response #692

Closed

m-Peter mentioned this issue Dec 3, 2024

Enable validation of submitted transactions with local state index #693

Merged

6 tasks

m-Peter mentioned this issue Dec 10, 2024

Rework transaction endpoints to handle pending status #544

Open
