Make EVM transaction submission non-blocking #654

Open
m-Peter opened this issue Nov 12, 2024 · 6 comments
Collaborator

m-Peter commented Nov 12, 2024

Depends on: #586

Currently, whenever a transaction is submitted to the EVM Gateway, by calling the eth_sendRawTransaction JSON-RPC endpoint, under the hood we submit a Flow transaction to the Flow network, which wraps a call to EVM.run.

The eth_sendRawTransaction endpoint is a blocking operation: it waits for the status of the submitted Flow transaction until that transaction is sealed (see: https://github.com/onflow/flow-evm-gateway/blob/main/services/requester/pool.go#L71-L78). This is done with a retriable operation that uses a Fibonacci back-off mechanism. Community developers have reported that it takes 20-25 seconds on average to respond.
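To make the effect of the back-off concrete, here is a minimal sketch of how the gaps between status checks grow. The 1-second base delay and the 13-second seal time are assumptions chosen for illustration, not values taken from the gateway code, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// fibonacciDelays returns the first n waits of a Fibonacci back-off that
// starts at base: base, base, 2*base, 3*base, 5*base, ...
func fibonacciDelays(base time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	prev, cur := time.Duration(0), base
	for i := 0; i < n; i++ {
		delays = append(delays, cur)
		prev, cur = cur, prev+cur
	}
	return delays
}

// firstCheckAtOrAfter walks the retry schedule and returns the elapsed time of
// the first status check that happens at or after sealAt. The immediate first
// attempt at t=0 is omitted. The boolean is false if the schedule ends before
// the transaction is sealed.
func firstCheckAtOrAfter(delays []time.Duration, sealAt time.Duration) (time.Duration, bool) {
	var elapsed time.Duration
	for _, d := range delays {
		elapsed += d
		if elapsed >= sealAt {
			return elapsed, true
		}
	}
	return 0, false
}

func main() {
	// Assumed base delay of 1 second.
	delays := fibonacciDelays(time.Second, 8)
	fmt.Println("waits between checks:", delays)

	// Hypothetical seal time, chosen only for illustration.
	sealAt := 13 * time.Second
	if observed, ok := firstCheckAtOrAfter(delays, sealAt); ok {
		fmt.Printf("sealed at %v, observed at %v\n", sealAt, observed)
	}
}
```

Under these assumptions the checks land at 1, 2, 4, 7, 12 and 20 seconds, so a transaction sealed shortly after the 12-second check is only noticed at 20 seconds.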

The rationale behind this is the fact that only EVM transactions with a failed or successful status are included in EVM blocks (see: https://github.com/onflow/flow-go/blob/master/fvm/evm/stdlib/contract.cdc#L318-L329).

However, the eth_sendRawTransaction endpoint has to return the EVM tx hash, or an error message in the case of invalid transactions. Examples of invalid transactions are cases of nonce mismatch or insufficient balance. The EVM Gateway had no way of getting these validation errors other than waiting for the Flow transaction to seal.

With the implementation of the local state index, we now have access to the nonce and balance and we can perform all validation checks, removing the need to submit transactions which are known to be invalid.
The list of possible validation errors can be found here: https://github.com/onflow/flow-go/blob/master/fvm/evm/types/codeFinder.go#L11-L54
The validation checks performed by Geth can be found here: https://github.com/onflow/go-ethereum/blob/master/core/state_transition.go#L374-L382.

If we move all the validation checks to the EVM Gateway side, we can remove the blocking nature of eth_sendRawTransaction, thus making it more efficient.
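A minimal sketch of what such gateway-side checks could look like, assuming the local state index exposes an account's nonce and balance. All type and function names below are hypothetical and are not the gateway's actual API. The sketch rejects only nonces that are too low and transactions whose maximum cost exceeds the balance.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

var (
	errNonceTooLow       = errors.New("nonce too low")
	errInsufficientFunds = errors.New("insufficient funds for gas * price + value")
)

// accountState is a hypothetical view of an account in the local state index.
type accountState struct {
	Nonce   uint64
	Balance *big.Int
}

// txFields holds only the fields this sketch needs from a decoded transaction.
type txFields struct {
	Nonce    uint64
	GasLimit uint64
	GasPrice *big.Int
	Value    *big.Int
}

// newAccount and newTx are small helpers that keep the examples short.
func newAccount(nonce uint64, balance int64) accountState {
	return accountState{Nonce: nonce, Balance: big.NewInt(balance)}
}

func newTx(nonce, gasLimit uint64, gasPrice, value int64) txFields {
	return txFields{
		Nonce:    nonce,
		GasLimit: gasLimit,
		GasPrice: big.NewInt(gasPrice),
		Value:    big.NewInt(value),
	}
}

// validateAgainstLocalState returns an error for transactions that are known
// to be invalid given the locally indexed account state.
func validateAgainstLocalState(tx txFields, acct accountState) error {
	if tx.Nonce < acct.Nonce {
		return errNonceTooLow
	}
	// Maximum cost = gasLimit * gasPrice + value.
	cost := new(big.Int).SetUint64(tx.GasLimit)
	cost.Mul(cost, tx.GasPrice)
	cost.Add(cost, tx.Value)
	if acct.Balance.Cmp(cost) < 0 {
		return errInsufficientFunds
	}
	return nil
}

func main() {
	acct := newAccount(5, 1_000_000)
	fmt.Println(validateAgainstLocalState(newTx(4, 21_000, 1, 0), acct))   // stale nonce
	fmt.Println(validateAgainstLocalState(newTx(5, 21_000, 100, 0), acct)) // cost exceeds balance
	fmt.Println(validateAgainstLocalState(newTx(5, 21_000, 1, 500), acct)) // passes both checks
}
```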

Discord thread: https://discord.com/channels/613813861610684416/1162086721471647874/1305607307157311519

m-Peter added this to the Flow-EVM-M2 milestone Nov 12, 2024
m-Peter self-assigned this Nov 12, 2024

bluesign commented Nov 13, 2024

Fibonacci backoff was the mistake here, I guess. Fibonacci backoff assumes uniform availability (chance of seal in R1 == Rn).

But the chance of a seal in Flow actually starts from 0 (R1 = 0) and increases over time (Rn+1 > Rn).

I think that, as a temporary measure, starting the checks from T+d (where d is an educated guess of the seal time, 10 seconds for example) solves the problem.
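A minimal sketch of the delayed-start idea, assuming d = 10 seconds and a fixed 1-second interval between checks afterwards. The interval and the 13-second seal time are assumptions for illustration only, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// checkTimes returns the elapsed times of n status checks: the first one
// happens after initialDelay (the "d" in T+d), the rest follow every interval.
func checkTimes(initialDelay, interval time.Duration, n int) []time.Duration {
	times := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		times = append(times, initialDelay+time.Duration(i)*interval)
	}
	return times
}

// observedAt returns the first check time at or after sealAt. The boolean is
// false if none of the scheduled checks happens late enough.
func observedAt(times []time.Duration, sealAt time.Duration) (time.Duration, bool) {
	for _, t := range times {
		if t >= sealAt {
			return t, true
		}
	}
	return 0, false
}

func main() {
	// Assumed values: d = 10s, then one check per second.
	times := checkTimes(10*time.Second, time.Second, 20)

	// Hypothetical seal time, chosen only for illustration.
	sealAt := 13 * time.Second
	if seen, ok := observedAt(times, sealAt); ok {
		fmt.Printf("sealed at %v, observed at %v\n", sealAt, seen)
	}
}
```

With these numbers no checks are spent in the first 10 seconds, when a seal is unlikely, and the result is observed within one interval of the seal.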

Collaborator Author

m-Peter commented Nov 13, 2024

> Fibonacci backoff was the mistake here, I guess. Fibonacci backoff assumes uniform availability (chance of seal in R1 == Rn).
>
> But the chance of a seal in Flow actually starts from 0 (R1 = 0) and increases over time (Rn+1 > Rn).
>
> I think that, as a temporary measure, starting the checks from T+d (where d is an educated guess of the seal time, 10 seconds for example) solves the problem.

Yeah, it seems that the Fibonacci backoff is causing the "large" gaps between retries. It could be worth the effort to try out your suggestion as a temporary measure, before removing the need to await the transaction result altogether. Thanks for the suggestion 🙏


dete commented Nov 13, 2024

> With the implementation of the local state index, we now have access to the nonce and balance and we can perform all validation checks

Be careful with this! The AN will always be looking at state which is 5-15 seconds behind the state in which the transaction will be executed. For example, I could submit a tx with nonce 5 to one GW, then two seconds later, submit a tx with nonce 6 to another GW. This is highly likely (but not guaranteed!) to succeed if allowed to happen. Less likely (but also possible) a tx that looks like it doesn't have enough balance to run might be waiting for a deposit that is currently in flight.

What's odd is that I don't see how this is any different between Ethereum and Flow. On Ethereum, the node I'm talking to also has a delayed view of the state, and can't be sure there isn't a transaction somewhere in the mempool that might make a valid tx invalid, or vice versa. The latency on Flow and the block time on Ethereum are also similar, so it should feel "pretty normal" to an Ethereum dev. There might be more to dig into here... 🤔

Collaborator Author

m-Peter commented Nov 14, 2024

> Be careful with this! The AN will always be looking at state which is 5-15 seconds behind the state in which the transaction will be executed. For example, I could submit a tx with nonce 5 to one GW, then two seconds later, submit a tx with nonce 6 to another GW. This is highly likely (but not guaranteed!) to succeed if allowed to happen. Less likely (but also possible) a tx that looks like it doesn't have enough balance to run might be waiting for a deposit that is currently in flight.

You have a good point here 👌. I'll also add the case where some tool out there simply bypasses the GW altogether, by using Cadence transactions which wrap EVM.run(tx: hexEncodedTx, ...). This means that every GW instance relies entirely on the AN to get the complete EVM state. It can't just rely on the transactions that go directly through the GW instances.

> What's odd is that I don't see how this is any different between Ethereum and Flow. On Ethereum, the node I'm talking to also has a delayed view of the state, and can't be sure there isn't a transaction somewhere in the mempool that might make a valid tx invalid, or vice versa. The latency on Flow and the block time on Ethereum are also similar, so it should feel "pretty normal" to an Ethereum dev. There might be more to dig into here... 🤔

I believe that one significant difference here is that Geth nodes communicate with each other, so all the pending transactions are present in the txpool of each node. That's why they are able to perform state validation checks by using the txpool's internal state. See: https://github.com/onflow/go-ethereum/blob/master/core/txpool/validation.go#L172-L202. I am unsure what other Ethereum nodes do in this case.
But anyway, in the case of the Flow EVM GW, even if all nodes were to communicate with each other and create a txpool, the EVM transactions that are executed by Cadence transactions would be left out of that txpool. And this can cause issues in certain cases.

Collaborator Author

m-Peter commented Dec 3, 2024

Anatomy of transaction submission for Geth:

Step 1: eth_sendRawTransaction


SendRawTransaction will add the signed transaction to the transaction pool. The sender is responsible for signing the transaction and using the correct nonce.

Step 2: SubmitTransaction


SubmitTransaction is a helper function that submits tx to txPool and logs a message. The only validations performed in this function concern the transaction fee cap and whether only EIP-155 transactions are allowed.

Step 3: SendTx


Simply calls Add.

Step 4: TxPool.Add


Add enqueues a batch of transactions into the pool if they are valid. Due to the large transaction churn, add may postpone fully integrating the tx to a later point to batch multiple ones together.

Step 5: LegacyPool.Add


Add enqueues a batch of transactions into the pool if they are valid. Depending on the local flag, full pricing constraints will or will not be applied. If sync is set, the method will block until all internal maintenance related to the add is finished. Only use this during tests for determinism!

Step 5.1: pool.validateTxBasics


validateTxBasics checks whether a transaction is valid according to the consensus rules, but does not check state-dependent validation such as sufficient balance. This check is meant as an early check which only needs to be performed once, and does not require the pool mutex to be held.

Step 5.2: txpool.ValidateTransaction


ValidateTransaction is a helper method to check whether a transaction is valid according to the consensus rules, but does not check state-dependent validation (balance, nonce, etc). This check is public to allow different transaction pools to check the basic rules without duplicating code and running the risk of missed updates.

Step 5.3: pool.addTxsLocked


addTxsLocked attempts to queue a batch of transactions if they are valid. The transaction pool lock must be held.

Step 5.4: pool.add


add validates a transaction and inserts it into the non-executable queue for later pending promotion and execution. If the transaction is a replacement for an already pending or queued one, it overwrites the previous transaction if its price is higher. If a newly added transaction is marked as local, its sending account will be added to the allowlist, preventing any associated transaction from being dropped out of the pool due to pricing constraints.

Step 5.5: pool.validateTx


validateTx checks whether a transaction is valid according to the consensus rules and adheres to some heuristic limits of the local node (price and size).

Step 5.6: txpool.ValidateTransactionWithState


ValidateTransactionWithState is a helper method to check whether a transaction is valid according to the pool's internal state checks (balance, nonce, gaps). This check is public to allow different transaction pools to check the stateful rules without duplicating code and running the risk of missed updates.
Note: This method does not check for a nonce that is too high, as the LegacyPool implementation allows for arbitrary arrival order.

If any validation error occurs during Step 5.*, the submitted transaction is not included in the LegacyPool at all. In that case, eth_sendRawTransaction returns an empty hash (common.Hash{}) and the corresponding validation error.

The EVM Gateway does the same, but in a blocking manner: it waits for the Flow transaction to be sealed, so that it can extract any validation errors, which are mostly related to state (nonce, balance, etc.).

One significant difference here is that the EVM Gateway does not have a LegacyPool implementation. Every incoming transaction is submitted to the Flow network. For Geth, though, a submitted transaction that entered the LegacyPool might not be executed, and it can also be evicted from the pool entirely. This can be due to pricing/slot constraints and the dynamic fee nature of EVM. What's more, a user that has a pending transaction in the LegacyPool can replace it by submitting a new transaction with the same nonce and a higher gas price.
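The replacement rule can be sketched with a tiny in-memory pool keyed by sender and nonce. This is an illustration with hypothetical types, not Geth's LegacyPool: Geth additionally enforces a minimum price bump and handles eviction, both of which are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

var errReplaceUnderpriced = errors.New("replacement transaction underpriced")

// pendingTx holds only the fields this sketch needs.
type pendingTx struct {
	Sender   string
	Nonce    uint64
	GasPrice uint64
	Hash     string
}

// slot identifies the position a transaction occupies for its sender.
type slot struct {
	sender string
	nonce  uint64
}

type pool struct {
	txs map[slot]pendingTx
}

func newPool() *pool {
	return &pool{txs: make(map[slot]pendingTx)}
}

// add stores tx in the pool. If a transaction with the same sender and nonce
// is already pending, tx replaces it only when its gas price is strictly
// higher. The boolean reports whether a replacement took place.
func (p *pool) add(tx pendingTx) (bool, error) {
	key := slot{sender: tx.Sender, nonce: tx.Nonce}
	old, exists := p.txs[key]
	if exists && tx.GasPrice <= old.GasPrice {
		return false, errReplaceUnderpriced
	}
	p.txs[key] = tx
	return exists, nil
}

func main() {
	p := newPool()
	fmt.Println(p.add(pendingTx{Sender: "0xabc", Nonce: 7, GasPrice: 10, Hash: "0x01"}))
	fmt.Println(p.add(pendingTx{Sender: "0xabc", Nonce: 7, GasPrice: 10, Hash: "0x02"}))
	fmt.Println(p.add(pendingTx{Sender: "0xabc", Nonce: 7, GasPrice: 12, Hash: "0x03"}))
}
```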

Another notable difference is how the transaction-related JSON-RPC endpoints behave for transactions that are still in the LegacyPool.

Logic for eth_getTransactionByHash:

  1. Try to return an already finalized transaction, calling api.b.GetTransaction(ctx, hash).
  2. If no finalized transaction is found, try to retrieve it from the pool, assuming that it passed all validation checks and was included in the pool. Return the pending transaction as the response.
  3. If the first call, to get the finalized transaction, returned an error, return a NewTxIndexingError() to denote that the transaction is not fully indexed.
  4. Otherwise, return a nil, nil response, to denote that the transaction does not exist from the perspective of the node.
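The four steps above can be sketched as follows. The node type, its maps, and errTxIndexing are hypothetical stand-ins (errTxIndexing plays the role of the error that Geth builds with NewTxIndexingError()). Only the order of the lookups mirrors the list.

```go
package main

import (
	"errors"
	"fmt"
)

// errTxIndexing stands in for the error Geth creates with NewTxIndexingError().
var errTxIndexing = errors.New("transaction indexing is in progress")

// rpcTx is a simplified transaction response.
type rpcTx struct {
	Hash    string
	Pending bool
}

// node is a toy model of a node's finalized transactions and its pool.
type node struct {
	finalized map[string]rpcTx
	pool      map[string]rpcTx
	indexing  bool // true while background transaction indexing is running
}

// getFinalized mimics the backend lookup: it returns an error when the
// transaction is not found while indexing is still in progress.
func (n *node) getFinalized(hash string) (rpcTx, bool, error) {
	if tx, ok := n.finalized[hash]; ok {
		return tx, true, nil
	}
	if n.indexing {
		return rpcTx{}, false, errTxIndexing
	}
	return rpcTx{}, false, nil
}

func (n *node) getTransactionByHash(hash string) (*rpcTx, error) {
	// Step 1: finalized transaction.
	tx, found, err := n.getFinalized(hash)
	if found {
		return &tx, nil
	}
	// Step 2: pending transaction from the pool.
	if pending, ok := n.pool[hash]; ok {
		pending.Pending = true
		return &pending, nil
	}
	// Step 3: the lookup failed because indexing has not finished.
	if err != nil {
		return nil, errTxIndexing
	}
	// Step 4: unknown transaction.
	return nil, nil
}

func main() {
	n := &node{
		finalized: map[string]rpcTx{"0xaa": {Hash: "0xaa"}},
		pool:      map[string]rpcTx{"0xbb": {Hash: "0xbb"}},
		indexing:  true,
	}
	for _, h := range []string{"0xaa", "0xbb", "0xcc"} {
		tx, err := n.getTransactionByHash(h)
		fmt.Println(h, tx, err)
	}
}
```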

Logic for eth_getTransactionReceipt:

  1. Try to return an already finalized transaction, calling api.b.GetTransaction(ctx, hash).
  2. The above function call returns an error if the transaction is not found and background indexing for transactions is still in progress. The error is used to indicate explicitly that the transaction might be reachable shortly. In this case, the JSON-RPC endpoint returns NewTxIndexingError() to denote that the transaction is not fully indexed. A nil, nil response is returned if the transaction is not found and background transaction indexing is already finished, meaning that the transaction does not exist from the perspective of the node.
  3. Otherwise, fetch the receipt for the given transaction hash, and return its marshalled response.

For the EVM Gateway, submitted transactions that passed validation are not searchable by their transaction hash via the above endpoints until the corresponding EVM.TransactionExecuted event is emitted by the Flow network and ingested by the EVM Gateway. We could, however, modify our pool implementation to achieve the functionality described above. Related issue: #544

Member

j1010001 commented Dec 4, 2024

We just had a sync on how to approach this. We want to ensure we don't run into sequence number conflicts, because that would fail the EVM Tx execution without users being able to tell by looking at the EVM Tx hash. We will increase the number of keys the EVM GW is using for Tx submission and implement #118 to ensure a key is not reused until the Tx using it has been executed.
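A minimal sketch of the key-reuse rule, assuming the gateway tracks its signing keys by index. The keyPool type and its methods are hypothetical and do not describe what #118 actually specifies. A key leaves the free list when a transaction is signed with it and only returns once the caller reports that the transaction was executed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoKeyAvailable = errors.New("no signing key available")

// keyPool hands out key indices and keeps each one locked until it is released.
type keyPool struct {
	mu    sync.Mutex
	free  []int        // key indices that are safe to sign with
	inUse map[int]bool // key indices locked by an in-flight transaction
}

func newKeyPool(n int) *keyPool {
	p := &keyPool{inUse: make(map[int]bool)}
	for i := 0; i < n; i++ {
		p.free = append(p.free, i)
	}
	return p
}

// acquire locks and returns a free key index, or errNoKeyAvailable when every
// key is tied to a transaction that has not been executed yet.
func (p *keyPool) acquire() (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.free) == 0 {
		return 0, errNoKeyAvailable
	}
	key := p.free[0]
	p.free = p.free[1:]
	p.inUse[key] = true
	return key, nil
}

// release makes a key available again. It is meant to be called once the
// transaction signed with that key has been executed. Releasing a key that is
// not locked is a no-op.
func (p *keyPool) release(key int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.inUse[key] {
		return
	}
	delete(p.inUse, key)
	p.free = append(p.free, key)
}

func main() {
	p := newKeyPool(2)
	k0, _ := p.acquire()
	k1, _ := p.acquire()
	_, err := p.acquire()
	fmt.Println(k0, k1, err)

	p.release(k0) // the transaction signed with k0 was executed
	k, err := p.acquire()
	fmt.Println(k, err)
}
```

Increasing the number of keys raises how many transactions can be in flight at once before acquire starts failing.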
