Refactor Cpu and Memory use #800
Labels: `crate: evm_arithmetization`, `performance`
After running block 20240052, batching 50 transactions, we observe that in the first segment of size $2^{22}$ the memory table is by far the longest: it contains approximately 14M rows, followed by the CPU table with ~4M and Keccak with ~0.5M, as shown in the table.

Memory usage disaggregated by context is as follows:
90% of the memory usage comes from the `Code` and `Stack` segments, contributing ~50% and ~40% of the rows, respectively. Notably, ~4M of the memory rows in the `Code` segment are due to the CPU reading the next opcode during execution. The remaining ~3M rows arise primarily from reading the code and RLP-encoded tries for hashing, as detailed in the following table, which disaggregates memory rows by `(Segment, Operation)` pairs.

The `(0, Some(KeccakGeneral))` operations represent memory reads for hashing the RLP-encoded tries, kernel code, and other hashes.

## Some optimization ideas
The proposed idea is twofold: reduce the number of CPU cycles and reduce the number of `Stack` memory operations. We will do so for different opcodes based on their frequencies, as shown in the table below:
### `PUSH`

The primary purpose of a `PUSH` operation is to provide a value for a subsequent opcode. Typically, this corresponds to one of the "consuming" opcodes in the table, such as `BinaryArithmetic(Add)`, `Jumpi`, `MloadGeneral`, `Jump`, etc. Since `PUSH` arguments are constants known at compile time, a straightforward optimization involves introducing new constant-specific opcodes, such as `BinaryArithmetic(AddConst(const_addend))`, `JumpiConst(const_jumpdest)`, or `MloadGeneralConst(const_addr)`. This approach would also reduce stack manipulation opcodes like `SWAP` and `DUP` by handling constants directly.
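As a minimal illustration of the constant-fusion idea, here is a Python sketch of a toy stack machine (not the `evm_arithmetization` kernel; the `ADD_CONST` opcode name and the cost accounting are illustrative assumptions). Fusing a `PUSH` into its consuming opcode removes both a cycle and the `Stack` memory traffic for the pushed value:

```python
# Toy stack machine: each entry counts the Stack memory operations it performs.
# ADD_CONST is a hypothetical fused form of "PUSH c; ADD".

def run(program, stack):
    """Execute a program; return the number of Stack memory operations."""
    mem_ops = 0
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0]); mem_ops += 1            # one stack write
        elif op == "ADD":
            b, a = stack.pop(), stack.pop(); mem_ops += 2  # two stack reads
            stack.append(a + b); mem_ops += 1              # one stack write
        elif op == "ADD_CONST":                            # fused "PUSH c; ADD"
            a = stack.pop(); mem_ops += 1                  # one stack read
            stack.append(a + args[0]); mem_ops += 1        # one stack write
    return mem_ops

baseline = [("PUSH", 7), ("PUSH", 3), ("ADD",)]   # 3 cycles
fused    = [("PUSH", 7), ("ADD_CONST", 3)]        # 2 cycles, same result

s1, s2 = [], []
n1, n2 = run(baseline, s1), run(fused, s2)
assert s1 == s2 == [10]
print(n1, n2)  # the fused variant needs fewer stack operations
```

Here the fused program uses 3 stack operations instead of 5, in addition to saving one CPU cycle.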
### `DUP(0)`, `DUP(1)`, `DUP(2)`

We could potentially eliminate many `DUP` operations by introducing "non-consuming" variants of consuming opcodes. The need for `DUP` arises because most opcodes consume one or two topmost stack elements, and some values may need to be used multiple times.
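A sketch of the non-consuming idea (again a toy machine, with the hypothetical opcode name `ADD_KEEP`): when both inputs must survive the operation, the consuming opcode forces two `DUP`s, while a non-consuming variant peeks at the stack instead of popping it:

```python
# Toy stack machine comparing a consuming ADD (needs DUPs to preserve its
# inputs) with a hypothetical non-consuming ADD_KEEP that only peeks.

def run(program, stack):
    """Execute a program; return the cycle count."""
    cycles = 0
    for op in program:
        cycles += 1
        if op.startswith("DUP"):
            i = int(op[3:])
            stack.append(stack[-1 - i])       # copy the element at depth i
        elif op == "ADD":                     # consuming: pops both inputs
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "ADD_KEEP":                # non-consuming: inputs stay put
            stack.append(stack[-1] + stack[-2])
    return cycles

# Goal: compute a + b while keeping both a and b on the stack.
s1, s2 = [3, 4], [3, 4]
c1 = run(["DUP1", "DUP1", "ADD"], s1)  # copy both inputs, then add
c2 = run(["ADD_KEEP"], s2)             # same result, no DUPs
assert s1 == s2 == [3, 4, 7]
```

The non-consuming variant replaces three cycles (and the associated `Stack` reads and writes of the copies) with one.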
### `SWAP(0)`, `SWAP(1)` and `SWAP(2)`

Without `DUP` operations, the remaining "pure" stack manipulation opcodes would be `SWAP(0)`, `SWAP(1)` and `SWAP(2)`. These don't directly interact with memory, but they modify stack element addresses. An optimization could involve introducing a new opcode `SWAP(i, j)_XXX` for small values of `i, j`, where `XXX` is a frequently used opcode following a `SWAP`. The idea is for `SWAP(i, j)_XXX` to execute `XXX` using inputs offset by `i` and `j` from the stack top (e.g., plain `XXX` would become `SWAP(0, 1)_XXX`).
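A toy sketch of the `SWAP(i, j)_XXX` idea, using `SUB` as the example `XXX`. The mapping of `(i, j)` to operand depths is an assumption here: the first operand popped by plain `SUB` sits at depth `i`, the second at depth `j`, so plain `SUB` is the special case `(i, j) = (0, 1)`:

```python
# Toy stack machine: SUB_AT(i, j) stands in for the hypothetical
# SWAP(i, j)_SUB, taking its operands at depths i and j from the top.

def run(program, stack):
    for op, *args in program:
        if op == "SWAP1":                      # swap the two topmost elements
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "SUB":                      # computes a - b, b popped first
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "SUB_AT":                   # SWAP(i, j)_SUB
            i, j = args
            b, a = stack[-1 - i], stack[-1 - j]
            for d in sorted((i, j), reverse=True):
                del stack[-1 - d]              # consume both inputs
            stack.append(a - b)
    return stack

# With [a, b] on the stack (b on top), computing b - a normally needs a SWAP:
baseline = run([("SWAP1",), ("SUB",)], [10, 3])
fused    = run([("SUB_AT", 1, 0)], [10, 3])    # one cycle, no SWAP
assert baseline == fused == [-7]

# Plain SUB is just the special case SWAP(0, 1)_SUB:
assert run([("SUB",)], [10, 3]) == run([("SUB_AT", 0, 1)], [10, 3]) == [7]
```

The fused opcode removes the `SWAP` cycle entirely, since the operand offsets are baked into the opcode rather than realized by moving stack elements.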
### `Jump` and `Jumpi`

Most jump destinations are known at compile time. Frequently, we observe calls like `%jump(dest)` with a constant `dest`, or `JUMP` operations where the return destination is a statically known value at the stack top. Additionally, all opcodes essentially incorporate a `JUMP` to `pc + 1`. This could be generalized into opcodes of the form `JUMP(b, dest)_XXX`, where $b \in \{0, 1\}$ and the jump destination is $b \cdot (\mathit{pc} + 1) + (1 - b) \cdot \mathit{dest}$. For example, plain `XXX` would correspond to `JUMP(1, 0)_XXX`.
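A toy sketch of jump fusion (the `ADD_JUMP` opcode name is hypothetical): since every opcode already implicitly jumps to `pc + 1`, a constant-destination jump can be folded into the preceding opcode, saving the `PUSH dest; JUMP` pair:

```python
# Toy pc-driven machine. ADD_JUMP(dest) stands in for JUMP(0, dest)_ADD:
# it executes ADD and then sets next_pc = b*(pc + 1) + (1 - b)*dest with b = 0,
# i.e. next_pc = dest. Plain ADD is the b = 1 case (fall through to pc + 1).

def run(program, stack, max_steps=50):
    pc, cycles = 0, 0
    while pc < len(program) and cycles < max_steps:
        op, *args = program[pc]
        cycles += 1
        if op == "PUSH":
            stack.append(args[0]); pc += 1
        elif op == "JUMP":
            pc = stack.pop()                  # destination read from the stack
        elif op == "ADD":                     # plain ADD == JUMP(1, 0)_ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b); pc += 1
        elif op == "ADD_JUMP":                # fused JUMP(0, dest)_ADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc = args[0]                      # constant destination
        elif op == "STOP":
            break
    return cycles

prog_a = [("PUSH", 2), ("PUSH", 3), ("ADD",),
          ("PUSH", 5), ("JUMP",), ("STOP",)]      # explicit jump to STOP
prog_b = [("PUSH", 2), ("PUSH", 3),
          ("ADD_JUMP", 3), ("STOP",)]             # fused jump to STOP

sa, sb = [], []
ca, cb = run(prog_a, sa), run(prog_b, sb)
assert sa == sb == [5]
```

The fused program reaches the same state in fewer cycles and avoids the `Stack` write and read of the jump destination.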
## Next steps

This note analyzes the performance of a few transactions in block 20240052. The next steps would involve extending this analysis to other blocks to validate or disprove the observations. Additionally, a preliminary optimization approach for `Push`, `Dup`, `Swap`, and `Jump` operations is presented; these collectively account for ~80% of executed opcodes and ~60% of `Stack` memory operations.

If these optimizations lead to a ~50% reduction in CPU cycles and `Stack` memory operations, overall memory usage could decrease by approximately 50%. However, these ideas remain preliminary and may be either incorrect or suboptimal. Further exploration is needed to refine or identify better strategies.