Update ZkLean extractor to work with 64-bit Twist & Shout #1060

protoben · 2025-10-30T22:11:29Z

This PR updates the ZkLean extractor for the current version of Jolt. It includes the following changes:

All 64-bit Twist & Shout lookup-table MLEs are extracted into Lean. Both the internal representation of the MLEs and their Lean representations have been modified to allow them to scale to the larger polynomials.
For each instruction, the corresponding lookup table (if any) and circuit flags are extracted.
The R1CS extraction is updated to work with the current handling of constraints in the codebase.
For each extracted lookup table polynomial, we emit a test to ensure that it evaluates the same as the corresponding Rust function.
Tests for the extractor have been added back into

We make just two changes to jolt-core.

We derive IntoStaticStr for LookupTables enum. This helps in automatically emitting relevant names for the lookup-table MLEs in Lean.
We make the LC::decompose function pub. This makes it possible for us to extract the R1CS constraints.

This commit replaces the `RV32I` struct from L&S jolt with the `RV32IMInstruction` struct from T&S Jolt and updates the basic iteration, extraction, and printing functions. This additionally requires adding a derivation for `EnumIter` to `RV32IMinstruction`, in order to allow iteration.

This also entails temporarily disabling some additional instructions, due to the different profile used for tests. These will be re-enabled as part of https://gitlab-ext.galois.com/jb4/jolt-fork/-/issues/14

The determines how the two input operands of an instruction are combined into a single vector. Due to the requirement of Twist & Shout that each instruction MLE be decomposable into prefix and suffix MLEs, some instructions require their operands to be interleaved, and others concatenated. Jolt determines the operand interleaving using the circuit flags, so we do the same, here.

…traction' Resolve "Update instruction MLEs for T&S" See merge request jb4/jolt-fork!19

This removes cross-step constraints and updates the extraction to use the new types, module structure, etc.

…ture/ts-extraction' Resolve "Update R1CS constraint extraction for T&S" See merge request jb4/jolt-fork!18

T&S Jolt associates MLEs with lookup-tables, rather than directly with instructions. This adds a ZkLeanLookupTable type to the extractor and extracts the MLE for each variant of LookupTables from jolt_core.

Now that we extract lookup tables, along with their MLEs, we can avoid printing out the MLE for each instruction, and refer to the relevant lookup table, instead. This reduces repetition (since multiple instructions have the same MLE), reduces the number instructions we need to skip (instructions without lookup tables can simply indicate this), and better matches the Jolt codebase. We also remove the tests for instruction-MLE evaluation, since these are now handled in `lookup.rs`.

…ts-extraction' Resolve "Extract lookup tables and their MLEs" See merge request jb4/jolt-fork!21

Each instruction sets some of the `R1CSInputs::OpFlags` Boolean R1CS input variables. This extracts those flags. We currently print them in a comment, pending a decision on how to incorporate them into ZkLean. Note that *all* instructions conditionally set the `DoNotUpdateUnexpandedPC` and `InlineSequenceInstruction` flags when they are part of an inline sequence. We do not extract these, since they depend on the trace, not the instruction, itself.

…o 'feature/ts-extraction' Resolve "Extract R1CS circuit flags for each instruction" See merge request jb4/jolt-fork!22

This reduces the number of nodes required in ASTs by putting scalars and variables in the branches of their parent nodes, rather than making separate nodes for them. Since manyvariables and constants such as 0 and 1 tend to be reused many times. The cost of this is that the size of each node is increased, due to needing to contain space for two scalars, as well as additional overhead for enum tags. We work on compressing node sizes next.

This reduces the size of an AST node by reducing the size of an atomic element (a scalar or a variable). In particular, we * store an array of all unique constants in an AST, so that a scalar can be represented as an index into this array; and * store the register name in the AST, rather than in each variable node. Doing so brings the size of an atom down to 4 bytes and the size of a node down to 10 bytes (due to the overhead of the enum tags). This is sufficient to enable all 32-bit lookup tables (which we also do).

This is necessary because many of the MLEs are too large for Lean to process within the default value of `maxHeartbeats`. We could potentially eliminate the need for this by separating the lookup tables into separate files.

Using `panic` causes errors in which Lean can't prove that the corresponding type is inhabited.

The memops need to be updated to use the new Twist & Shout R1CS input variables. We put them in a comment for now so we can refer to them when we do that.

…ation' into 'feature/ts-extraction' Resolve "Update pretty-printing to match new ZkLean representation" See merge request jb4/jolt-fork!25

…-from-crashing-in-debug-mode

…' into 'feature/ts-extraction' Resolve "Prevent zklean-extractor from crashing in debug mode" See merge request jb4/jolt-fork!23

Our approach to extracting lookup-table MLEs is to have an AST data structure that implements `JoltField` and keeps track of the polynomial computation. However, naively, the `Sized` and `Copy` constraints on `JoltField` means that we are unable to use variably sized collections in this data structure, and many ASTs get allocated on the stack over the course of computations. This runs into scalability issues, since stack space is limited. This commit avoids this problem by using a static reference to a locked allocation arena for AST nodes stored on the heap.

This updates our extraction of R1CS constraints to match the current codebase in 2 ways: * coefficients and constant terms in LCs are now `i128` * R1CS constraints are now represented as just `a` and `b` LCs, with the `c` LC being (implicitly) always 0

Lean does not handle let-bound sub-expressions efficiently (e.g. a graph), instead performance degrades exponentially in the presence of many shared let-bindings. Using top-level definitions enforce a linear behavior wrt. the number of introduced sub-definitions. For our purposes, this change makes the runtime of compiling the output go from well over 2 years to just about 10 minutes.

We test the lookup tables in Lean as follows: * Generate 2 * WORD_SIZE random field elements to use as input * For each lookup-table MLE, compute output := evaluate_mle(input) in Rust * Generate a Tests module in Lean with a #guard statement for each MLE that checks mle(input) = output This commit implements the above tests. The field is hard-coded to be the scalar field of BN254 for now, implemented using ZMod p in Lean. This requires showing that p is prime, which we sorry out for now. Proving this or using BN254 from arklib is todo.

…bit-jolt-instructions' introduce common subexpression elimination to reduce output term size See merge request jb4/jolt-fork!29

…structions

When extracting instructions with lookup tables / circuit flags, we want to avoid inline sequences, because the functions that return these things panic on inline-sequence instructions. Previously, we hard-coded the list of supported instructions in the extractor, which is bad because it requires manual curation. This commit checks the `inline_sequence` instruction and filters out instructionst that return non-empty vectors, instead.

Since the scaling issue with compiling large MLEs is now solved, we no longer need to increase maxHeartbeats.

…raction' Resolve "Enable 64-bit Jolt instructions" See merge request jb4/jolt-fork!27

…ction' remove src/ directory from Lean template See merge request jb4/jolt-fork!26

…-evaluation-in-lean

This commit allows us to use the BN254 scalar field as if we were importing it from ArkLib. For now, there is a Lean4 version mismatch preventing us from actually depending on ArkLib. Once that's fixed, we can remove our copy and substitute in the ArkLib dependency without otherwise changing the Lean code.

The Lean evaluation tests for the extracted MLEs time out with the default Lean heartbeat limit. This increases it so they can complete.

The Lean tests take a long time to run, since they need to evaluate large polynomials. We move them into a separate module here so that they don't automatically run when running `lake build`. They can now be run with `lake test`.

…eature/ts-extraction' Resolve "Add tests for polynomial evaluation in Lean" See merge request jb4/jolt-fork!28

graphite-app · 2025-10-31T19:37:45Z

zklean-extractor/package-template/Jolt/MemOps.lean

+  (inputs : JoltR1CSInputs f)
+  (mem_reg mem_ram mem_elfaddress mem_bitflags mem_rs1 mem_rs2 mem_rd mem_imm: RAM f) /- We don't currently support storing tuples inside RAM so we've split them into separate ROMs -/
+  : ZKBuilder f PUnit := do
+  panic! "TODO: rewrite or extract memory ops for Twist & Shout"


The panic! macro will cause runtime errors when this code is executed. For a more Lean-compatible approach, consider either implementing the proper memory operations for Twist & Shout or using Lean's sorry tactic as a temporary placeholder that allows compilation to proceed. This would better align with Lean's proof system expectations while indicating the code is incomplete.

Suggested change

panic! "TODO: rewrite or extract memory ops for Twist & Shout"

sorry -- TODO: implement memory operations for Twist & Shout

Spotted by Graphite Agent

Is this helpful? React 👍 or 👎 to let us know.

graphite-app · 2025-10-31T19:37:47Z

zklean-extractor/src/mle_ast.rs

+impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {
+    fn from(_value: ark_ff::BigInt<N>) -> Self {
+        unimplemented!("hash unimplemented for MleAst")
+    }


The error message in the From<BigInt> implementation incorrectly refers to hash functionality. It should be updated to "From unimplemented for MleAst" to accurately reflect the unimplemented conversion function rather than hashing.

Suggested change

impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {

fn from(_value: ark_ff::BigInt<N>) -> Self {

unimplemented!("hash unimplemented for MleAst")

}

impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {

fn from(_value: ark_ff::BigInt<N>) -> Self {

unimplemented!("From<BigInt> unimplemented for MleAst")

}

Spotted by Graphite Agent

Is this helpful? React 👍 or 👎 to let us know.

Compiling the larger 64-bit MLEs in Lean works locally, but overflows the stack in lake on CI. We will remove it from CI for now, until we can find a way to make them more performant.

0xAndoroid

Just spotted this one thing

zklean-extractor/src/constants.rs

0xAndoroid · 2025-11-10T18:40:26Z

@moodlezoup should we merge this?

protoben added 30 commits September 19, 2025 15:49

Re-enable zklean-extractor and minimally make it compile

55f0587

Merge remote-tracking branch 'a16z/main' into feature/ts-extraction

cee1388

Add evaluation tests for MLEs

bf75ed5

This also entails temporarily disabling some additional instructions, due to the different profile used for tests. These will be re-enabled as part of https://gitlab-ext.galois.com/jb4/jolt-fork/-/issues/14

Clean up some cruft

fd1aabe

Merge branch '22-update-instruction-mles-for-t-s' into 'feature/ts-ex…

263b345

…traction' Resolve "Update instruction MLEs for T&S" See merge request jb4/jolt-fork!19

Make R1CS printing compile with new codebase

525f9dc

This removes cross-step constraints and updates the extraction to use the new types, module structure, etc.

Merge branch '20-update-r1cs-constraint-extraction-for-t-s' into 'fea…

c755bde

…ture/ts-extraction' Resolve "Update R1CS constraint extraction for T&S" See merge request jb4/jolt-fork!18

Fix incorrectly named test

d5b52c4

Remove subtable-extraction cruft

759e2d1

Extract Jolt lookup-table MLEs

518f642

T&S Jolt associates MLEs with lookup-tables, rather than directly with instructions. This adds a ZkLeanLookupTable type to the extractor and extracts the MLE for each variant of LookupTables from jolt_core.

Merge branch '28-extract-lookup-tables-and-their-mles' into 'feature/…

ced71f8

…ts-extraction' Resolve "Extract lookup tables and their MLEs" See merge request jb4/jolt-fork!21

Merge branch '29-extract-r1cs-circuit-flags-for-each-instruction' int…

7ff0bb6

…o 'feature/ts-extraction' Resolve "Extract R1CS circuit flags for each instruction" See merge request jb4/jolt-fork!22

Use current revision of ZkLean in extractor

c1074c9

Use sorry instead of panic for instructions w/o MLEs in extractor

d50eb62

Using `panic` causes errors in which Lean can't prove that the corresponding type is inhabited.

Update extractor dependencies

46ad8d6

Remove undefined-symbol cruft from extractor package template

c5543bc

Avoid using identifiers for MLEs we currently can't extract

24d82d8

Add comment with old Lasso memops in extractor package template

3ab822c

The memops need to be updated to use the new Twist & Shout R1CS input variables. We put them in a comment for now so we can refer to them when we do that.

Merge branch '23-update-pretty-printing-to-match-new-zklean-represent…

d74fa9b

…ation' into 'feature/ts-extraction' Resolve "Update pretty-printing to match new ZkLean representation" See merge request jb4/jolt-fork!25

Merge branch 'feature/ts-extraction' into 14-prevent-zklean-extractor…

d593f2c

…-from-crashing-in-debug-mode

Merge branch '14-prevent-zklean-extractor-from-crashing-in-debug-mode…

a8fc461

…' into 'feature/ts-extraction' Resolve "Prevent zklean-extractor from crashing in debug mode" See merge request jb4/jolt-fork!23

Merge branch 'upstream' into 4-enable-64-bit-jolt-instructions

868f76a

protoben and others added 24 commits October 22, 2025 13:14

introduce common subexpression elimination to reduce output term size

c09e67e

add a comment to format_for_lean

f37d2e2

have the Display instance print the whole AST as it used to

22aa1eb

Merge branch 'vr/common-subexpression-elimination' into '4-enable-64-…

ad58b87

…bit-jolt-instructions' introduce common subexpression elimination to reduce output term size See merge request jb4/jolt-fork!29

Merge remote-tracking branch 'a16z/main' into 4-enable-64-bit-jolt-in…

3c46c3f

…structions

Remove increase to maxHeartbeats in extracted lookup-table module

e7e98d5

Since the scaling issue with compiling large MLEs is now solved, we no longer need to increase maxHeartbeats.

Merge branch '4-enable-64-bit-jolt-instructions' into 'feature/ts-ext…

035abe4

…raction' Resolve "Enable 64-bit Jolt instructions" See merge request jb4/jolt-fork!27

remove src/ directory from Lean template

a5a0e2c

Merge branch 'vr/reorganize-project-structure' into 'feature/ts-extra…

fe757e2

…ction' remove src/ directory from Lean template See merge request jb4/jolt-fork!26

Merge branch 'feature/ts-extraction' into 17-add-tests-for-polynomial…

829467b

…-evaluation-in-lean

Increase maxHeartbeats in Lean for MLE evaluation tests

9bf6897

The Lean evaluation tests for the extracted MLEs time out with the default Lean heartbeat limit. This increases it so they can complete.

Re-enable CI tests for extractor and add lake test

457afde

Merge branch '17-add-tests-for-polynomial-evaluation-in-lean' into 'f…

e69fe9b

…eature/ts-extraction' Resolve "Add tests for polynomial evaluation in Lean" See merge request jb4/jolt-fork!28

Merge remote-tracking branch 'a16z/main' into feature/ts-extraction

1e3fb6f

Fix clippy and formatter warnings in extractor

abb819c

Remove unused flags module in extractor

be67950

Fix extractor output source directory in CI

e3c835a

Remove lake test from CI for now; Lean tests are too slow

d73f079

graphite-app bot reviewed Oct 31, 2025

View reviewed changes

Removing both Lean compilation and Lean tests from CI for now

2d7742e

Compiling the larger 64-bit MLEs in Lean works locally, but overflows the stack in lake on CI. We will remove it from CI for now, until we can find a way to make them more performant.

0xAndoroid reviewed Nov 4, 2025

View reviewed changes

zklean-extractor/src/constants.rs Outdated Show resolved Hide resolved

Rename WORD_SIZE to XLEN in extractor

b8810b7

moodlezoup merged commit 8882184 into a16z:main Nov 10, 2025
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update ZkLean extractor to work with 64-bit Twist & Shout #1060

Update ZkLean extractor to work with 64-bit Twist & Shout #1060

Uh oh!

protoben commented Oct 30, 2025

Uh oh!

graphite-app bot Oct 31, 2025

Uh oh!

graphite-app bot Oct 31, 2025

Uh oh!

0xAndoroid left a comment

Uh oh!

Uh oh!

0xAndoroid commented Nov 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	panic! "TODO: rewrite or extract memory ops for Twist & Shout"
	sorry -- TODO: implement memory operations for Twist & Shout

Update ZkLean extractor to work with 64-bit Twist & Shout #1060

Update ZkLean extractor to work with 64-bit Twist & Shout #1060

Uh oh!

Conversation

protoben commented Oct 30, 2025

Uh oh!

graphite-app bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

graphite-app bot Oct 31, 2025

Choose a reason for hiding this comment

Uh oh!

0xAndoroid left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

0xAndoroid commented Nov 10, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants