Skip to content

Conversation

@protoben
Copy link
Contributor

This PR updates the ZkLean extractor for the current version of Jolt. It includes the following changes:

  • All 64-bit Twist & Shout lookup-table MLEs are extracted into Lean. Both the internal representation of the MLEs and their Lean representations have been modified to allow them to scale to the larger polynomials.
  • For each instruction, the corresponding lookup table (if any) and circuit flags are extracted.
  • The R1CS extraction is updated to work with the current handling of constraints in the codebase.
  • For each extracted lookup table polynomial, we emit a test to ensure that it evaluates the same as the corresponding Rust function.
  • Tests for the extractor have been added back into

We make just two changes to jolt-core.

  • We derive IntoStaticStr for LookupTables enum. This helps in automatically emitting relevant names for the lookup-table MLEs in Lean.
  • We make the LC::decompose function pub. This makes it possible for us to extract the R1CS constraints.

This commit replaces the `RV32I` struct from L&S jolt with the
`RV32IMInstruction` struct from T&S Jolt and updates the basic iteration,
extraction, and printing functions. This additionally requires adding a
derivation for `EnumIter` to `RV32IMinstruction`, in order to allow iteration.
This also entails temporarily disabling some additional instructions, due to
the different profile used for tests. These will be re-enabled as part of
https://gitlab-ext.galois.com/jb4/jolt-fork/-/issues/14
The determines how the two input operands of an instruction are combined into a
single vector. Due to the requirement of Twist & Shout that each instruction
MLE be decomposable into prefix and suffix MLEs, some instructions require
their operands to be interleaved, and others concatenated. Jolt determines the
operand interleaving using the circuit flags, so we do the same, here.
…traction'

Resolve "Update instruction MLEs for T&S"

See merge request jb4/jolt-fork!19
This removes cross-step constraints and updates the extraction to use the new
types, module structure, etc.
…ture/ts-extraction'

Resolve "Update R1CS constraint extraction for T&S"

See merge request jb4/jolt-fork!18
T&S Jolt associates MLEs with lookup-tables, rather than directly with
instructions. This adds a ZkLeanLookupTable type to the extractor and extracts
the MLE for each variant of LookupTables from jolt_core.
Now that we extract lookup tables, along with their MLEs, we can avoid printing
out the MLE for each instruction, and refer to the relevant lookup table,
instead. This reduces repetition (since multiple instructions have the same
MLE), reduces the number instructions we need to skip (instructions without
lookup tables can simply indicate this), and better matches the Jolt codebase.
We also remove the tests for instruction-MLE evaluation, since these are now
handled in `lookup.rs`.
…ts-extraction'

Resolve "Extract lookup tables and their MLEs"

See merge request jb4/jolt-fork!21
Each instruction sets some of the `R1CSInputs::OpFlags` Boolean R1CS input
variables. This extracts those flags. We currently print them in a comment,
pending a decision on how to incorporate them into ZkLean.

Note that *all* instructions conditionally set the `DoNotUpdateUnexpandedPC`
and `InlineSequenceInstruction` flags when they are part of an inline sequence.
We do not extract these, since they depend on the trace, not the instruction,
itself.
…o 'feature/ts-extraction'

Resolve "Extract R1CS circuit flags for each instruction"

See merge request jb4/jolt-fork!22
This reduces the number of nodes required in ASTs by putting scalars and
variables in the branches of their parent nodes, rather than making separate
nodes for them. Since manyvariables and constants such as 0 and 1 tend to be
reused many times. The cost of this is that the size of each node is increased,
due to needing to contain space for two scalars, as well as additional overhead
for enum tags. We work on compressing node sizes next.
This reduces the size of an AST node by reducing the size of an atomic element
(a scalar or a variable). In particular, we
* store an array of all unique constants in an AST, so that a scalar can be
  represented as an index into this array; and
* store the register name in the AST, rather than in each variable node.
Doing so brings the size of an atom down to 4 bytes and the size of a node down
to 10 bytes (due to the overhead of the enum tags). This is sufficient to
enable all 32-bit lookup tables (which we also do).
This is necessary because many of the MLEs are too large for Lean to
process within the default value of `maxHeartbeats`. We could potentially
eliminate the need for this by separating the lookup tables into separate
files.
Using `panic` causes errors in which Lean can't prove that the
corresponding type is inhabited.
The memops need to be updated to use the new Twist & Shout R1CS
input variables. We put them in a comment for now so we can refer
to them when we do that.
…ation' into 'feature/ts-extraction'

Resolve "Update pretty-printing to match new ZkLean representation"

See merge request jb4/jolt-fork!25
…' into 'feature/ts-extraction'

Resolve "Prevent zklean-extractor from crashing in debug mode"

See merge request jb4/jolt-fork!23
Our approach to extracting lookup-table MLEs is to have an AST data
structure that implements `JoltField` and keeps track of the
polynomial computation. However, naively, the `Sized` and `Copy`
constraints on `JoltField` means that we are unable to use variably
sized collections in this data structure, and many ASTs get
allocated on the stack over the course of computations. This runs
into scalability issues, since stack space is limited. This commit
avoids this problem by using a static reference to a locked
allocation arena for AST nodes stored on the heap.
protoben and others added 24 commits October 22, 2025 13:14
This updates our extraction of R1CS constraints to match the
current codebase in 2 ways:
* coefficients and constant terms in LCs are now `i128`
* R1CS constraints are now represented as just `a` and `b` LCs,
  with the `c` LC being (implicitly) always 0
Lean does not handle let-bound sub-expressions efficiently (e.g. a
graph), instead performance degrades exponentially in the presence of
many shared let-bindings.

Using top-level definitions enforce a linear behavior wrt. the number of
introduced sub-definitions.  For our purposes, this change makes the
runtime of compiling the output go from well over 2 years to just about
10 minutes.
We test the lookup tables in Lean as follows:
* Generate 2 * WORD_SIZE random field elements to use as input
* For each lookup-table MLE, compute output := evaluate_mle(input)
  in Rust
* Generate a Tests module in Lean with a #guard statement for each
  MLE that checks mle(input) = output
This commit implements the above tests. The field is hard-coded to
be the scalar field of BN254 for now, implemented using ZMod p in
Lean. This requires showing that p is prime, which we sorry out for
now. Proving this or using BN254 from arklib is todo.
…bit-jolt-instructions'

introduce common subexpression elimination to reduce output term size

See merge request jb4/jolt-fork!29
When extracting instructions with lookup tables / circuit flags, we
want to avoid inline sequences, because the functions that return
these things panic on inline-sequence instructions. Previously, we
hard-coded the list of supported instructions in the extractor,
which is bad because it requires manual curation. This commit
checks the `inline_sequence` instruction and filters out
instructionst that return non-empty vectors, instead.
Since the scaling issue with compiling large MLEs is now solved, we
no longer need to increase maxHeartbeats.
…raction'

Resolve "Enable 64-bit Jolt instructions"

See merge request jb4/jolt-fork!27
…ction'

remove src/ directory from Lean template

See merge request jb4/jolt-fork!26
This commit allows us to use the BN254 scalar field as if we were
importing it from ArkLib. For now, there is a Lean4 version
mismatch preventing us from actually depending on ArkLib. Once
that's fixed, we can remove our copy and substitute in the ArkLib
dependency without otherwise changing the Lean code.
The Lean evaluation tests for the extracted MLEs time out with the
default Lean heartbeat limit. This increases it so they can
complete.
The Lean tests take a long time to run, since they need to evaluate
large polynomials. We move them into a separate module here so that
they don't automatically run when running `lake build`. They can
now be run with `lake test`.
…eature/ts-extraction'

Resolve "Add tests for polynomial evaluation in Lean"

See merge request jb4/jolt-fork!28
(inputs : JoltR1CSInputs f)
(mem_reg mem_ram mem_elfaddress mem_bitflags mem_rs1 mem_rs2 mem_rd mem_imm: RAM f) /- We don't currently support storing tuples inside RAM so we've split them into separate ROMs -/
: ZKBuilder f PUnit := do
panic! "TODO: rewrite or extract memory ops for Twist & Shout"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic! macro will cause runtime errors when this code is executed. For a more Lean-compatible approach, consider either implementing the proper memory operations for Twist & Shout or using Lean's sorry tactic as a temporary placeholder that allows compilation to proceed. This would better align with Lean's proof system expectations while indicating the code is incomplete.

Suggested change
panic! "TODO: rewrite or extract memory ops for Twist & Shout"
sorry -- TODO: implement memory operations for Twist & Shout

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +635 to +638
impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {
fn from(_value: ark_ff::BigInt<N>) -> Self {
unimplemented!("hash unimplemented for MleAst")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message in the From<BigInt> implementation incorrectly refers to hash functionality. It should be updated to "From unimplemented for MleAst" to accurately reflect the unimplemented conversion function rather than hashing.

Suggested change
impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {
fn from(_value: ark_ff::BigInt<N>) -> Self {
unimplemented!("hash unimplemented for MleAst")
}
impl<const N: usize> From<ark_ff::BigInt<N>> for MleAst {
fn from(_value: ark_ff::BigInt<N>) -> Self {
unimplemented!("From<BigInt> unimplemented for MleAst")
}

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Compiling the larger 64-bit MLEs in Lean works locally, but
overflows the stack in lake on CI. We will remove it from CI for
now, until we can find a way to make them more performant.
Copy link
Collaborator

@0xAndoroid 0xAndoroid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spotted this one thing

@0xAndoroid
Copy link
Collaborator

@moodlezoup should we merge this?

@moodlezoup moodlezoup merged commit 8882184 into a16z:main Nov 10, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants