Skip to content

Commit 8882184

Browse files
protobenPtival
andauthored
Update ZkLean extractor to work with 64-bit Twist & Shout (#1060)
* Re-enable zklean-extractor and minimally make it compile * Extract T&S instruction MLEs This commit replaces the `RV32I` struct from L&S jolt with the `RV32IMInstruction` struct from T&S Jolt and updates the basic iteration, extraction, and printing functions. This additionally requires adding a derivation for `EnumIter` to `RV32IMinstruction`, in order to allow iteration. * Add evaluation tests for MLEs This also entails temporarily disabling some additional instructions, due to the different profile used for tests. These will be re-enabled as part of https://gitlab-ext.galois.com/jb4/jolt-fork/-/issues/14 * Extract operand interleaving for instruction MLEs The determines how the two input operands of an instruction are combined into a single vector. Due to the requirement of Twist & Shout that each instruction MLE be decomposable into prefix and suffix MLEs, some instructions require their operands to be interleaved, and others concatenated. Jolt determines the operand interleaving using the circuit flags, so we do the same, here. * Clean up some cruft * Make R1CS printing compile with new codebase This removes cross-step constraints and updates the extraction to use the new types, module structure, etc. * Fix incorrectly named test * Remove subtable-extraction cruft * Extract Jolt lookup-table MLEs T&S Jolt associates MLEs with lookup-tables, rather than directly with instructions. This adds a ZkLeanLookupTable type to the extractor and extracts the MLE for each variant of LookupTables from jolt_core. * Replace per-instruction MLEs with references to lookup tables Now that we extract lookup tables, along with their MLEs, we can avoid printing out the MLE for each instruction, and refer to the relevant lookup table, instead. This reduces repetition (since multiple instructions have the same MLE), reduces the number instructions we need to skip (instructions without lookup tables can simply indicate this), and better matches the Jolt codebase. We also remove the tests for instruction-MLE evaluation, since these are now handled in `lookup.rs`. * Extract circuit flags set for each instruction Each instruction sets some of the `R1CSInputs::OpFlags` Boolean R1CS input variables. This extracts those flags. We currently print them in a comment, pending a decision on how to incorporate them into ZkLean. Note that *all* instructions conditionally set the `DoNotUpdateUnexpandedPC` and `InlineSequenceInstruction` flags when they are part of an inline sequence. We do not extract these, since they depend on the trace, not the instruction, itself. * Incorporate `Var` and `Scalar` nodes into parent node in `MleAst` This reduces the number of nodes required in ASTs by putting scalars and variables in the branches of their parent nodes, rather than making separate nodes for them. Since manyvariables and constants such as 0 and 1 tend to be reused many times. The cost of this is that the size of each node is increased, due to needing to contain space for two scalars, as well as additional overhead for enum tags. We work on compressing node sizes next. * Reduce size of atomic MLE AST elements This reduces the size of an AST node by reducing the size of an atomic element (a scalar or a variable). In particular, we * store an array of all unique constants in an AST, so that a scalar can be represented as an index into this array; and * store the register name in the AST, rather than in each variable node. Doing so brings the size of an atom down to 4 bytes and the size of a node down to 10 bytes (due to the overhead of the enum tags). This is sufficient to enable all 32-bit lookup tables (which we also do). * Use current revision of ZkLean in extractor * Increase max heartbeats in extracted Lean file for lookup tables This is necessary because many of the MLEs are too large for Lean to process within the default value of `maxHeartbeats`. We could potentially eliminate the need for this by separating the lookup tables into separate files. * Use `sorry` instead of `panic` for instructions w/o MLEs in extractor Using `panic` causes errors in which Lean can't prove that the corresponding type is inhabited. * Update extractor dependencies * Remove undefined-symbol cruft from extractor package template * Avoid using identifiers for MLEs we currently can't extract * Add comment with old Lasso memops in extractor package template The memops need to be updated to use the new Twist & Shout R1CS input variables. We put them in a comment for now so we can refer to them when we do that. * Use arena allocation for MLE AST nodes in zklean extractor Our approach to extracting lookup-table MLEs is to have an AST data structure that implements `JoltField` and keeps track of the polynomial computation. However, naively, the `Sized` and `Copy` constraints on `JoltField` means that we are unable to use variably sized collections in this data structure, and many ASTs get allocated on the stack over the course of computations. This runs into scalability issues, since stack space is limited. This commit avoids this problem by using a static reference to a locked allocation arena for AST nodes stored on the heap. * Update instruction types for upstream codebase in extractor The instruction type has changed in the upstream codebase. This commit fixes that up in the extractor, post merge. In addition, the subset of instructions that are "supported" by Jolt has changed, due to the move to 64 bit. We fix that here, too. * Update MleAst to fully implement JoltField given upstream changes There are new constraints, methods, associated types, etc. for JoltField in the upstream codebase. This commit ensures that MleAst fully implements the new rendition of the trait. Non-trivial methods that are not needed for extraction are left as stubs. * Update cargo lock file * Update R1CS-constraint data structure in extractor The data structure for R1CS constraints has changed upstream, in particular, allowing R1CS constraints to be expressed in a constant array, rather than a builder pattern. This commit brings the extractor up to date with that data structure. One thing to note is that the terms and constants of a linear combination within an R1CS constraint does not seem to be directly accessible. For that reason, we make the `LC::decompose` function public in this commit. * Make size of extracted lookup tables depend on parameter set * Implement additional traits on MleAst needed by JoltField Post merge, there are some additional constraints on JoltField and its associated types. This commit implements them on MleAst. Since they're not needed for extraction, they're mostly stubs. * Fix extraction of the R1CS constraints for current Jolt This updates our extraction of R1CS constraints to match the current codebase in 2 ways: * coefficients and constant terms in LCs are now `i128` * R1CS constraints are now represented as just `a` and `b` LCs, with the `c` LC being (implicitly) always 0 * introduce common subexpression elimination to reduce output term size * introduce definitions rather than let-bindings to help performance Lean does not handle let-bound sub-expressions efficiently (e.g. a graph), instead performance degrades exponentially in the presence of many shared let-bindings. Using top-level definitions enforce a linear behavior wrt. the number of introduced sub-definitions. For our purposes, this change makes the runtime of compiling the output go from well over 2 years to just about 10 minutes. * add a comment to format_for_lean * Add evaluation tests in Lean for extracted lookup-table MLEs We test the lookup tables in Lean as follows: * Generate 2 * WORD_SIZE random field elements to use as input * For each lookup-table MLE, compute output := evaluate_mle(input) in Rust * Generate a Tests module in Lean with a #guard statement for each MLE that checks mle(input) = output This commit implements the above tests. The field is hard-coded to be the scalar field of BN254 for now, implemented using ZMod p in Lean. This requires showing that p is prime, which we sorry out for now. Proving this or using BN254 from arklib is todo. * have the Display instance print the whole AST as it used to * Avoid hard-coding list of non-inline-sequence instructions in extractor When extracting instructions with lookup tables / circuit flags, we want to avoid inline sequences, because the functions that return these things panic on inline-sequence instructions. Previously, we hard-coded the list of supported instructions in the extractor, which is bad because it requires manual curation. This commit checks the `inline_sequence` instruction and filters out instructionst that return non-empty vectors, instead. * Remove increase to maxHeartbeats in extracted lookup-table module Since the scaling issue with compiling large MLEs is now solved, we no longer need to increase maxHeartbeats. * remove src/ directory from Lean template * Copy BN254.ScalarField from ArkLib for use in extracted Lean tests This commit allows us to use the BN254 scalar field as if we were importing it from ArkLib. For now, there is a Lean4 version mismatch preventing us from actually depending on ArkLib. Once that's fixed, we can remove our copy and substitute in the ArkLib dependency without otherwise changing the Lean code. * Increase maxHeartbeats in Lean for MLE evaluation tests The Lean evaluation tests for the extracted MLEs time out with the default Lean heartbeat limit. This increases it so they can complete. * Move Lean tests into separate module in generated package The Lean tests take a long time to run, since they need to evaluate large polynomials. We move them into a separate module here so that they don't automatically run when running `lake build`. They can now be run with `lake test`. * Re-enable CI tests for extractor and add `lake test` * Fix clippy and formatter warnings in extractor * Remove unused flags module in extractor * Fix extractor output source directory in CI * Remove `lake test` from CI for now; Lean tests are too slow * Removing both Lean compilation and Lean tests from CI for now Compiling the larger 64-bit MLEs in Lean works locally, but overflows the stack in lake on CI. We will remove it from CI for now, until we can find a way to make them more performant. * Rename WORD_SIZE to XLEN in extractor --------- Co-authored-by: Valentin Robert <[email protected]>
1 parent 6949f31 commit 8882184

File tree

29 files changed

+1548
-952
lines changed

29 files changed

+1548
-952
lines changed

.github/workflows/rust.yml

Lines changed: 24 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -171,26 +171,27 @@ jobs:
171171
- name: Run tracer tests
172172
run: cargo nextest run --release -p tracer
173173

174-
# zklean-extractor-tests:
175-
# name: ZkLean extractor tests
176-
# runs-on: ubuntu-latest
177-
# steps:
178-
# - uses: actions/checkout@v5
179-
# - name: Run extractor tests
180-
# working-directory: ./zklean-extractor
181-
# run: cargo test
182-
183-
# compile-extracted-lean:
184-
# name: Compile extracted Lean
185-
# runs-on: ubuntu-latest
186-
# steps:
187-
# - uses: actions/checkout@v5
188-
# - name: Extract Jolt ZkLean package
189-
# working-directory: ./zklean-extractor
190-
# run: cargo run --release -- -o -p jolt-zklean
191-
#
192-
# - name: Run lake build on the extracted model
193-
# uses: leanprover/lean-action@v1
194-
# with:
195-
# lake-package-directory: ./zklean-extractor/jolt-zklean/src
196-
# build: true
174+
zklean-extractor-tests:
175+
name: ZkLean extractor tests
176+
runs-on: ubuntu-latest
177+
steps:
178+
- uses: actions/checkout@v5
179+
- name: Run extractor tests
180+
working-directory: ./zklean-extractor
181+
run: cargo test
182+
183+
compile-extracted-lean:
184+
name: Compile extracted Lean
185+
runs-on: ubuntu-latest
186+
steps:
187+
- uses: actions/checkout@v5
188+
- name: Extract Jolt ZkLean package
189+
working-directory: ./zklean-extractor
190+
run: cargo run --release -- -o -p jolt-zklean
191+
192+
# - name: Run lake build on the extracted model
193+
# uses: leanprover/lean-action@v1
194+
# with:
195+
# lake-package-directory: ./zklean-extractor/jolt-zklean
196+
# build: true
197+
# test: true

Cargo.lock

Lines changed: 172 additions & 4 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,7 @@ members = [
7373
"examples/merkle-tree/guest",
7474
"examples/hash-bench",
7575
"examples/hash-bench/guest",
76+
"zklean-extractor",
7677
]
7778

7879
[features]

jolt-core/src/zkvm/lookup_table/mod.rs

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,7 @@ use signed_greater_than_equal::SignedGreaterThanEqualTable;
1919
use signed_less_than::SignedLessThanTable;
2020
use std::marker::Sync;
2121
use strum::EnumCount;
22-
use strum_macros::{EnumCount as EnumCountMacro, EnumIter};
22+
use strum_macros::{EnumCount as EnumCountMacro, EnumIter, IntoStaticStr};
2323
use suffixes::{SuffixEval, Suffixes};
2424
use unsigned_greater_than_equal::UnsignedGreaterThanEqualTable;
2525
use unsigned_less_than::UnsignedLessThanTable;
@@ -117,7 +117,9 @@ pub mod test;
117117

118118
pub const NUM_LOOKUP_TABLES: usize = LookupTables::<32>::COUNT;
119119

120-
#[derive(Copy, Clone, Debug, From, Serialize, Deserialize, EnumIter, EnumCountMacro)]
120+
#[derive(
121+
Copy, Clone, Debug, From, Serialize, Deserialize, EnumIter, EnumCountMacro, IntoStaticStr,
122+
)]
121123
#[repr(u8)]
122124
pub enum LookupTables<const XLEN: usize> {
123125
RangeCheck(RangeCheckTable<XLEN>),

jolt-core/src/zkvm/r1cs/ops.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -227,7 +227,7 @@ impl LC {
227227
}
228228

229229
/// Break a LC into (terms, len, const)
230-
const fn decompose(lc: LC) -> ([Term; 5], usize, i128) {
230+
pub const fn decompose(lc: LC) -> ([Term; 5], usize, i128) {
231231
let mut terms = [Term {
232232
input_index: 0,
233233
coeff: 0,

tracer/src/instruction/mod.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -375,7 +375,7 @@ macro_rules! define_rv32im_enums {
375375
(
376376
instructions: [$($instr:ident),* $(,)?]
377377
) => {
378-
#[derive(Debug, IntoStaticStr, From, Clone, Serialize, Deserialize)]
378+
#[derive(Debug, IntoStaticStr, From, Clone, Serialize, Deserialize, EnumIter)]
379379
pub enum Instruction {
380380
/// No-operation instruction (address)
381381
NoOp,

0 commit comments

Comments
 (0)