feat: permutation argument optimizations #10960

ledwards2225 · 2024-12-24T14:26:32Z

A handful of optimizations for the large ambient trace setting, mostly to do with the grand product argument. Total savings is about 1s on the "17 in 20" benchmark.

Only perform computation for the grand product on active rows of the trace. This means (1) only setting the values of sigma/id on the active rows (they remain zero elsewhere, since those values don't contribute to the grand product anyway), and (2) only computing the grand product at active rows, then populating the constant regions as a final step. Both are facilitated by constructing a vector active_row_idxs that explicitly contains the indices of the active rows. This makes multithreading easier and is much more efficient than looping over the entire domain and using something like check_is_active(), which itself has low overhead but results in huge disparities in the distribution of actual work across threads.
Replace a default-initialized std::vector in PG with a Polynomial, simply to take advantage of the optimized constructor.
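
To illustrate why an explicit index list balances work better than a per-row activity check, here is a minimal sketch. The helper names build_active_row_idxs and split_evenly are hypothetical and are not taken from the PR; ranges are assumed to be sorted, non-overlapping, half-open [start, end) pairs, and num_threads is assumed to be non-zero.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: flatten sorted, non-overlapping [start, end) ranges
// into an explicit list of active row indices.
std::vector<std::size_t> build_active_row_idxs(const std::vector<std::pair<std::size_t, std::size_t>>& ranges)
{
    std::vector<std::size_t> idxs;
    for (const auto& [start, end] : ranges) {
        for (std::size_t i = start; i < end; ++i) {
            idxs.push_back(i);
        }
    }
    return idxs;
}

// Hypothetical helper: split num_items units of work into num_threads
// contiguous chunks whose sizes differ by at most one (num_threads > 0).
std::vector<std::pair<std::size_t, std::size_t>> split_evenly(std::size_t num_items, std::size_t num_threads)
{
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = num_items / num_threads;
    const std::size_t remainder = num_items % num_threads;
    std::size_t start = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t size = base + (t < remainder ? 1 : 0);
        chunks.emplace_back(start, start + size);
        start += size;
    }
    return chunks;
}
```

Each thread then walks its chunk of the index list, so every thread handles nearly the same number of active rows regardless of how the active regions are laid out in the full domain.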

Branch "17 in 20" benchmark

ClientIVCBench/Full/6 20075 ms 17763 ms

Master "17 in 20" benchmark

ClientIVCBench/Full/6 21054 ms 18395 ms

The conventional benchmark ("19 in 19") shows a very minor improvement, as expected:

Branch:

ClientIVCBench/Full/6 22231 ms 19857 ms

Master:

ClientIVCBench/Full/6 22505 ms 19536 ms

ledwards2225 · 2025-01-07T14:57:45Z

barretenberg/cpp/src/barretenberg/protogalaxy/protogalaxy_prover_internal.hpp


    {

        PROFILE_THIS_NAME("ProtogalaxyProver_::compute_row_evaluations");

        const size_t polynomial_size = polynomials.get_polynomial_size();
-        std::vector<FF> aggregated_relation_evaluations(polynomial_size);
+        Polynomial<FF> aggregated_relation_evaluations(polynomial_size);


Using Polynomial here instead of vector simply because it has a much more efficient constructor for initializing all of its elements to zero.

ledwards2225 · 2025-01-08T16:48:06Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/composer/permutation_lib.hpp

-            current_permutation_poly.at(i) = FF(current_row_idx + num_gates * current_col_idx);
-        }
-        ITERATE_OVER_DOMAIN_END;
+        parallel_for(thread_data.num_threads, [&](size_t j) {


This loop now iterates over only the active domain instead of the entire poly domain. Prior to this change, the sigma/id polynomials took non-zero values across the entire domain. Now, they are non-zero only in the active regions of the trace and 0 elsewhere (previously we had sigma_i == id_i in these regions). These values don't contribute to the computation of the grand product anyway so there's no reason to compute them.

ledwards2225 · 2025-01-08T16:53:37Z

barretenberg/cpp/src/barretenberg/ultra_honk/mega_honk.test.cpp

@@ -390,7 +396,12 @@ TYPED_TEST(MegaHonkTests, PolySwap)
    auto proving_key_2 = std::make_shared<typename TestFixture::DeciderProvingKey>(builder_copy, trace_settings);

    // Tamper with the polys of pkey 1 in such a way that verification should fail
-    proving_key_1->proving_key.polynomials.w_l.at(5) = 10;
+    for (size_t i = 0; i < proving_key_1->proving_key.circuit_size; ++i) {


This didn't need to change; I just wanted to make the tampering a bit more robust. In theory, setting a wire value at a row where no gates/copy constraints are active could still (correctly) lead to a verifiable proof.

maramihali · 2025-01-09T10:45:21Z

would be nice to have the timings for what happens in a 2^19 trace as well :)

maramihali

Looks good, small comments/suggestions and a question

maramihali · 2025-01-09T12:31:05Z

barretenberg/cpp/src/barretenberg/flavor/flavor.hpp

@@ -98,6 +98,19 @@ class PrecomputedEntitiesBase {
    uint64_t log_circuit_size;
    uint64_t num_public_inputs;
 };
+// Specifies the regions of the execution trace containing non-trivial wire values
+struct ActiveRegionData {


Don't these ranges need to be non-overlapping and in increasing order? Maybe there should be a comment or some sort of check.

It's a good point. I could add a check in add_range that the input has start >= the previous end. To be safe, I suppose I'd also want to make the members private and add getters.
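
A minimal sketch of what such a hardened class could look like, assuming half-open [start, end) ranges. This is an illustration and not the actual barretenberg implementation: the getter names are hypothetical, and an exception stands in for whatever assertion mechanism the codebase would use.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Sketch only: an ActiveRegionData-like class with private members and a
// checked add_range. Ranges are assumed to be half-open [start, end).
class ActiveRegionData {
  public:
    using Range = std::pair<std::size_t, std::size_t>;

    void add_range(const std::size_t start, const std::size_t end)
    {
        if (end < start) {
            throw std::invalid_argument("range end precedes range start");
        }
        // Enforce increasing, non-overlapping ranges.
        if (!ranges_.empty() && start < ranges_.back().second) {
            throw std::invalid_argument("range starts before the previous range ends");
        }
        ranges_.emplace_back(start, end);
        for (std::size_t i = start; i < end; ++i) {
            idxs_.push_back(i);
        }
    }

    // Hypothetical getters; read-only access keeps the invariant intact.
    const std::vector<Range>& get_ranges() const { return ranges_; }
    const std::vector<std::size_t>& get_idxs() const { return idxs_; }

  private:
    std::vector<Range> ranges_;      // active ranges in increasing order
    std::vector<std::size_t> idxs_;  // explicit indices of all active rows
};
```

Because the members can only be modified through add_range, downstream code may rely on the ranges being ordered without re-checking them.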

maramihali · 2025-01-09T12:36:23Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/composer/permutation_lib.hpp

+            const size_t end = thread_data.end[j];
+            for (size_t i = start; i < end; ++i) {
+                size_t poly_idx = proving_key->active_region_data.idxs[i];
+                auto idx = static_cast<ptrdiff_t>(poly_idx);


why is this cast needed?

also this can be a const and the one above too

The cast is needed since row_idx has type std::shared_ptr<uint32_t[]> which can only be indexed with a ptrdiff_t. (Note this isn't a change introduced in this PR)
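
For context, std::shared_ptr<T[]>::operator[] (C++17) takes a std::ptrdiff_t, so passing a size_t would rely on an implicit sign conversion. The sketch below shows the explicit cast in isolation; the function name read_row_idx is hypothetical, and the assumption that the build treats sign-conversion warnings as errors is mine and is not stated in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: read one entry from an array owned by a shared_ptr.
// operator[] on std::shared_ptr<T[]> takes std::ptrdiff_t, so the unsigned
// index is converted explicitly rather than implicitly.
std::uint32_t read_row_idx(const std::shared_ptr<std::uint32_t[]>& row_idx, const std::size_t poly_idx)
{
    const auto idx = static_cast<std::ptrdiff_t>(poly_idx);
    return row_idx[idx];
}
```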

maramihali · 2025-01-09T12:38:38Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/library/grand_product_library.hpp

 {
    PROFILE_THIS_NAME("compute_grand_product");

    using FF = typename Flavor::FF;
    using Polynomial = typename Flavor::Polynomial;
    using Accumulator = std::tuple_element_t<0, typename GrandProdRelation::SumcheckArrayOfValuesOverSubrelations>;

+    const bool active_region_specified = !active_region_data.ranges.empty();


has_active_regions?

Hah, I started with that but thought it was misleading, because if false it implies that there are NO active regions when really it's just that they are implicit and haven't been specified. You're probably right though that has_active_regions is clearer.

maramihali · 2025-01-09T12:41:48Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/library/grand_product_library.hpp

-                    row, relation_parameters);
+            // TODO(https://github.com/AztecProtocol/barretenberg/issues/940):consider avoiding get_row if possible.
+            auto row_idx = get_active_range_poly_idx(i);
+            if constexpr (IsUltraFlavor<Flavor>) {


if constexpr (!IsPlonkFlavor<Flavor>) would make this more readable I think

That's not quite the same thing though because this code is also used by the ECCVM/Translator which need to be excluded. I think this just comes down to the fact that we need better concepts. Probably isUltraOrMegaHonk

I suppose I could just add methods to the ECCVM/Trans flavors that just call get_row from get_row_for_permutation_arg. Not sure what's better
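
One way the narrower predicate floated in this thread might be expressed is sketched below. The flavor types are empty stand-ins and is_any_of_v is a hypothetical helper; none of this is the actual barretenberg concept machinery, and the mapping of flavors to accessors is only inferred from the discussion above.

```cpp
#include <string_view>
#include <type_traits>

// Empty stand-ins for flavor types; the real flavors are full classes.
struct UltraFlavor {};
struct MegaFlavor {};
struct ECCVMFlavor {};
struct TranslatorFlavor {};

// Hypothetical helper: true when T is exactly one of the listed types.
template <typename T, typename... Us>
inline constexpr bool is_any_of_v = (std::is_same_v<T, Us> || ...);

// Hypothetical predicate named after the suggestion in the thread.
template <typename Flavor>
inline constexpr bool IsUltraOrMegaHonk = is_any_of_v<Flavor, UltraFlavor, MegaFlavor>;

// Compile-time dispatch: report which row accessor a flavor would use.
template <typename Flavor> constexpr std::string_view row_accessor_name()
{
    if constexpr (IsUltraOrMegaHonk<Flavor>) {
        return "get_row_for_permutation_arg";
    } else {
        return "get_row";
    }
}
```

An explicit allow-list like this excludes ECCVM and Translator by construction, which a negated Plonk check would not.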

maramihali · 2025-01-09T12:42:53Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/library/grand_product_library.hpp

        for (size_t i = start; i < end; ++i) {
-            grand_product_polynomial.at(i + 1) = numerator[i] * denominator[i];
+            auto poly_idx = get_active_range_poly_idx(i + 1);


const size_t

maramihali · 2025-01-09T12:43:43Z

barretenberg/cpp/src/barretenberg/plonk_honk_shared/library/grand_product_library.hpp

+                for (size_t j = 0; j < active_region_data.ranges.size() - 1; ++j) {
+                    size_t previous_range_end = active_region_data.ranges[j].second;
+                    size_t next_range_start = active_region_data.ranges[j + 1].first;
+                    // If the index falls in an inactive region, set its value


This comment seems incomplete

Haha, it's not, but I do see what you mean. I reordered it to make it sound more natural.
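
The two-step idea behind this loop (compute the product only at active rows, then fill the inactive gaps with constants) can be sketched on a simplified model. Here the grand product is a plain running product of integer factors that equal 1 on inactive rows; the function names and the use of uint64_t in place of field elements are assumptions for illustration, and the real code runs step 1 in parallel.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>; // [start, end), sorted, non-overlapping

// Reference: running product over the whole domain, z[0] = 1, z[i + 1] = z[i] * factor[i].
std::vector<std::uint64_t> grand_product_full(const std::vector<std::uint64_t>& factor)
{
    std::vector<std::uint64_t> z(factor.size() + 1, 1);
    for (std::size_t i = 0; i < factor.size(); ++i) {
        z[i + 1] = z[i] * factor[i];
    }
    return z;
}

// Sketch: assumes factor[i] == 1 on every row outside the active ranges.
std::vector<std::uint64_t> grand_product_active(const std::vector<std::uint64_t>& factor,
                                                const std::vector<Range>& ranges)
{
    std::vector<std::uint64_t> z(factor.size() + 1, 1);

    // Step 1: accumulate the product at active rows only.
    std::uint64_t running = 1;
    for (const auto& [start, end] : ranges) {
        for (std::size_t i = start; i < end; ++i) {
            running *= factor[i];
            z[i + 1] = running;
        }
    }

    // Step 2: the product is constant across each inactive gap, so copy it in.
    std::size_t gap_start = 0;
    std::uint64_t value = 1;
    for (const auto& [start, end] : ranges) {
        for (std::size_t i = gap_start; i <= start; ++i) {
            z[i] = value;
        }
        value = z[end];
        gap_start = end + 1;
    }
    for (std::size_t i = gap_start; i < z.size(); ++i) {
        z[i] = value;
    }
    return z;
}
```

On inputs that satisfy the assumption, both functions return the same vector, while the second never reads factor outside the active ranges.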


ledwards2225 added 6 commits December 23, 2024 17:10

basic computations skipping model working in GP comp

64317bc

WiP things working before switch to active idxs loops

0ab2078

using active idxs for GP step 1

35d1c8d

fix

418f7bc

dont do any computation for num/denom in inactive regions

09ef41a

Merge branch 'master' into lde/perm_opt

a258127

ledwards2225 marked this pull request as ready for review December 24, 2024 14:26

ledwards2225 added 9 commits December 24, 2024 16:33

fix build

72b215f

fix PG tests

c36feaa

clean debug code to fix some tests

bd8a511

Merge branch 'master' into lde/perm_opt

dd50b47

test revision of gp method

d9b43bb

remove debug code for ci

deb1d65

optimized version seems to be working, cleanup needed

339a5c9

some fixes, see what fails

5471504

fix for client ivc

4d7ea03

ledwards2225 self-assigned this Jan 3, 2025

ledwards2225 added 9 commits January 3, 2025 22:40

correct ivc structure

7f348aa

Merge branch 'master' into lde/perm_opt

9736925

some cleanup

8911c2f

clean and regularize

46e378c

fix index error in thread method

d0ea21a

clarify and clean

45156f7

active regions model working

0ebd607

Merge branch 'master' into lde/perm_opt

b2539a7

more cleanup

bf1394d

ledwards2225 commented Jan 7, 2025


ledwards2225 added 2 commits January 7, 2025 19:37

clean and remove debug code

b104bc6

remove problematic assert

e702d00

ledwards2225 commented Jan 8, 2025


ledwards2225 added 2 commits January 8, 2025 17:20

clean

d34fe9c

Merge branch 'master' into lde/perm_opt

cd6e7b6

ledwards2225 requested a review from maramihali January 8, 2025 19:58

maramihali approved these changes Jan 9, 2025


make active region class more robust and constify some things

e442a51

ledwards2225 merged commit de99603 into master Jan 9, 2025
25 checks passed

ledwards2225 deleted the lde/perm_opt branch January 9, 2025 17:33

AztecBot mentioned this pull request Jan 9, 2025

chore(master): Release 0.70.0 #11107

Open
