feat: permutation argument optimizations #10960
Conversation
{
    PROFILE_THIS_NAME("ProtogalaxyProver_::compute_row_evaluations");

    const size_t polynomial_size = polynomials.get_polynomial_size();
-   std::vector<FF> aggregated_relation_evaluations(polynomial_size);
+   Polynomial<FF> aggregated_relation_evaluations(polynomial_size);
Using Polynomial here instead of vector simply because it has a much more efficient constructor for initializing all of its elements to zero
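For context, here is a minimal sketch of why a parallel zero-initializing constructor can beat std::vector's single-threaded value-initialization for large buffers. This is not barretenberg's actual Polynomial class; the FF stand-in and the hand-rolled threading below are illustrative assumptions only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <thread>
#include <vector>

// Stand-in for a field element; the real FF is a multi-limb finite-field type.
struct FF { uint64_t limbs[4]; };

// Hypothetical buffer that zeroes its memory across several threads. By contrast,
// std::vector<FF>(n) value-initializes every element on a single thread, which is
// noticeably slower for very large n.
class ZeroedBuffer {
  public:
    explicit ZeroedBuffer(size_t size)
        : data_(new FF[size]) // default-initialized: contents are indeterminate until zeroed below
        , size_(size)
    {
        const size_t num_threads = std::max<size_t>(1, std::thread::hardware_concurrency());
        const size_t chunk = (size_ + num_threads - 1) / num_threads;
        std::vector<std::thread> workers;
        for (size_t t = 0; t < num_threads; ++t) {
            const size_t start = std::min(size_, t * chunk);
            const size_t end = std::min(size_, start + chunk);
            if (start < end) {
                workers.emplace_back(
                    [this, start, end] { std::memset(&data_[start], 0, (end - start) * sizeof(FF)); });
            }
        }
        for (auto& w : workers) { w.join(); }
    }

    FF& at(size_t i) { return data_[i]; }

  private:
    std::unique_ptr<FF[]> data_;
    size_t size_;
};
```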
        current_permutation_poly.at(i) = FF(current_row_idx + num_gates * current_col_idx);
    }
    ITERATE_OVER_DOMAIN_END;
    parallel_for(thread_data.num_threads, [&](size_t j) {
This loop now iterates over only the active domain instead of the entire poly domain. Prior to this change, the sigma/id polynomials took non-zero values across the entire domain. Now, they are non-zero only in the active regions of the trace and 0 elsewhere (previously we had sigma_i == id_i in these regions). These values don't contribute to the computation of the grand product anyway so there's no reason to compute them.
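To illustrate the load-balancing point, here is a rough sketch contrasting the two iteration strategies. The names and assignment are placeholders, not the PR's actual code; the real loop runs over field elements and uses the project's parallel_for utility.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive approach: walk the whole domain and test every row. When this loop is chopped into
// equal-sized chunks for threading, the amount of real work per chunk varies wildly, since
// some chunks are almost entirely inactive.
void assign_sigma_naive(std::vector<uint64_t>& sigma, const std::vector<bool>& is_active)
{
    for (size_t row = 0; row < sigma.size(); ++row) {
        if (is_active[row]) {
            sigma[row] = static_cast<uint64_t>(row); // placeholder for the real sigma/id assignment
        }
    }
}

// Flattened approach: the active rows are listed explicitly, so splitting this loop into
// equal chunks (one per thread, e.g. via a parallel_for utility) gives every thread the same
// amount of real work. Inactive rows are never touched and simply stay zero.
void assign_sigma_active_only(std::vector<uint64_t>& sigma, const std::vector<size_t>& active_row_idxs)
{
    for (size_t idx : active_row_idxs) {
        sigma[idx] = static_cast<uint64_t>(idx); // placeholder for the real sigma/id assignment
    }
}
```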
@@ -390,7 +396,12 @@ TYPED_TEST(MegaHonkTests, PolySwap)
    auto proving_key_2 = std::make_shared<typename TestFixture::DeciderProvingKey>(builder_copy, trace_settings);

    // Tamper with the polys of pkey 1 in such a way that verification should fail
-   proving_key_1->proving_key.polynomials.w_l.at(5) = 10;
+   for (size_t i = 0; i < proving_key_1->proving_key.circuit_size; ++i) {
This didn't need to change; I just wanted to make sure that the tampering was a bit more robust. In theory, setting a wire value at a row where no gates/copy constraints are active could still (correctly) lead to a verifiable proof.
Would be nice to have the timings for what happens in a 2^19 trace as well :)
Looks good, small comments/suggestions and a question
@@ -98,6 +98,19 @@ class PrecomputedEntitiesBase {
    uint64_t log_circuit_size;
    uint64_t num_public_inputs;
};
// Specifies the regions of the execution trace containing non-trivial wire values
struct ActiveRegionData {
Don't these ranges need to be non-overlapping and in increasing order? Maybe there should be a comment or some sort of check.
It's a good point. I could add a check on add_range that the input has start >= the previous end. To be safe, I suppose I'd also want to make the members private and add getters.
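A hedged sketch of what that might look like; the member names follow the snippet above, but the assertion style and exact layout are assumptions rather than the PR's final code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: regions of the execution trace containing non-trivial wire values.
// Ranges must be appended in increasing order and must not overlap; add_range checks
// this explicitly, and the members are private so the invariant cannot be broken from
// outside.
class ActiveRegionData {
  public:
    void add_range(size_t start, size_t end)
    {
        assert(start <= end);
        // Each new range must begin at or after the end of the previous one.
        assert(ranges_.empty() || start >= ranges_.back().second);
        ranges_.emplace_back(start, end);
        for (size_t i = start; i < end; ++i) {
            idxs_.push_back(i); // flattened list of active row indices
        }
    }

    const std::vector<std::pair<size_t, size_t>>& get_ranges() const { return ranges_; }
    const std::vector<size_t>& get_idxs() const { return idxs_; }
    size_t size() const { return idxs_.size(); }

  private:
    std::vector<std::pair<size_t, size_t>> ranges_; // active ranges [start, end)
    std::vector<size_t> idxs_;                      // explicit indices of all active rows
};
```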
    const size_t end = thread_data.end[j];
    for (size_t i = start; i < end; ++i) {
        size_t poly_idx = proving_key->active_region_data.idxs[i];
        auto idx = static_cast<ptrdiff_t>(poly_idx);
why is this cast needed?
Also, this can be a const, and the one above too.
The cast is needed since row_idx has type std::shared_ptr<uint32_t[]>, which can only be indexed with a ptrdiff_t. (Note this isn't a change introduced in this PR.)
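For illustration, a minimal standalone example of the indexing in question. std::shared_ptr<T[]>::operator[] is declared to take a signed std::ptrdiff_t; the remark about sign-conversion warnings is my own assumption about why an explicit cast is preferred over the implicit conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

int main()
{
    // std::shared_ptr<T[]> (C++17) exposes operator[](std::ptrdiff_t), i.e. a signed index.
    std::shared_ptr<uint32_t[]> row_idx(new uint32_t[8]());

    size_t poly_idx = 3; // indices elsewhere in the code are unsigned
    // Passing a size_t directly would rely on an implicit unsigned-to-signed conversion,
    // which strict warning settings (e.g. -Wsign-conversion with -Werror) flag; hence the cast.
    const auto idx = static_cast<std::ptrdiff_t>(poly_idx);
    row_idx[idx] = 42;
    return static_cast<int>(row_idx[idx]);
}
```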
{
    PROFILE_THIS_NAME("compute_grand_product");

    using FF = typename Flavor::FF;
    using Polynomial = typename Flavor::Polynomial;
    using Accumulator = std::tuple_element_t<0, typename GrandProdRelation::SumcheckArrayOfValuesOverSubrelations>;

    const bool active_region_specified = !active_region_data.ranges.empty();
has_active_regions?
Hah, I started with that but thought it was misleading: if false, it implies that there are NO active regions, when really it's just that they are implicit and haven't been specified. You're probably right though that has_active_regions is more clear.
            row, relation_parameters);
    // TODO(https://github.com/AztecProtocol/barretenberg/issues/940): consider avoiding get_row if possible.
    auto row_idx = get_active_range_poly_idx(i);
    if constexpr (IsUltraFlavor<Flavor>) {
if constexpr (!IsPlonkFlavor<Flavor>) would make this more readable I think.
That's not quite the same thing though, because this code is also used by the ECCVM/Translator, which need to be excluded. I think this just comes down to the fact that we need better concepts. Probably isUltraOrMegaHonk.
I suppose I could just add methods to the ECCVM/Translator flavors that just call get_row from get_row_for_permutation_arg. Not sure which is better.
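For illustration, a hedged sketch of the kind of concept being suggested. The flavor types and trait below are hypothetical stand-ins, not existing barretenberg concepts.

```cpp
#include <type_traits>

// Hypothetical flavor types standing in for the real ones.
struct UltraFlavor {};
struct MegaFlavor {};
struct ECCVMFlavor {};
struct TranslatorFlavor {};

// Sketch of an "isUltraOrMegaHonk"-style concept: true only for the Ultra/Mega Honk
// flavors, so ECCVM/Translator (which are not Plonk but also not Ultra/Mega) stay
// excluded, which a bare !IsPlonkFlavor check would not guarantee.
template <typename Flavor>
concept IsUltraOrMegaHonk = std::is_same_v<Flavor, UltraFlavor> || std::is_same_v<Flavor, MegaFlavor>;

template <typename Flavor> void compute_grand_product_sketch()
{
    if constexpr (IsUltraOrMegaHonk<Flavor>) {
        // Ultra/Mega-specific path, e.g. translating active-range indices back to trace rows.
    } else {
        // ECCVM/Translator/Plonk path.
    }
}
```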
    for (size_t i = start; i < end; ++i) {
        grand_product_polynomial.at(i + 1) = numerator[i] * denominator[i];
        auto poly_idx = get_active_range_poly_idx(i + 1);
const size_t
    for (size_t j = 0; j < active_region_data.ranges.size() - 1; ++j) {
        size_t previous_range_end = active_region_data.ranges[j].second;
        size_t next_range_start = active_region_data.ranges[j + 1].first;
        // If the index falls in an inactive region, set its value
This comment seems incomplete
Haha, it's not, but I do see what you mean. I reordered it to make it sound more natural.
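For orientation, a hedged sketch of what a loop like the one quoted above might be doing. The fill rule shown here, holding the last active value constant across each inactive gap, is an assumption about intent, not a quotation of the PR; the double stand-in replaces the real field type.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: for each gap between consecutive active ranges, fill the grand product with the
// value it reached at the end of the previous active range, so the running product is
// unchanged across inactive rows. (Assumed fill rule; double stands in for FF.)
void fill_inactive_regions(std::vector<double>& grand_product,
                           const std::vector<std::pair<size_t, size_t>>& ranges)
{
    for (size_t j = 0; j + 1 < ranges.size(); ++j) {
        const size_t previous_range_end = ranges[j].second; // exclusive end of active range j
        const size_t next_range_start = ranges[j + 1].first;
        if (previous_range_end == 0) {
            continue; // no computed value to propagate
        }
        for (size_t i = previous_range_end; i < next_range_start; ++i) {
            grand_product[i] = grand_product[previous_range_end - 1];
        }
    }
}
```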
* master: (287 commits)
  feat: Sync from noir (#11051)
  chore(docs): Update tx concepts page (#10947)
  chore(docs): Edit Aztec.nr Guide section (#10866)
  chore: test:e2e defaults to no-docker (#10966)
  chore(avm): improve column stats (#11135)
  chore: Sanity checking of proving job IDs (#11134)
  feat: permutation argument optimizations (#10960)
  feat: single tx block root rollup (#11096)
  refactor: prover db config (#11126)
  feat: monitor event loop lag (#11127)
  chore: Greater stability at 1TPS (#10981)
  chore: Jest reporters for CI (#11125)
  fix: Sequencer times out L1 tx at end of L2 slot (#11112)
  feat: browser chunking (#11102)
  fix: Added start/stop guards to running promise and serial queue (#11120)
  fix: Don't retransmit txs upon node restart (#11123)
  fix: Prover node aborts execution at epoch end (#11111)
  feat: blob sink in sandbox without extra process (#11032)
  chore: log number of instructions executed for call in AVM. Misc fix. (#11110)
  git subrepo push --branch=master noir-projects/aztec-nr ...
A handful of optimizations for the large ambient trace setting, mostly to do with the grand product argument. Total savings is about 1s on the "17 in 20" benchmark.

- Introduce active_row_idxs, which explicitly contains the indices of the active rows. This makes it easier to multithread and is much more efficient than looping over the entire domain and using something like check_is_active(), which itself has low overhead but results in huge disparities in the distribution of actual work across threads.
- Replace a std::vector in PG with a Polynomial, simply to take advantage of the optimized constructor.

Branch "17 in 20" benchmark
ClientIVCBench/Full/6 20075 ms 17763 ms
Master "17 in 20" benchmark
ClientIVCBench/Full/6 21054 ms 18395 ms
The conventional benchmark ("19 in 19") shows a very minor improvement, as expected:
Branch:
ClientIVCBench/Full/6 22231 ms 19857 ms
Master:
ClientIVCBench/Full/6 22505 ms 19536 ms