
Slow commitment to polynomial with blst_from_scratch #202

Open

nazar-pc opened this issue Mar 20, 2023 · 10 comments

@nazar-pc (Contributor)

I have migrated code from https://github.com/dusk-network/plonk to blst_from_scratch and noticed improved performance everywhere except when committing to a polynomial.

Before:

create-polynomial       time:   [78.910 µs 80.642 µs 82.387 µs]
commit                  time:   [375.98 µs 382.69 µs 389.47 µs]
prove                   time:   [406.50 µs 412.22 µs 417.96 µs]
verify                  time:   [2.6014 ms 2.7458 ms 2.8826 ms]

After:

create-polynomial       time:   [1.7445 µs 1.7448 µs 1.7451 µs]
commit                  time:   [779.59 µs 779.96 µs 780.34 µs]
prove                   time:   [386.50 µs 387.20 µs 387.99 µs]
verify                  time:   [1.7031 ms 1.7032 ms 1.7034 ms]

It is ~2x slower with blst_from_scratch, though I'm not yet sure where the slowness comes from.

I'm committing to a polynomial with 8 coefficients, while FsKZGSettings was initialized with a scale of 10.

@belijzajac (Collaborator)

The function of interest is g1_linear_combination, where Pippenger's algorithm is used for larger polynomials and naive multiplication for polynomials of degree less than 8.

https://github.com/sifraitech/rust-kzg/blob/b09120d67150af474d7407092691f866a874bac1/blst-from-scratch/src/types/kzg_settings.rs#L56-L65
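For anyone unfamiliar with the dispatch, here is a rough sketch of the idea. This is illustrative only: u128 integers stand in for blst G1 points so the bucket bookkeeping is easy to check, and the function names (`naive_msm`, `pippenger_msm`, `g1_linear_combination_sketch`) are invented; the real g1_linear_combination works on elliptic-curve points, where "scalar * point" is scalar multiplication.

```rust
/// Naive multi-scalar multiplication: sum(scalar_i * point_i).
fn naive_msm(points: &[u128], scalars: &[u64]) -> u128 {
    points.iter().zip(scalars).map(|(p, s)| p * (*s as u128)).sum()
}

/// Pippenger's bucket method with window width `w`: split each scalar into
/// w-bit digits, add each point into one of 2^w - 1 buckets per window, then
/// combine the buckets with a suffix running sum.
fn pippenger_msm(points: &[u128], scalars: &[u64], w: u32) -> u128 {
    let mask = (1u64 << w) - 1;
    let windows = 64 / w; // assumes w divides 64 in this sketch
    let mut result = 0u128;
    for win in (0..windows).rev() {
        result <<= w; // w "doublings" of the running result
        let mut buckets = vec![0u128; mask as usize];
        for (p, s) in points.iter().zip(scalars) {
            let digit = ((s >> (win * w)) & mask) as usize;
            if digit > 0 {
                buckets[digit - 1] += p;
            }
        }
        // sum_{d=1..2^w-1} d * buckets[d] via a running suffix sum
        let (mut running, mut window_sum) = (0u128, 0u128);
        for b in buckets.iter().rev() {
            running += b;
            window_sum += running;
        }
        result += window_sum;
    }
    result
}

/// Mirrors the dispatch in kzg_settings.rs: naive below 8 points,
/// Pippenger otherwise.
fn g1_linear_combination_sketch(points: &[u128], scalars: &[u64]) -> u128 {
    if points.len() < 8 {
        naive_msm(points, scalars)
    } else {
        pippenger_msm(points, scalars, 4)
    }
}

fn main() {
    let points: Vec<u128> = (1..=32u128).map(|i| i * 7).collect();
    let scalars: Vec<u64> = (1..=32u64).map(|i| i * 3 + 1).collect();
    assert_eq!(
        g1_linear_combination_sketch(&points, &scalars),
        naive_msm(&points, &scalars)
    );
    println!("MSM = {}", naive_msm(&points, &scalars));
}
```

Pippenger wins for large inputs because the bucket accumulation amortizes point additions across all terms, while the naive loop pays a full scalar multiplication per term.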

We discovered some time ago that, when committing to polynomials of degree 2^15, a parallelized variable-base multi-scalar multiplication (MSM) approach is much faster than Pippenger's algorithm; due to a deadline, we only implemented it for the arkworks and zkcrypto backends.

[chart: commit to poly benchmark]

What we can try is to port this algorithm, originally written by the ZEXE (now arkworks?) authors, from zkcrypto to blst-from-scratch and mcl (we'll need it for EIP-4844 anyway):

https://github.com/sifraitech/rust-kzg/blob/b09120d67150af474d7407092691f866a874bac1/zkcrypto/src/curve/multiscalar_mul.rs#L183-L268

Though I doubt we'll see any improvement for polynomials of degree up to 8.

@sauliusgrigaitis (Member)

We should try to port MSM to blst_from_scratch. I think @belijzajac and I discussed this before: the Go KZG implementation that uses gnark, which supports MSM, performs much faster. We should check what other optimizations they use and benchmark against them.

@sauliusgrigaitis (Member)

> I'm committing to a polynomial with 8 coefficients, while FsKZGSettings was initialized with a scale of 10.

@nazar-pc just to be sure: do your polynomials have only 2^3 coefficients? These optimizations usually start to show significant gains at much higher scales. At scale 2^3 the timings are already in microseconds (except verification, which is slightly slower), so you likely won't notice a difference unless you have a special case.

@nazar-pc (Contributor, Author)

The benchmark used 2^3 coefficients, but we have various use cases; some go up to 2^15.

@sauliusgrigaitis (Member)

Worth checking this paper. They claim a 78% speedup over arkworks, and source code is available (gnark appears to have this implementation).

@nazar-pc (Contributor, Author)

@dariolina, you can take a look at ^ when you have time

@dariolina

Thank you! I have also found this paper, which claims a 3-20% speedup over blst using precomputation. Maybe I could combine the two.
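The precomputation idea can be sketched roughly like this (a toy model, not the paper's actual algorithm: u128 integers stand in for curve points, and `W`, `precompute`, and `mul_with_table` are invented names). Because the SRS points used as commitment bases are fixed, per-window multiples of each point can be computed once up front, turning later scalar multiplications into table lookups and additions with no runtime doublings:

```rust
const W: u32 = 8;            // window width in bits
const WINDOWS: u32 = 64 / W; // scalars are u64 in this toy model

/// One table per window position j: table[j][d] = d * 2^(j*W) * p.
/// In the additive toy group, 2^(j*W) * p is just a left shift.
fn precompute(p: u128) -> Vec<Vec<u128>> {
    (0..WINDOWS)
        .map(|j| {
            let base = p << (j * W);
            (0..(1u128 << W)).map(|d| d * base).collect()
        })
        .collect()
}

/// Scalar multiplication using only table lookups and additions:
/// decompose the scalar into W-bit digits and sum the precomputed entries.
fn mul_with_table(table: &[Vec<u128>], scalar: u64) -> u128 {
    (0..WINDOWS)
        .map(|j| {
            let digit = ((scalar >> (j * W)) & ((1 << W) - 1)) as usize;
            table[j as usize][digit]
        })
        .sum()
}

fn main() {
    let p: u128 = 9;
    let table = precompute(p);
    let s: u64 = 123_456_789;
    // The table-driven product matches the direct product.
    assert_eq!(mul_with_table(&table, s), p * s as u128);
    println!("{} * {} = {}", p, s, mul_with_table(&table, s));
}
```

The trade-off is memory: each base point costs WINDOWS * 2^W stored multiples, which is why this pays off mainly for fixed bases such as a KZG trusted setup.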

@belijzajac (Collaborator)

I found that there already exists a Pippenger implementation for the supranational/blst Rust bindings, which appears to use buckets (a large grid for storing intermediate results) and a thread pool for parallelization.

I implemented it in d365801, which achieved the following results:

| Benchmark               | Old       | New       | Speedup |
|-------------------------|-----------|-----------|---------|
| blob_to_kzg_commitment  | 54.145 ms | 17.132 ms | 3.16x   |
| compute_kzg_proof       | 65.029 ms | 26.509 ms | 2.45x   |
| compute_blob_kzg_proof  | 65.680 ms | 27.342 ms | 2.40x   |
| bench_commit_to_poly    | 348.61 ms | 97.402 ms | 3.58x   |

However, our test_fk_multi_chunk_len_1_512 test fails (only 1 out of 80), the Java binding also fails 1 out of 222 test vectors, and I'm not entirely certain why.


@nazar-pc (Contributor, Author)

nazar-pc commented Apr 5, 2023

For the use case I have in mind, single-core performance is more important.
We are already parallelizing across multiple cores, so we need fewer CPU cycles per call (if possible), not necessarily less wall-clock time per call using any number of cores.

@sauliusgrigaitis (Member)

@belijzajac great work! This seems related to small sample sizes; both tests that failed for us fail because of this. Could you try to reproduce the issue in blst's own tests, so we can submit it to the blst folks?

@nazar-pc there is an interesting situation with parallel vs. non-parallel. We should probably stick to this concept:

Non-parallel means strictly single-threaded: no parallelism in the cryptographic algorithms, nor in blob batching.

Parallel means maximum parallelism: parallelism is used both within the cryptographic algorithms and to parallelize blob batching. This approach is simple to understand, but because of parallelization overhead it may not give the best results on some setups.
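A minimal sketch of the batch-level half of that convention, using std::thread scoped threads as a stand-in (the actual crates gate this behind a cargo feature, and `commit_one` with its checksum body is an invented placeholder for a real per-blob commitment):

```rust
use std::thread;

// Placeholder for a per-blob commitment; a real implementation would run
// a single-threaded blob_to_kzg_commitment here.
fn commit_one(blob: &[u64]) -> u64 {
    blob.iter().fold(0u64, |acc, x| acc.wrapping_add(*x))
}

/// Non-parallel: strictly single-threaded, blobs handled one after another.
fn commit_all_serial(blobs: &[Vec<u64>]) -> Vec<u64> {
    blobs.iter().map(|b| commit_one(b)).collect()
}

/// Parallel: batch-level parallelism across blobs, while each inner call
/// stays single-threaded so the two levels don't compete for cores.
fn commit_all_parallel(blobs: &[Vec<u64>]) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = blobs
            .iter()
            .map(|b| s.spawn(move || commit_one(b)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let blobs = vec![vec![1, 2, 3], vec![10, 20], vec![5]];
    // Both strategies must produce identical results.
    assert_eq!(commit_all_serial(&blobs), commit_all_parallel(&blobs));
    println!("{:?}", commit_all_serial(&blobs)); // [6, 30, 5]
}
```

The point of the convention is that a caller who already parallelizes across blobs (as @nazar-pc described) picks the non-parallel build and gets predictable per-call CPU cost, while a caller with idle cores picks the parallel build and lets the library saturate them.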
