Add spec and proof for polyvec_matrix_expand #232

Closed
wants to merge 1 commit into from

manastasova
Contributor

Solves #137.

@manastasova manastasova requested a review from a team as a code owner May 13, 2025 23:29
@manastasova manastasova marked this pull request as draft May 13, 2025 23:32
@manastasova manastasova force-pushed the polyvec_matrix_expand branch from 4b193f4 to b729a2d Compare May 13, 2025 23:58
@manastasova manastasova force-pushed the polyvec_matrix_expand branch from b729a2d to 4642e80 Compare May 14, 2025 00:00
@manastasova manastasova marked this pull request as ready for review May 14, 2025 00:00
@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 123219 cycles 122576 cycles 1.01
ML-DSA-44 sign 278554 cycles 277654 cycles 1.00
ML-DSA-44 verify 124131 cycles 123314 cycles 1.01
ML-DSA-65 keypair 222802 cycles 220403 cycles 1.01
ML-DSA-65 sign 477931 cycles 475379 cycles 1.01
ML-DSA-65 verify 209864 cycles 207524 cycles 1.01
ML-DSA-87 keypair 376802 cycles 373246 cycles 1.01
ML-DSA-87 sign 660875 cycles 660211 cycles 1.00
ML-DSA-87 verify 372335 cycles 368672 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 479505 cycles 464078 cycles 1.03
ML-DSA-44 sign 1173525 cycles 1156451 cycles 1.01
ML-DSA-44 verify 491013 cycles 476282 cycles 1.03
ML-DSA-65 keypair 838581 cycles 818208 cycles 1.02
ML-DSA-65 sign 1958744 cycles 1940565 cycles 1.01
ML-DSA-65 verify 803149 cycles 786332 cycles 1.02
ML-DSA-87 keypair 1414318 cycles 1378671 cycles 1.03
ML-DSA-87 sign 2672454 cycles 2617778 cycles 1.02
ML-DSA-87 verify 1390985 cycles 1360781 cycles 1.02

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 479505 cycles 464078 cycles 1.03
ML-DSA-44 verify 491013 cycles 476282 cycles 1.03
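
The alert condition can be checked by hand from the cycle counts. A minimal sketch, assuming the ratio is current cycles divided by previous cycles and that an alert fires only when the unrounded ratio exceeds the threshold; `exceeds_threshold` is a hypothetical helper, not part of github-action-benchmark:

```c
#include <assert.h>

/* Hypothetical helper (an assumption, not the workflow's own code):
 * returns 1 when current/previous is strictly above the threshold.
 * Cycle counts are "smaller is better", so a ratio above 1 is a slowdown. */
static int exceeds_threshold(unsigned long current, unsigned long previous,
                             double threshold)
{
  assert(previous != 0);
  return (double)current / (double)previous > threshold;
}
```

Under that assumption, the two rows above (479505 vs. 464078 and 491013 vs. 476282) exceed 1.03, while a row such as ML-DSA-87 keypair on the same platform (1414318 vs. 1378671) is only displayed as 1.03 after rounding and does not trigger an alert.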

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 139847 cycles 139095 cycles 1.01
ML-DSA-44 sign 422235 cycles 421316 cycles 1.00
ML-DSA-44 verify 149170 cycles 148298 cycles 1.01
ML-DSA-65 keypair 245719 cycles 243880 cycles 1.01
ML-DSA-65 sign 698481 cycles 697316 cycles 1.00
ML-DSA-65 verify 244489 cycles 242812 cycles 1.01
ML-DSA-87 keypair 408311 cycles 403888 cycles 1.01
ML-DSA-87 sign 909697 cycles 906357 cycles 1.00
ML-DSA-87 verify 416564 cycles 412387 cycles 1.01

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 103302 cycles 104068 cycles 0.99
ML-DSA-44 sign 292727 cycles 291238 cycles 1.01
ML-DSA-44 verify 109078 cycles 108970 cycles 1.00
ML-DSA-65 keypair 185885 cycles 183677 cycles 1.01
ML-DSA-65 sign 467793 cycles 469547 cycles 1.00
ML-DSA-65 verify 175577 cycles 174213 cycles 1.01
ML-DSA-87 keypair 292589 cycles 293908 cycles 1.00
ML-DSA-87 sign 603901 cycles 606411 cycles 1.00
ML-DSA-87 verify 291969 cycles 291296 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 177556 cycles 175462 cycles 1.01
ML-DSA-44 sign 489520 cycles 489436 cycles 1.00
ML-DSA-44 verify 186297 cycles 183646 cycles 1.01
ML-DSA-65 keypair 300760 cycles 298780 cycles 1.01
ML-DSA-65 sign 777805 cycles 774354 cycles 1.00
ML-DSA-65 verify 299504 cycles 297524 cycles 1.01
ML-DSA-87 keypair 505472 cycles 501651 cycles 1.01
ML-DSA-87 sign 1026919 cycles 1021227 cycles 1.01
ML-DSA-87 verify 510673 cycles 506310 cycles 1.01

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 161130 cycles 159895 cycles 1.01
ML-DSA-44 sign 481085 cycles 480571 cycles 1.00
ML-DSA-44 verify 171055 cycles 169828 cycles 1.01
ML-DSA-65 keypair 274710 cycles 271950 cycles 1.01
ML-DSA-65 sign 774315 cycles 772756 cycles 1.00
ML-DSA-65 verify 276172 cycles 274866 cycles 1.00
ML-DSA-87 keypair 461045 cycles 457363 cycles 1.01
ML-DSA-87 sign 1014239 cycles 1010753 cycles 1.00
ML-DSA-87 verify 467004 cycles 463411 cycles 1.01

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 103448 cycles 103861 cycles 1.00
ML-DSA-44 sign 294746 cycles 292828 cycles 1.01
ML-DSA-44 verify 109190 cycles 108536 cycles 1.01
ML-DSA-65 keypair 186516 cycles 183621 cycles 1.02
ML-DSA-65 sign 473220 cycles 470258 cycles 1.01
ML-DSA-65 verify 175835 cycles 173809 cycles 1.01
ML-DSA-87 keypair 292137 cycles 294110 cycles 0.99
ML-DSA-87 sign 599914 cycles 603640 cycles 0.99
ML-DSA-87 verify 291974 cycles 290781 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 137704 cycles 136462 cycles 1.01
ML-DSA-44 sign 397592 cycles 398091 cycles 1.00
ML-DSA-44 verify 145003 cycles 144238 cycles 1.01
ML-DSA-65 keypair 235766 cycles 232903 cycles 1.01
ML-DSA-65 sign 618926 cycles 621434 cycles 1.00
ML-DSA-65 verify 232380 cycles 231882 cycles 1.00
ML-DSA-87 keypair 391837 cycles 385368 cycles 1.02
ML-DSA-87 sign 809945 cycles 807806 cycles 1.00
ML-DSA-87 verify 394136 cycles 389322 cycles 1.01

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 177552 cycles 174898 cycles 1.02
ML-DSA-44 sign 489532 cycles 486543 cycles 1.01
ML-DSA-44 verify 186313 cycles 183631 cycles 1.01
ML-DSA-65 keypair 300910 cycles 298892 cycles 1.01
ML-DSA-65 sign 777075 cycles 775754 cycles 1.00
ML-DSA-65 verify 299545 cycles 297441 cycles 1.01
ML-DSA-87 keypair 505623 cycles 501292 cycles 1.01
ML-DSA-87 sign 1027304 cycles 1022004 cycles 1.01
ML-DSA-87 verify 510824 cycles 506144 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 144416 cycles 142985 cycles 1.01
ML-DSA-44 sign 311589 cycles 309980 cycles 1.01
ML-DSA-44 verify 144608 cycles 142888 cycles 1.01
ML-DSA-65 keypair 256633 cycles 252438 cycles 1.02
ML-DSA-65 sign 515513 cycles 513050 cycles 1.00
ML-DSA-65 verify 243550 cycles 240177 cycles 1.01
ML-DSA-87 keypair 437257 cycles 430194 cycles 1.02
ML-DSA-87 sign 710283 cycles 704629 cycles 1.01
ML-DSA-87 verify 427702 cycles 418929 cycles 1.02

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 160733 cycles 159988 cycles 1.00
ML-DSA-44 sign 480923 cycles 480528 cycles 1.00
ML-DSA-44 verify 170854 cycles 169874 cycles 1.01
ML-DSA-65 keypair 273476 cycles 271927 cycles 1.01
ML-DSA-65 sign 774378 cycles 772026 cycles 1.00
ML-DSA-65 verify 276123 cycles 274184 cycles 1.01
ML-DSA-87 keypair 461091 cycles 457466 cycles 1.01
ML-DSA-87 sign 1014261 cycles 1008998 cycles 1.01
ML-DSA-87 verify 466988 cycles 463072 cycles 1.01

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 137918 cycles 135876 cycles 1.02
ML-DSA-44 sign 398505 cycles 396380 cycles 1.01
ML-DSA-44 verify 145222 cycles 143556 cycles 1.01
ML-DSA-65 keypair 236133 cycles 232982 cycles 1.01
ML-DSA-65 sign 617132 cycles 622646 cycles 0.99
ML-DSA-65 verify 232358 cycles 232224 cycles 1.00
ML-DSA-87 keypair 391852 cycles 385400 cycles 1.02
ML-DSA-87 sign 812425 cycles 808433 cycles 1.00
ML-DSA-87 verify 394900 cycles 389522 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 157171 cycles 155860 cycles 1.01
ML-DSA-44 sign 429450 cycles 428124 cycles 1.00
ML-DSA-44 verify 165299 cycles 163711 cycles 1.01
ML-DSA-65 keypair 276040 cycles 271650 cycles 1.02
ML-DSA-65 sign 712600 cycles 709595 cycles 1.00
ML-DSA-65 verify 273953 cycles 271014 cycles 1.01
ML-DSA-87 keypair 460629 cycles 454421 cycles 1.01
ML-DSA-87 sign 924454 cycles 918993 cycles 1.01
ML-DSA-87 verify 464867 cycles 456015 cycles 1.02

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 570965 cycles 555271 cycles 1.03
ML-DSA-44 sign 1945379 cycles 1928895 cycles 1.01
ML-DSA-44 verify 632121 cycles 618036 cycles 1.02
ML-DSA-65 keypair 965556 cycles 943705 cycles 1.02
ML-DSA-65 sign 3156311 cycles 3133685 cycles 1.01
ML-DSA-65 verify 1000989 cycles 982820 cycles 1.02
ML-DSA-87 keypair 1587964 cycles 1551893 cycles 1.02
ML-DSA-87 sign 4031248 cycles 3995110 cycles 1.01
ML-DSA-87 verify 1656852 cycles 1622473 cycles 1.02

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 240271 cycles 238259 cycles 1.01
ML-DSA-44 sign 544068 cycles 541559 cycles 1.00
ML-DSA-44 verify 240812 cycles 238982 cycles 1.01
ML-DSA-65 keypair 436633 cycles 432552 cycles 1.01
ML-DSA-65 sign 912087 cycles 908818 cycles 1.00
ML-DSA-65 verify 408709 cycles 404022 cycles 1.01
ML-DSA-87 keypair 724471 cycles 718760 cycles 1.01
ML-DSA-87 sign 1245715 cycles 1241888 cycles 1.00
ML-DSA-87 verify 710316 cycles 702505 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 153702 cycles 154274 cycles 1.00
ML-DSA-44 sign 332668 cycles 333787 cycles 1.00
ML-DSA-44 verify 153973 cycles 154477 cycles 1.00
ML-DSA-65 keypair 273295 cycles 275850 cycles 0.99
ML-DSA-65 sign 556712 cycles 557966 cycles 1.00
ML-DSA-65 verify 259693 cycles 260716 cycles 1.00
ML-DSA-87 keypair 463876 cycles 466703 cycles 0.99
ML-DSA-87 sign 770097 cycles 773992 cycles 0.99
ML-DSA-87 verify 452328 cycles 454741 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 259056 cycles 257465 cycles 1.01
ML-DSA-44 sign 705358 cycles 704879 cycles 1.00
ML-DSA-44 verify 271243 cycles 269324 cycles 1.01
ML-DSA-65 keypair 463268 cycles 459838 cycles 1.01
ML-DSA-65 sign 1161620 cycles 1160553 cycles 1.00
ML-DSA-65 verify 451145 cycles 448246 cycles 1.01
ML-DSA-87 keypair 760351 cycles 755312 cycles 1.01
ML-DSA-87 sign 1534745 cycles 1530320 cycles 1.00
ML-DSA-87 verify 770721 cycles 760760 cycles 1.01

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 165924 cycles 166662 cycles 1.00
ML-DSA-44 sign 439047 cycles 440512 cycles 1.00
ML-DSA-44 verify 172785 cycles 173614 cycles 1.00
ML-DSA-65 keypair 291486 cycles 293210 cycles 0.99
ML-DSA-65 sign 719368 cycles 720651 cycles 1.00
ML-DSA-65 verify 286537 cycles 287411 cycles 1.00
ML-DSA-87 keypair 488307 cycles 491567 cycles 0.99
ML-DSA-87 sign 957756 cycles 961622 cycles 1.00
ML-DSA-87 verify 489888 cycles 492214 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 240147 cycles 238149 cycles 1.01
ML-DSA-44 sign 542538 cycles 541039 cycles 1.00
ML-DSA-44 verify 240082 cycles 238413 cycles 1.01
ML-DSA-65 keypair 435752 cycles 431498 cycles 1.01
ML-DSA-65 sign 910954 cycles 906895 cycles 1.00
ML-DSA-65 verify 406801 cycles 402775 cycles 1.01
ML-DSA-87 keypair 721774 cycles 717656 cycles 1.01
ML-DSA-87 sign 1242382 cycles 1240366 cycles 1.00
ML-DSA-87 verify 706815 cycles 700727 cycles 1.01

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 259199 cycles 257046 cycles 1.01
ML-DSA-44 sign 704949 cycles 705718 cycles 1.00
ML-DSA-44 verify 270923 cycles 268710 cycles 1.01
ML-DSA-65 keypair 462699 cycles 458714 cycles 1.01
ML-DSA-65 sign 1159301 cycles 1159791 cycles 1.00
ML-DSA-65 verify 449608 cycles 447250 cycles 1.01
ML-DSA-87 keypair 759847 cycles 754238 cycles 1.01
ML-DSA-87 sign 1531906 cycles 1526092 cycles 1.00
ML-DSA-87 verify 765426 cycles 758532 cycles 1.01

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 1104920 cycles 1099647 cycles 1.00
ML-DSA-44 sign 4007589 cycles 4001780 cycles 1.00
ML-DSA-44 verify 1230769 cycles 1225498 cycles 1.00
ML-DSA-65 keypair 1885025 cycles 1864250 cycles 1.01
ML-DSA-65 sign 6567120 cycles 6547708 cycles 1.00
ML-DSA-65 verify 1997422 cycles 1981330 cycles 1.01
ML-DSA-87 keypair 3089332 cycles 3084924 cycles 1.00
ML-DSA-87 sign 8295338 cycles 8269471 cycles 1.00
ML-DSA-87 verify 3276216 cycles 3260711 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 320619 cycles 316629 cycles 1.01
ML-DSA-44 sign 846687 cycles 843864 cycles 1.00
ML-DSA-44 verify 319730 cycles 316599 cycles 1.01
ML-DSA-65 keypair 603641 cycles 595356 cycles 1.01
ML-DSA-65 sign 1255200 cycles 1247634 cycles 1.01
ML-DSA-65 verify 545136 cycles 535921 cycles 1.02
ML-DSA-87 keypair 958297 cycles 941844 cycles 1.02
ML-DSA-87 sign 1719544 cycles 1708271 cycles 1.01
ML-DSA-87 verify 945666 cycles 927618 cycles 1.02

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-65 sign 1285562 cycles 1247634 cycles 1.03

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-44 keypair 353820 cycles 349834 cycles 1.01
ML-DSA-44 sign 1043370 cycles 1043530 cycles 1.00
ML-DSA-44 verify 373122 cycles 368216 cycles 1.01
ML-DSA-65 keypair 650839 cycles 640876 cycles 1.02
ML-DSA-65 sign 1695863 cycles 1683356 cycles 1.01
ML-DSA-65 verify 617702 cycles 608139 cycles 1.02
ML-DSA-87 keypair 1022816 cycles 1007488 cycles 1.02
ML-DSA-87 sign 2221994 cycles 2206799 cycles 1.01
ML-DSA-87 verify 1041803 cycles 1024829 cycles 1.02

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 4642e80 Previous: 858772b Ratio
ML-DSA-65 sign 1744505 cycles 1683356 cycles 1.04

Contributor

@mkannwischer mkannwischer left a comment

Thanks @manastasova.

The performance penalty for the copying is noticeable (3% overall on some platforms).
Is this a temporary workaround for diffblue/cbmc#8617, or is this unrelated? If it is a temporary workaround, we can accept it now and revisit it later. Otherwise, I think we need to think about alternatives.

Another problem is that matrix_expand should really be using a 4-way batched poly_uniform (which then uses a 4-way batched Keccak). See #210. So this function will have to be rewritten anyway. Do you maybe want to do that already?
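
To make the batching idea concrete, here is a minimal sketch of how the K x L matrix entries could be grouped into batches of four nonces, one per lane of a 4-way sampler. It is an illustration only, not the code from #210: `matrix_nonce` and `batch_nonces` are hypothetical names, the row-major traversal order is an assumption, and the nonce formula `(i << 8) + j` is taken from the scalar `poly_uniform` call in this PR.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH 4 /* lanes of the assumed 4-way batched sampler */

/* Nonce for matrix entry (i, j), as in the scalar call
 * poly_uniform(&mat[i].vec[j], rho, (i << 8) + j). */
static uint16_t matrix_nonce(unsigned i, unsigned j)
{
  return (uint16_t)((i << 8) + j);
}

/* Hypothetical helper: fills nonces[] for the batch that starts at flat
 * index `start` of a row-major k x l matrix and returns how many lanes
 * are valid (fewer than BATCH only for the final, partial batch). */
static unsigned batch_nonces(unsigned start, unsigned k, unsigned l,
                             uint16_t nonces[BATCH])
{
  unsigned lane, valid = 0;
  assert(l != 0 && start < k * l);
  for (lane = 0; lane < BATCH; lane++)
  {
    unsigned idx = start + lane;
    if (idx >= k * l)
    {
      break; /* tail: no more matrix entries to sample */
    }
    nonces[lane] = matrix_nonce(idx / l, idx % l);
    valid++;
  }
  return valid;
}
```

With (K, L) = (4, 4) and (8, 7), the entry counts 16 and 56 are multiples of four. ML-DSA-65, with (K, L) = (6, 5), has 30 entries, so the last batch fills only two lanes and a real implementation has to handle that tail.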


for (i = 0; i < MLDSA_K; ++i)
__loop__(
assigns(i, memory_slice(mat, MLDSA_K * sizeof(polyvecl)))

If I remember correctly, this should be object_whole(mat).
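
A minimal sketch of what the suggested annotation could look like, assuming the reviewer means CBMC's whole-object assigns target (`__CPROVER_object_whole`, which the project's `object_whole` macro is assumed to wrap). The names `K`, `row_t`, and `fill_rows` are hypothetical stand-ins, and `__loop__` is defined as a no-op here so the snippet compiles without CBMC:

```c
#include <assert.h>
#include <string.h>

#define K 4 /* hypothetical stand-in for MLDSA_K */

/* Hypothetical stand-in for polyvecl. */
typedef struct
{
  int coeffs[8];
} row_t;

/* Assumption: in a normal build the contract macro expands to nothing,
 * so the annotation below is only seen by the verifier. */
#define __loop__(x)

static void fill_rows(row_t mat[K])
{
  unsigned i;
  for (i = 0; i < K; ++i)
  __loop__(
    assigns(i, object_whole(mat))
    invariant(i <= K))
  {
    /* The loop body may write anywhere inside the object mat points to. */
    memset(&mat[i], 0, sizeof(row_t));
    mat[i].coeffs[0] = (int)i;
  }
}
```

The difference from `memory_slice(mat, MLDSA_K * sizeof(polyvecl))` is that the assigns target names the entire object rather than a byte range starting at the pointer. Whether that is needed here depends on how the project's CBMC macros are defined.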

for (j = 0; j < MLDSA_L; ++j)
__loop__(
assigns(j, memory_slice(&tmp_polyvecl, sizeof(polyvecl)))

Same here? Not sure.

{
unsigned int j;
polyvecl tmp_polyvecl;

Is this a temporary workaround for diffblue/cbmc#8617? If so, please add an appropriate comment with a TODO.

{
poly_uniform(&mat[i].vec[j], rho, (i << 8) + j);
poly temp_poly;

Same here.

@hanno-becker
Contributor

Thanks a lot @manastasova for tackling this!

> Another problem is that matrix_expand should really be using a 4-way batched poly_uniform (which then uses a 4-way batched Keccak). See #210. So this function will have to be rewritten anyway. Do you maybe want to do that already?

I think @mkannwischer raises an important point here. This is one of the main areas of code that we know will need to be rewritten, so we may want to do this first. What do you think, @manastasova?

@mkannwischer
Contributor

> Thanks a lot @manastasova for tackling this!
>
> > Another problem is that matrix_expand should really be using a 4-way batched poly_uniform (which then uses a 4-way batched Keccak). See #210. So this function will have to be rewritten anyway. Do you maybe want to do that already?
>
> I think @mkannwischer raises an important point here. This is one of the main areas of code that we know will need to be rewritten, so we may want to do this first. What do you think, @manastasova?

I will give poly_uniform_4x (#210) a try - it seems we need to urgently get that into place.

@manastasova
Contributor Author

Thank you @mkannwischer and @hanno-becker for the comments and suggestions!
I see that PR #233 has already solved the x4 batched matrix generation, so I am closing this PR.
