@jakemas jakemas commented Sep 19, 2025

Working towards #303.

Testing CI - CBMC is not happy with me today.

I have seen #346 and #334 -- I just want to forecast out some proofs.

The optimization seems beneficial. Here are the numbers.

Generic C Implementation (no native AVX2):

ml-dsa-44:
polyvecl_pointwise_acc_montgomery cycles=1109
polyvec_matrix_pointwise_montgomery cycles=4454
ml-dsa-65:
polyvecl_pointwise_acc_montgomery cycles=1242
polyvec_matrix_pointwise_montgomery cycles=7612
ml-dsa-87:
polyvecl_pointwise_acc_montgomery cycles=1642
polyvec_matrix_pointwise_montgomery cycles=13231

ml-dsa-44:
   keypair cycles (avg) = 100784
      sign cycles (avg) = 388545
    verify cycles (avg) = 120583
ml-dsa-65:
   keypair cycles (avg) = 175180
      sign cycles (avg) = 643694
    verify cycles (avg) = 192871
ml-dsa-87:
   keypair cycles (avg) = 274045
      sign cycles (avg) = 770226
    verify cycles (avg) = 295912

Native AVX2 Implementation:

ml-dsa-44:
polyvecl_pointwise_acc_montgomery cycles=547
polyvec_matrix_pointwise_montgomery cycles=1517
ml-dsa-65:
polyvecl_pointwise_acc_montgomery cycles=606
polyvec_matrix_pointwise_montgomery cycles=2782
ml-dsa-87:
polyvecl_pointwise_acc_montgomery cycles=714
polyvec_matrix_pointwise_montgomery cycles=4781

ml-dsa-44:
   keypair cycles (avg) = 96880
      sign cycles (avg) = 354220
    verify cycles (avg) = 113390
ml-dsa-65:
   keypair cycles (avg) = 168277
      sign cycles (avg) = 581154
    verify cycles (avg) = 182787
ml-dsa-87:
   keypair cycles (avg) = 262612
      sign cycles (avg) = 694362
    verify cycles (avg) = 280321

Overall:

ML-DSA Performance Analysis: Native AVX2 vs Reference Implementation
==========================================================================================

Variant      Operation  Reference    AVX2         Speedup    Improvement 
------------------------------------------------------------------------------------------
ml-dsa-44    keypair    100,784      96,880       1.04x       3.9%
             sign       388,545      354,220      1.10x       8.8%
             verify     120,583      113,390      1.06x       6.0%

ml-dsa-65    keypair    175,180      168,277      1.04x       3.9%
             sign       643,694      581,154      1.11x       9.7%
             verify     192,871      182,787      1.06x       5.2%

ml-dsa-87    keypair    274,045      262,612      1.04x       4.2%
             sign       770,226      694,362      1.11x       9.8%
             verify     295,912      280,321      1.06x       5.3%
------------------------------------------------------------------------------------------
Polyvecl Pointwise Acc Montgomery: Average 2.13x speedup
Polyvec Matrix Pointwise Montgomery: Average 2.81x speedup
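
For anyone who wants to re-derive the summary figures, here is a minimal sketch of the arithmetic, assuming "speedup" means reference cycles divided by AVX2 cycles and "improvement" means the relative cycle reduction. The helper names (`speedup`, `improvement_pct`, `average_speedup`) are made up for this illustration and are not part of the benchmark tooling.

```python
# Per-function cycle counts copied from the numbers above:
# (generic C implementation, native AVX2 implementation).
acc = {
    "ml-dsa-44": (1109, 547),
    "ml-dsa-65": (1242, 606),
    "ml-dsa-87": (1642, 714),
}
matrix = {
    "ml-dsa-44": (4454, 1517),
    "ml-dsa-65": (7612, 2782),
    "ml-dsa-87": (13231, 4781),
}


def speedup(reference, optimized):
    """How many times faster the optimized code is (assumed definition)."""
    return reference / optimized


def improvement_pct(reference, optimized):
    """Relative reduction in cycles, in percent (assumed definition)."""
    return 100.0 * (reference - optimized) / reference


def average_speedup(table):
    """Arithmetic mean of the per-parameter-set speedups."""
    ratios = [speedup(ref, opt) for ref, opt in table.values()]
    return sum(ratios) / len(ratios)


print(f"polyvecl_pointwise_acc_montgomery: {average_speedup(acc):.2f}x")
print(f"polyvec_matrix_pointwise_montgomery: {average_speedup(matrix):.2f}x")
print(f"ml-dsa-44 sign: {speedup(388545, 354220):.2f}x, "
      f"{improvement_pct(388545, 354220):.1f}%")
```

Under these definitions the averages come out to the 2.13x and 2.81x reported above. The averages are plain means over the three parameter sets, so they weight each set equally.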

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 46248 cycles 47882 cycles 0.97
ML-DSA-44 sign 132731 cycles 151579 cycles 0.88
ML-DSA-44 verify 47921 cycles 50930 cycles 0.94
ML-DSA-65 keypair 81263 cycles 83679 cycles 0.97
ML-DSA-65 sign 219965 cycles 249921 cycles 0.88
ML-DSA-65 verify 80290 cycles 84582 cycles 0.95
ML-DSA-87 keypair 132627 cycles 135655 cycles 0.98
ML-DSA-87 sign 282718 cycles 314509 cycles 0.90
ML-DSA-87 verify 130754 cycles 136425 cycles 0.96

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 115028 cycles 115049 cycles 1.00
ML-DSA-44 sign 431569 cycles 431559 cycles 1.00
ML-DSA-44 verify 122131 cycles 122151 cycles 1.00
ML-DSA-65 keypair 197067 cycles 197046 cycles 1.00
ML-DSA-65 sign 701230 cycles 701106 cycles 1.00
ML-DSA-65 verify 197660 cycles 197624 cycles 1.00
ML-DSA-87 keypair 325292 cycles 325321 cycles 1.00
ML-DSA-87 sign 884606 cycles 884597 cycles 1.00
ML-DSA-87 verify 328970 cycles 328981 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 115357 cycles 115355 cycles 1.00
ML-DSA-44 sign 379153 cycles 380513 cycles 1.00
ML-DSA-44 verify 120797 cycles 120663 cycles 1.00
ML-DSA-65 keypair 199788 cycles 199958 cycles 1.00
ML-DSA-65 sign 627802 cycles 630974 cycles 0.99
ML-DSA-65 verify 199061 cycles 199181 cycles 1.00
ML-DSA-87 keypair 327391 cycles 327118 cycles 1.00
ML-DSA-87 sign 796685 cycles 797677 cycles 1.00
ML-DSA-87 verify 326481 cycles 326175 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 213394 cycles 213177 cycles 1.00
ML-DSA-44 sign 780740 cycles 781343 cycles 1.00
ML-DSA-44 verify 230090 cycles 230299 cycles 1.00
ML-DSA-65 keypair 381212 cycles 380958 cycles 1.00
ML-DSA-65 sign 1304148 cycles 1291486 cycles 1.01
ML-DSA-65 verify 372827 cycles 372866 cycles 1.00
ML-DSA-87 keypair 609690 cycles 608974 cycles 1.00
ML-DSA-87 sign 1641652 cycles 1641956 cycles 1.00
ML-DSA-87 verify 621887 cycles 621291 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 34941 cycles 35216 cycles 0.99
ML-DSA-44 sign 121031 cycles 125138 cycles 0.97
ML-DSA-44 verify 38303 cycles 39121 cycles 0.98
ML-DSA-65 keypair 62865 cycles 63454 cycles 0.99
ML-DSA-65 sign 202124 cycles 208653 cycles 0.97
ML-DSA-65 verify 62825 cycles 64036 cycles 0.98
ML-DSA-87 keypair 94988 cycles 96943 cycles 0.98
ML-DSA-87 sign 234763 cycles 251543 cycles 0.93
ML-DSA-87 verify 93672 cycles 96969 cycles 0.97

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 96120 cycles 95709 cycles 1.00
ML-DSA-44 sign 346090 cycles 345734 cycles 1.00
ML-DSA-44 verify 101356 cycles 101338 cycles 1.00
ML-DSA-65 keypair 164900 cycles 164680 cycles 1.00
ML-DSA-65 sign 568251 cycles 568266 cycles 1.00
ML-DSA-65 verify 165339 cycles 165518 cycles 1.00
ML-DSA-87 keypair 270791 cycles 270264 cycles 1.00
ML-DSA-87 sign 725314 cycles 725295 cycles 1.00
ML-DSA-87 verify 273144 cycles 273484 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 57088 cycles 57825 cycles 0.99
ML-DSA-44 sign 180631 cycles 190230 cycles 0.95
ML-DSA-44 verify 61122 cycles 63174 cycles 0.97
ML-DSA-65 keypair 99951 cycles 102038 cycles 0.98
ML-DSA-65 sign 296215 cycles 317731 cycles 0.93
ML-DSA-65 verify 100617 cycles 104176 cycles 0.97
ML-DSA-87 keypair 153586 cycles 157788 cycles 0.97
ML-DSA-87 sign 353630 cycles 378266 cycles 0.93
ML-DSA-87 verify 153792 cycles 158668 cycles 0.97

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 288220 cycles 288573 cycles 1.00
ML-DSA-44 sign 926141 cycles 930921 cycles 0.99
ML-DSA-44 verify 294580 cycles 295012 cycles 1.00
ML-DSA-65 keypair 493233 cycles 490626 cycles 1.01
ML-DSA-65 sign 1588611 cycles 1551572 cycles 1.02
ML-DSA-65 verify 483195 cycles 481319 cycles 1.00
ML-DSA-87 keypair 838376 cycles 833327 cycles 1.01
ML-DSA-87 sign 2073639 cycles 2076843 cycles 1.00
ML-DSA-87 verify 828140 cycles 817076 cycles 1.01

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 69244 cycles 72965 cycles 0.95
ML-DSA-44 sign 185852 cycles 209026 cycles 0.89
ML-DSA-44 verify 69233 cycles 74357 cycles 0.93
ML-DSA-65 keypair 119293 cycles 122701 cycles 0.97
ML-DSA-65 sign 296002 cycles 330324 cycles 0.90
ML-DSA-65 verify 115422 cycles 121320 cycles 0.95
ML-DSA-87 keypair 201621 cycles 208588 cycles 0.97
ML-DSA-87 sign 386910 cycles 432454 cycles 0.89
ML-DSA-87 verify 193867 cycles 203545 cycles 0.95

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 69520 cycles 69452 cycles 1.00
ML-DSA-44 sign 214186 cycles 214770 cycles 1.00
ML-DSA-44 verify 72613 cycles 72508 cycles 1.00
ML-DSA-65 keypair 123271 cycles 122780 cycles 1.00
ML-DSA-65 sign 352561 cycles 353195 cycles 1.00
ML-DSA-65 verify 120572 cycles 120430 cycles 1.00
ML-DSA-87 keypair 201687 cycles 200488 cycles 1.01
ML-DSA-87 sign 451836 cycles 451124 cycles 1.00
ML-DSA-87 verify 198535 cycles 198358 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 157730 cycles 157874 cycles 1.00
ML-DSA-44 sign 562520 cycles 563418 cycles 1.00
ML-DSA-44 verify 169489 cycles 169267 cycles 1.00
ML-DSA-65 keypair 269712 cycles 269343 cycles 1.00
ML-DSA-65 sign 929000 cycles 928710 cycles 1.00
ML-DSA-65 verify 274518 cycles 274926 cycles 1.00
ML-DSA-87 keypair 450802 cycles 450143 cycles 1.00
ML-DSA-87 sign 1177764 cycles 1177838 cycles 1.00
ML-DSA-87 verify 458464 cycles 458629 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 136237 cycles 135835 cycles 1.00
ML-DSA-44 sign 543996 cycles 543662 cycles 1.00
ML-DSA-44 verify 148769 cycles 148551 cycles 1.00
ML-DSA-65 keypair 227121 cycles 227293 cycles 1.00
ML-DSA-65 sign 880222 cycles 880495 cycles 1.00
ML-DSA-65 verify 236396 cycles 235973 cycles 1.00
ML-DSA-87 keypair 374620 cycles 376279 cycles 1.00
ML-DSA-87 sign 1098382 cycles 1099997 cycles 1.00
ML-DSA-87 verify 387197 cycles 388895 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 42030 cycles 42495 cycles 0.99
ML-DSA-44 sign 130656 cycles 136953 cycles 0.95
ML-DSA-44 verify 44057 cycles 45651 cycles 0.97
ML-DSA-65 keypair 72606 cycles 73514 cycles 0.99
ML-DSA-65 sign 211799 cycles 222961 cycles 0.95
ML-DSA-65 verify 72874 cycles 75361 cycles 0.97
ML-DSA-87 keypair 108885 cycles 111831 cycles 0.97
ML-DSA-87 sign 248272 cycles 264029 cycles 0.94
ML-DSA-87 verify 109458 cycles 113897 cycles 0.96

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 0241a1e Previous: a50c7c5 Ratio
ML-DSA-65 keypair 74894 cycles 72607 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.
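
For readers interpreting the Ratio column and the alerts, a minimal sketch of the check follows. It assumes the ratio is current cycles divided by previous cycles (smaller is better) and that an alert fires when the unrounded ratio is strictly greater than the 1.03 threshold quoted in the alert text. The function names are hypothetical, and the exact comparison used by github-action-benchmark may differ.

```python
ALERT_THRESHOLD = 1.03  # threshold quoted in the alert text


def ratio(current_cycles, previous_cycles):
    """Current / previous; values above 1.0 mean the new commit is slower."""
    return current_cycles / previous_cycles


def is_regression(current_cycles, previous_cycles, threshold=ALERT_THRESHOLD):
    """Assumed rule: alert when the unrounded ratio exceeds the threshold."""
    return ratio(current_cycles, previous_cycles) > threshold


# ML-DSA-65 keypair on AMD EPYC 4th gen (c7a), from the alert above.
print(f"{ratio(74894, 72607):.2f}", is_regression(74894, 72607))

# ML-DSA-44 sign on the Mac Mini (opt) run: a ratio below 1.0 is a speedup.
print(f"{ratio(132731, 151579):.2f}", is_regression(132731, 151579))
```

Because the table shows ratios rounded to two decimals, a row displayed as 1.03 can still exceed the threshold before rounding, which would explain why such rows appear in alerts.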

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 132657 cycles 132765 cycles 1.00
ML-DSA-44 sign 498636 cycles 498360 cycles 1.00
ML-DSA-44 verify 144922 cycles 144978 cycles 1.00
ML-DSA-65 keypair 227247 cycles 227374 cycles 1.00
ML-DSA-65 sign 813948 cycles 813162 cycles 1.00
ML-DSA-65 verify 232046 cycles 231727 cycles 1.00
ML-DSA-87 keypair 374396 cycles 374649 cycles 1.00
ML-DSA-87 sign 1021479 cycles 1021467 cycles 1.00
ML-DSA-87 verify 383843 cycles 383727 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 115424 cycles 116321 cycles 0.99
ML-DSA-44 sign 379369 cycles 382918 cycles 0.99
ML-DSA-44 verify 120863 cycles 121647 cycles 0.99
ML-DSA-65 keypair 199840 cycles 200280 cycles 1.00
ML-DSA-65 sign 628198 cycles 631576 cycles 0.99
ML-DSA-65 verify 199027 cycles 199420 cycles 1.00
ML-DSA-87 keypair 327839 cycles 327953 cycles 1.00
ML-DSA-87 sign 798048 cycles 799114 cycles 1.00
ML-DSA-87 verify 326496 cycles 326610 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 73719 cycles 73940 cycles 1.00
ML-DSA-44 sign 226972 cycles 228396 cycles 0.99
ML-DSA-44 verify 77935 cycles 78071 cycles 1.00
ML-DSA-65 keypair 129724 cycles 129923 cycles 1.00
ML-DSA-65 sign 375574 cycles 377186 cycles 1.00
ML-DSA-65 verify 128828 cycles 129040 cycles 1.00
ML-DSA-87 keypair 210463 cycles 210651 cycles 1.00
ML-DSA-87 sign 476614 cycles 478561 cycles 1.00
ML-DSA-87 verify 209714 cycles 210198 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 122811 cycles 120120 cycles 1.02
ML-DSA-44 sign 462112 cycles 453371 cycles 1.02
ML-DSA-44 verify 133080 cycles 132326 cycles 1.01
ML-DSA-65 keypair 204594 cycles 204716 cycles 1.00
ML-DSA-65 sign 737503 cycles 737570 cycles 1.00
ML-DSA-65 verify 209127 cycles 210009 cycles 1.00
ML-DSA-87 keypair 339240 cycles 338444 cycles 1.00
ML-DSA-87 sign 943406 cycles 941628 cycles 1.00
ML-DSA-87 verify 347932 cycles 349377 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 138571 cycles 138584 cycles 1.00
ML-DSA-44 sign 494766 cycles 495481 cycles 1.00
ML-DSA-44 verify 148733 cycles 148760 cycles 1.00
ML-DSA-65 keypair 241546 cycles 241312 cycles 1.00
ML-DSA-65 sign 809788 cycles 809760 cycles 1.00
ML-DSA-65 verify 241008 cycles 240909 cycles 1.00
ML-DSA-87 keypair 396585 cycles 396477 cycles 1.00
ML-DSA-87 sign 1032199 cycles 1031613 cycles 1.00
ML-DSA-87 verify 402527 cycles 402260 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 213600 cycles 213647 cycles 1.00
ML-DSA-44 sign 781050 cycles 794200 cycles 0.98
ML-DSA-44 verify 230186 cycles 230157 cycles 1.00
ML-DSA-65 keypair 381315 cycles 381964 cycles 1.00
ML-DSA-65 sign 1287118 cycles 1286398 cycles 1.00
ML-DSA-65 verify 373077 cycles 373972 cycles 1.00
ML-DSA-87 keypair 609715 cycles 609842 cycles 1.00
ML-DSA-87 sign 1643926 cycles 1645519 cycles 1.00
ML-DSA-87 verify 621660 cycles 621691 cycles 1.00

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 822653 cycles 822607 cycles 1.00
ML-DSA-44 sign 3328293 cycles 3327961 cycles 1.00
ML-DSA-44 verify 920551 cycles 919517 cycles 1.00
ML-DSA-65 keypair 1396003 cycles 1395382 cycles 1.00
ML-DSA-65 sign 5444867 cycles 5429294 cycles 1.00
ML-DSA-65 verify 1463286 cycles 1462157 cycles 1.00
ML-DSA-87 keypair 2303082 cycles 2304525 cycles 1.00
ML-DSA-87 sign 6826409 cycles 6826659 cycles 1.00
ML-DSA-87 verify 2398826 cycles 2401170 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 465726 cycles 465321 cycles 1.00
ML-DSA-44 sign 2226268 cycles 2229058 cycles 1.00
ML-DSA-44 verify 546713 cycles 547134 cycles 1.00
ML-DSA-65 keypair 778115 cycles 779224 cycles 1.00
ML-DSA-65 sign 3635158 cycles 3638239 cycles 1.00
ML-DSA-65 verify 849641 cycles 850223 cycles 1.00
ML-DSA-87 keypair 1261562 cycles 1267615 cycles 1.00
ML-DSA-87 sign 4538959 cycles 4512261 cycles 1.01
ML-DSA-87 verify 1370897 cycles 1373672 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 223435 cycles 241131 cycles 0.93
ML-DSA-44 sign 660117 cycles 692486 cycles 0.95
ML-DSA-44 verify 226160 cycles 231043 cycles 0.98
ML-DSA-65 keypair 389374 cycles 393697 cycles 0.99
ML-DSA-65 sign 1102631 cycles 1084494 cycles 1.02
ML-DSA-65 verify 380262 cycles 375628 cycles 1.01
ML-DSA-87 keypair 669282 cycles 671942 cycles 1.00
ML-DSA-87 sign 1514678 cycles 1488161 cycles 1.02
ML-DSA-87 verify 651259 cycles 646858 cycles 1.01

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 0241a1e Previous: a50c7c5 Ratio
ML-DSA-44 keypair 243405 cycles 234807 cycles 1.04
ML-DSA-44 verify 260136 cycles 246581 cycles 1.05
ML-DSA-65 keypair 422006 cycles 396165 cycles 1.07
ML-DSA-65 verify 412093 cycles 399635 cycles 1.03

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 311676 cycles 301890 cycles 1.03
ML-DSA-44 sign 1194281 cycles 1200079 cycles 1.00
ML-DSA-44 verify 329835 cycles 332889 cycles 0.99
ML-DSA-65 keypair 574421 cycles 568930 cycles 1.01
ML-DSA-65 sign 2008654 cycles 1985832 cycles 1.01
ML-DSA-65 verify 551223 cycles 539286 cycles 1.02
ML-DSA-87 keypair 845329 cycles 863211 cycles 0.98
ML-DSA-87 sign 2432925 cycles 2481765 cycles 0.98
ML-DSA-87 verify 859958 cycles 888876 cycles 0.97

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: a2e40c1 Previous: abf8281 Ratio
ML-DSA-44 keypair 311676 cycles 301890 cycles 1.03


@jakemas jakemas marked this pull request as ready for review September 22, 2025 17:09
@jakemas jakemas requested a review from a team as a code owner September 22, 2025 17:09
@mkannwischer
Contributor

Thanks @jakemas! I hope it is okay if @jammychiou1 continues this PR and adds AArch64 in addition.

Contributor

Please split this into two files. That will make verification easier later on.

Contributor Author

OK, I'll neaten up the x86 parts, and then @jammychiou1 can take over and add the AArch64 parts!

Contributor

@jammychiou1 has already been working on this since yesterday.

@jammychiou1
Contributor

jammychiou1 commented Oct 3, 2025

Thank you @jakemas for letting me take over! I hope I didn't introduce bugs into your commits while rebasing.

I'll add the AArch64 parts later.

@mkannwischer
Contributor

Marking this as draft. Please mark it as ready for review when it's ready.

@mkannwischer mkannwischer marked this pull request as draft October 3, 2025 08:10
jakemas and others added 9 commits October 13, 2025 00:45
Signed-off-by: Jake Massimo <[email protected]>
Signed-off-by: Jake Massimo <[email protected]>
Run the updated autogen and format scripts again.

Signed-off-by: jammychiou1 <[email protected]>
jammychiou1 and others added 2 commits October 13, 2025 10:04
This commit adds a native implementation of poly_pointwise_montgomery
written from scratch.

Co-authored-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
This commit adds native implementations of
polyvecl_pointwise_acc_montgomery written from scratch.

Co-authored-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
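
As background on what `polyvecl_pointwise_acc_montgomery` computes, here is a small Python model, offered as a sketch rather than a description of this PR's code. It assumes the usual ML-DSA conventions: modulus Q = 8380417, Montgomery factor 2^32, 256 coefficients per polynomial, and inputs already in the NTT domain. The two helper variants (`pointwise_acc_reduce_each`, `pointwise_acc_reduce_once`) are hypothetical names used only to contrast reducing every product with accumulating wide products and reducing once. Whether the native AVX2 or AArch64 code uses either strategy is not stated in this thread.

```python
import random

Q = 8380417      # ML-DSA modulus, 2^23 - 2^13 + 1
QINV = 58728449  # Q^-1 mod 2^32
N = 256          # coefficients per polynomial


def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, like a C int32_t cast."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def montgomery_reduce(a):
    """Return r such that r * 2^32 is congruent to a modulo Q."""
    t = to_int32(a * QINV)
    return (a - t * Q) >> 32  # exact shift: a - t*Q is a multiple of 2^32


def pointwise_acc_reduce_each(u, v):
    """Reduce every product, then add the reduced values."""
    return [sum(montgomery_reduce(uj[i] * vj[i]) for uj, vj in zip(u, v))
            for i in range(N)]


def pointwise_acc_reduce_once(u, v):
    """Add the wide products first, then apply a single reduction."""
    return [montgomery_reduce(sum(uj[i] * vj[i] for uj, vj in zip(u, v)))
            for i in range(N)]


# Illustrative input: vectors of length 4 (the value of l for ML-DSA-44).
rng = random.Random(0)
u = [[rng.randrange(-Q + 1, Q) for _ in range(N)] for _ in range(4)]
v = [[rng.randrange(-Q + 1, Q) for _ in range(N)] for _ in range(4)]

a = pointwise_acc_reduce_each(u, v)
b = pointwise_acc_reduce_once(u, v)
print(all((x - y) % Q == 0 for x, y in zip(a, b)))
```

Both variants agree modulo Q. Python integers do not overflow, so this model says nothing about the input bounds a fixed-width C or assembly implementation would need to keep the accumulator in range.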

Development

Successfully merging this pull request may close these issues.

AVX2: Add poly_pointwise_montgomery assembly
