AVX2: Add pointwise and pointwise_acc native backend #468
Mac Mini (M1, 2020) benchmarks (opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 46248 cycles | 47882 cycles | 0.97 |
ML-DSA-44 sign | 132731 cycles | 151579 cycles | 0.88 |
ML-DSA-44 verify | 47921 cycles | 50930 cycles | 0.94 |
ML-DSA-65 keypair | 81263 cycles | 83679 cycles | 0.97 |
ML-DSA-65 sign | 219965 cycles | 249921 cycles | 0.88 |
ML-DSA-65 verify | 80290 cycles | 84582 cycles | 0.95 |
ML-DSA-87 keypair | 132627 cycles | 135655 cycles | 0.98 |
ML-DSA-87 sign | 282718 cycles | 314509 cycles | 0.90 |
ML-DSA-87 verify | 130754 cycles | 136425 cycles | 0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 115028 cycles | 115049 cycles | 1.00 |
ML-DSA-44 sign | 431569 cycles | 431559 cycles | 1.00 |
ML-DSA-44 verify | 122131 cycles | 122151 cycles | 1.00 |
ML-DSA-65 keypair | 197067 cycles | 197046 cycles | 1.00 |
ML-DSA-65 sign | 701230 cycles | 701106 cycles | 1.00 |
ML-DSA-65 verify | 197660 cycles | 197624 cycles | 1.00 |
ML-DSA-87 keypair | 325292 cycles | 325321 cycles | 1.00 |
ML-DSA-87 sign | 884606 cycles | 884597 cycles | 1.00 |
ML-DSA-87 verify | 328970 cycles | 328981 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 115357 cycles | 115355 cycles | 1.00 |
ML-DSA-44 sign | 379153 cycles | 380513 cycles | 1.00 |
ML-DSA-44 verify | 120797 cycles | 120663 cycles | 1.00 |
ML-DSA-65 keypair | 199788 cycles | 199958 cycles | 1.00 |
ML-DSA-65 sign | 627802 cycles | 630974 cycles | 0.99 |
ML-DSA-65 verify | 199061 cycles | 199181 cycles | 1.00 |
ML-DSA-87 keypair | 327391 cycles | 327118 cycles | 1.00 |
ML-DSA-87 sign | 796685 cycles | 797677 cycles | 1.00 |
ML-DSA-87 verify | 326481 cycles | 326175 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 213394 cycles | 213177 cycles | 1.00 |
ML-DSA-44 sign | 780740 cycles | 781343 cycles | 1.00 |
ML-DSA-44 verify | 230090 cycles | 230299 cycles | 1.00 |
ML-DSA-65 keypair | 381212 cycles | 380958 cycles | 1.00 |
ML-DSA-65 sign | 1304148 cycles | 1291486 cycles | 1.01 |
ML-DSA-65 verify | 372827 cycles | 372866 cycles | 1.00 |
ML-DSA-87 keypair | 609690 cycles | 608974 cycles | 1.00 |
ML-DSA-87 sign | 1641652 cycles | 1641956 cycles | 1.00 |
ML-DSA-87 verify | 621887 cycles | 621291 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 34941 cycles | 35216 cycles | 0.99 |
ML-DSA-44 sign | 121031 cycles | 125138 cycles | 0.97 |
ML-DSA-44 verify | 38303 cycles | 39121 cycles | 0.98 |
ML-DSA-65 keypair | 62865 cycles | 63454 cycles | 0.99 |
ML-DSA-65 sign | 202124 cycles | 208653 cycles | 0.97 |
ML-DSA-65 verify | 62825 cycles | 64036 cycles | 0.98 |
ML-DSA-87 keypair | 94988 cycles | 96943 cycles | 0.98 |
ML-DSA-87 sign | 234763 cycles | 251543 cycles | 0.93 |
ML-DSA-87 verify | 93672 cycles | 96969 cycles | 0.97 |
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 96120 cycles | 95709 cycles | 1.00 |
ML-DSA-44 sign | 346090 cycles | 345734 cycles | 1.00 |
ML-DSA-44 verify | 101356 cycles | 101338 cycles | 1.00 |
ML-DSA-65 keypair | 164900 cycles | 164680 cycles | 1.00 |
ML-DSA-65 sign | 568251 cycles | 568266 cycles | 1.00 |
ML-DSA-65 verify | 165339 cycles | 165518 cycles | 1.00 |
ML-DSA-87 keypair | 270791 cycles | 270264 cycles | 1.00 |
ML-DSA-87 sign | 725314 cycles | 725295 cycles | 1.00 |
ML-DSA-87 verify | 273144 cycles | 273484 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 57088 cycles | 57825 cycles | 0.99 |
ML-DSA-44 sign | 180631 cycles | 190230 cycles | 0.95 |
ML-DSA-44 verify | 61122 cycles | 63174 cycles | 0.97 |
ML-DSA-65 keypair | 99951 cycles | 102038 cycles | 0.98 |
ML-DSA-65 sign | 296215 cycles | 317731 cycles | 0.93 |
ML-DSA-65 verify | 100617 cycles | 104176 cycles | 0.97 |
ML-DSA-87 keypair | 153586 cycles | 157788 cycles | 0.97 |
ML-DSA-87 sign | 353630 cycles | 378266 cycles | 0.93 |
ML-DSA-87 verify | 153792 cycles | 158668 cycles | 0.97 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 288220 cycles | 288573 cycles | 1.00 |
ML-DSA-44 sign | 926141 cycles | 930921 cycles | 0.99 |
ML-DSA-44 verify | 294580 cycles | 295012 cycles | 1.00 |
ML-DSA-65 keypair | 493233 cycles | 490626 cycles | 1.01 |
ML-DSA-65 sign | 1588611 cycles | 1551572 cycles | 1.02 |
ML-DSA-65 verify | 483195 cycles | 481319 cycles | 1.00 |
ML-DSA-87 keypair | 838376 cycles | 833327 cycles | 1.01 |
ML-DSA-87 sign | 2073639 cycles | 2076843 cycles | 1.00 |
ML-DSA-87 verify | 828140 cycles | 817076 cycles | 1.01 |
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 69244 cycles | 72965 cycles | 0.95 |
ML-DSA-44 sign | 185852 cycles | 209026 cycles | 0.89 |
ML-DSA-44 verify | 69233 cycles | 74357 cycles | 0.93 |
ML-DSA-65 keypair | 119293 cycles | 122701 cycles | 0.97 |
ML-DSA-65 sign | 296002 cycles | 330324 cycles | 0.90 |
ML-DSA-65 verify | 115422 cycles | 121320 cycles | 0.95 |
ML-DSA-87 keypair | 201621 cycles | 208588 cycles | 0.97 |
ML-DSA-87 sign | 386910 cycles | 432454 cycles | 0.89 |
ML-DSA-87 verify | 193867 cycles | 203545 cycles | 0.95 |
Graviton4
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 69520 cycles | 69452 cycles | 1.00 |
ML-DSA-44 sign | 214186 cycles | 214770 cycles | 1.00 |
ML-DSA-44 verify | 72613 cycles | 72508 cycles | 1.00 |
ML-DSA-65 keypair | 123271 cycles | 122780 cycles | 1.00 |
ML-DSA-65 sign | 352561 cycles | 353195 cycles | 1.00 |
ML-DSA-65 verify | 120572 cycles | 120430 cycles | 1.00 |
ML-DSA-87 keypair | 201687 cycles | 200488 cycles | 1.01 |
ML-DSA-87 sign | 451836 cycles | 451124 cycles | 1.00 |
ML-DSA-87 verify | 198535 cycles | 198358 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 157730 cycles | 157874 cycles | 1.00 |
ML-DSA-44 sign | 562520 cycles | 563418 cycles | 1.00 |
ML-DSA-44 verify | 169489 cycles | 169267 cycles | 1.00 |
ML-DSA-65 keypair | 269712 cycles | 269343 cycles | 1.00 |
ML-DSA-65 sign | 929000 cycles | 928710 cycles | 1.00 |
ML-DSA-65 verify | 274518 cycles | 274926 cycles | 1.00 |
ML-DSA-87 keypair | 450802 cycles | 450143 cycles | 1.00 |
ML-DSA-87 sign | 1177764 cycles | 1177838 cycles | 1.00 |
ML-DSA-87 verify | 458464 cycles | 458629 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 136237 cycles | 135835 cycles | 1.00 |
ML-DSA-44 sign | 543996 cycles | 543662 cycles | 1.00 |
ML-DSA-44 verify | 148769 cycles | 148551 cycles | 1.00 |
ML-DSA-65 keypair | 227121 cycles | 227293 cycles | 1.00 |
ML-DSA-65 sign | 880222 cycles | 880495 cycles | 1.00 |
ML-DSA-65 verify | 236396 cycles | 235973 cycles | 1.00 |
ML-DSA-87 keypair | 374620 cycles | 376279 cycles | 1.00 |
ML-DSA-87 sign | 1098382 cycles | 1099997 cycles | 1.00 |
ML-DSA-87 verify | 387197 cycles | 388895 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 42030 cycles | 42495 cycles | 0.99 |
ML-DSA-44 sign | 130656 cycles | 136953 cycles | 0.95 |
ML-DSA-44 verify | 44057 cycles | 45651 cycles | 0.97 |
ML-DSA-65 keypair | 72606 cycles | 73514 cycles | 0.99 |
ML-DSA-65 sign | 211799 cycles | 222961 cycles | 0.95 |
ML-DSA-65 verify | 72874 cycles | 75361 cycles | 0.97 |
ML-DSA-87 keypair | 108885 cycles | 111831 cycles | 0.97 |
ML-DSA-87 sign | 248272 cycles | 264029 cycles | 0.94 |
ML-DSA-87 verify | 109458 cycles | 113897 cycles | 0.96 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 0241a1e | Previous: a50c7c5 | Ratio |
---|---|---|---|
ML-DSA-65 keypair | 74894 cycles | 72607 cycles | 1.03 |
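To make the Ratio column and the alert rule concrete, here is a minimal C sketch. It assumes the ratio is the current cycle count divided by the previous one, and that an alert fires when the ratio is strictly above the threshold; the helper names `cycle_ratio` and `is_regression` are hypothetical and are not part of github-action-benchmark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ratio of current to previous cycle count; a value above 1.0 means the
 * current commit needs more cycles than the previous one. */
static double cycle_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* Assumed alert rule (not taken from the tool's source): flag a regression
 * when the ratio is strictly above the configured threshold. */
static bool is_regression(uint64_t current, uint64_t previous, double threshold)
{
  return cycle_ratio(current, previous) > threshold;
}
```

For the row above, 74894 / 72607 is roughly 1.0315, which is displayed as 1.03 and sits just over the 1.03 threshold.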
Graviton4 (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 132657 cycles | 132765 cycles | 1.00 |
ML-DSA-44 sign | 498636 cycles | 498360 cycles | 1.00 |
ML-DSA-44 verify | 144922 cycles | 144978 cycles | 1.00 |
ML-DSA-65 keypair | 227247 cycles | 227374 cycles | 1.00 |
ML-DSA-65 sign | 813948 cycles | 813162 cycles | 1.00 |
ML-DSA-65 verify | 232046 cycles | 231727 cycles | 1.00 |
ML-DSA-87 keypair | 374396 cycles | 374649 cycles | 1.00 |
ML-DSA-87 sign | 1021479 cycles | 1021467 cycles | 1.00 |
ML-DSA-87 verify | 383843 cycles | 383727 cycles | 1.00 |
Graviton2
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 115424 cycles | 116321 cycles | 0.99 |
ML-DSA-44 sign | 379369 cycles | 382918 cycles | 0.99 |
ML-DSA-44 verify | 120863 cycles | 121647 cycles | 0.99 |
ML-DSA-65 keypair | 199840 cycles | 200280 cycles | 1.00 |
ML-DSA-65 sign | 628198 cycles | 631576 cycles | 0.99 |
ML-DSA-65 verify | 199027 cycles | 199420 cycles | 1.00 |
ML-DSA-87 keypair | 327839 cycles | 327953 cycles | 1.00 |
ML-DSA-87 sign | 798048 cycles | 799114 cycles | 1.00 |
ML-DSA-87 verify | 326496 cycles | 326610 cycles | 1.00 |
Graviton3
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 73719 cycles | 73940 cycles | 1.00 |
ML-DSA-44 sign | 226972 cycles | 228396 cycles | 0.99 |
ML-DSA-44 verify | 77935 cycles | 78071 cycles | 1.00 |
ML-DSA-65 keypair | 129724 cycles | 129923 cycles | 1.00 |
ML-DSA-65 sign | 375574 cycles | 377186 cycles | 1.00 |
ML-DSA-65 verify | 128828 cycles | 129040 cycles | 1.00 |
ML-DSA-87 keypair | 210463 cycles | 210651 cycles | 1.00 |
ML-DSA-87 sign | 476614 cycles | 478561 cycles | 1.00 |
ML-DSA-87 verify | 209714 cycles | 210198 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 122811 cycles | 120120 cycles | 1.02 |
ML-DSA-44 sign | 462112 cycles | 453371 cycles | 1.02 |
ML-DSA-44 verify | 133080 cycles | 132326 cycles | 1.01 |
ML-DSA-65 keypair | 204594 cycles | 204716 cycles | 1.00 |
ML-DSA-65 sign | 737503 cycles | 737570 cycles | 1.00 |
ML-DSA-65 verify | 209127 cycles | 210009 cycles | 1.00 |
ML-DSA-87 keypair | 339240 cycles | 338444 cycles | 1.00 |
ML-DSA-87 sign | 943406 cycles | 941628 cycles | 1.00 |
ML-DSA-87 verify | 347932 cycles | 349377 cycles | 1.00 |
Graviton3 (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 138571 cycles | 138584 cycles | 1.00 |
ML-DSA-44 sign | 494766 cycles | 495481 cycles | 1.00 |
ML-DSA-44 verify | 148733 cycles | 148760 cycles | 1.00 |
ML-DSA-65 keypair | 241546 cycles | 241312 cycles | 1.00 |
ML-DSA-65 sign | 809788 cycles | 809760 cycles | 1.00 |
ML-DSA-65 verify | 241008 cycles | 240909 cycles | 1.00 |
ML-DSA-87 keypair | 396585 cycles | 396477 cycles | 1.00 |
ML-DSA-87 sign | 1032199 cycles | 1031613 cycles | 1.00 |
ML-DSA-87 verify | 402527 cycles | 402260 cycles | 1.00 |
Graviton2 (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 213600 cycles | 213647 cycles | 1.00 |
ML-DSA-44 sign | 781050 cycles | 794200 cycles | 0.98 |
ML-DSA-44 verify | 230186 cycles | 230157 cycles | 1.00 |
ML-DSA-65 keypair | 381315 cycles | 381964 cycles | 1.00 |
ML-DSA-65 sign | 1287118 cycles | 1286398 cycles | 1.00 |
ML-DSA-65 verify | 373077 cycles | 373972 cycles | 1.00 |
ML-DSA-87 keypair | 609715 cycles | 609842 cycles | 1.00 |
ML-DSA-87 sign | 1643926 cycles | 1645519 cycles | 1.00 |
ML-DSA-87 verify | 621660 cycles | 621691 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 822653 cycles | 822607 cycles | 1.00 |
ML-DSA-44 sign | 3328293 cycles | 3327961 cycles | 1.00 |
ML-DSA-44 verify | 920551 cycles | 919517 cycles | 1.00 |
ML-DSA-65 keypair | 1396003 cycles | 1395382 cycles | 1.00 |
ML-DSA-65 sign | 5444867 cycles | 5429294 cycles | 1.00 |
ML-DSA-65 verify | 1463286 cycles | 1462157 cycles | 1.00 |
ML-DSA-87 keypair | 2303082 cycles | 2304525 cycles | 1.00 |
ML-DSA-87 sign | 6826409 cycles | 6826659 cycles | 1.00 |
ML-DSA-87 verify | 2398826 cycles | 2401170 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 465726 cycles | 465321 cycles | 1.00 |
ML-DSA-44 sign | 2226268 cycles | 2229058 cycles | 1.00 |
ML-DSA-44 verify | 546713 cycles | 547134 cycles | 1.00 |
ML-DSA-65 keypair | 778115 cycles | 779224 cycles | 1.00 |
ML-DSA-65 sign | 3635158 cycles | 3638239 cycles | 1.00 |
ML-DSA-65 verify | 849641 cycles | 850223 cycles | 1.00 |
ML-DSA-87 keypair | 1261562 cycles | 1267615 cycles | 1.00 |
ML-DSA-87 sign | 4538959 cycles | 4512261 cycles | 1.01 |
ML-DSA-87 verify | 1370897 cycles | 1373672 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 223435 cycles | 241131 cycles | 0.93 |
ML-DSA-44 sign | 660117 cycles | 692486 cycles | 0.95 |
ML-DSA-44 verify | 226160 cycles | 231043 cycles | 0.98 |
ML-DSA-65 keypair | 389374 cycles | 393697 cycles | 0.99 |
ML-DSA-65 sign | 1102631 cycles | 1084494 cycles | 1.02 |
ML-DSA-65 verify | 380262 cycles | 375628 cycles | 1.01 |
ML-DSA-87 keypair | 669282 cycles | 671942 cycles | 1.00 |
ML-DSA-87 sign | 1514678 cycles | 1488161 cycles | 1.02 |
ML-DSA-87 verify | 651259 cycles | 646858 cycles | 1.01 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 0241a1e | Previous: a50c7c5 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 243405 cycles | 234807 cycles | 1.04 |
ML-DSA-44 verify | 260136 cycles | 246581 cycles | 1.05 |
ML-DSA-65 keypair | 422006 cycles | 396165 cycles | 1.07 |
ML-DSA-65 verify | 412093 cycles | 399635 cycles | 1.03 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 311676 cycles | 301890 cycles | 1.03 |
ML-DSA-44 sign | 1194281 cycles | 1200079 cycles | 1.00 |
ML-DSA-44 verify | 329835 cycles | 332889 cycles | 0.99 |
ML-DSA-65 keypair | 574421 cycles | 568930 cycles | 1.01 |
ML-DSA-65 sign | 2008654 cycles | 1985832 cycles | 1.01 |
ML-DSA-65 verify | 551223 cycles | 539286 cycles | 1.02 |
ML-DSA-87 keypair | 845329 cycles | 863211 cycles | 0.98 |
ML-DSA-87 sign | 2432925 cycles | 2481765 cycles | 0.98 |
ML-DSA-87 verify | 859958 cycles | 888876 cycles | 0.97 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: a2e40c1 | Previous: abf8281 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 311676 cycles | 301890 cycles | 1.03 |
816f665 to 1fda1e9
Thanks @jakemas! I hope it is okay if @jammychiou1 continues this PR and adds AArch64 in addition.
Please split this into two files. That will make verification easier later on.
Ok I'll neaten up the x86 parts and then @jammychiou1 can take over and add the AArch64!
@jammychiou1 has already been working on this since yesterday.
1fda1e9 to 67a545f
Thank you @jakemas for letting me take over! I hope I didn't introduce bugs into your commits while rebasing. I'll add the AArch64 parts later.
Marking this as draft. Please mark it as ready for review when it's ready.
67a545f to 38156be
Signed-off-by: Jake Massimo <[email protected]>
Run the updated autogen and format scripts again. Signed-off-by: jammychiou1 <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
38156be to cdf39ee
This commit adds a native implementation of poly_pointwise_montgomery written from scratch.
Co-authored-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
This commit adds native implementations of polyvecl_pointwise_acc_montgomery written from scratch.
Co-authored-by: Matthias J. Kannwischer <[email protected]>
Signed-off-by: jammychiou1 <[email protected]>
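For readers unfamiliar with the two functions named in these commits, the following is a scalar C sketch of what they compute. It is not the AVX2 code from this PR. It follows the usual ML-DSA convention of Montgomery arithmetic modulo q = 8380417 with R = 2^32. The `_c` function names, the `MLD_` macro names, and the choice to accumulate all products in 64 bits before a single reduction are assumptions made for illustration. The accumulation also assumes input coefficients are below q in absolute value.

```c
#include <stdint.h>

#define MLD_N 256          /* coefficients per polynomial */
#define MLD_Q 8380417      /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define MLD_QINV 58728449  /* q^(-1) mod 2^32 */
#define MLD_L 4            /* vector length l: 4, 5, 7 for ML-DSA-44/65/87 */

typedef struct { int32_t coeffs[MLD_N]; } poly;
typedef struct { poly vec[MLD_L]; } polyvecl;

/* Montgomery reduction: returns r with r * 2^32 == a (mod q).
 * Assumes |a| <= 2^31 * q and an arithmetic right shift on signed values. */
static int32_t montgomery_reduce(int64_t a)
{
  /* t = a * q^(-1) mod 2^32, computed with wrapping unsigned arithmetic */
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)MLD_QINV);
  /* a - t*q is an exact multiple of 2^32, so the shift loses nothing */
  return (int32_t)((a - (int64_t)t * MLD_Q) >> 32);
}

/* c[i] = a[i] * b[i] * 2^(-32) mod q, coefficient by coefficient */
static void poly_pointwise_montgomery_c(poly *c, const poly *a, const poly *b)
{
  for (int i = 0; i < MLD_N; i++)
    c->coeffs[i] = montgomery_reduce((int64_t)a->coeffs[i] * b->coeffs[i]);
}

/* w[i] = (sum over j of u[j][i] * v[j][i]) * 2^(-32) mod q.
 * Sketch choice: sum the 64-bit products first, then reduce once. */
static void polyvecl_pointwise_acc_montgomery_c(poly *w, const polyvecl *u,
                                                const polyvecl *v)
{
  for (int i = 0; i < MLD_N; i++) {
    int64_t acc = 0;
    for (int j = 0; j < MLD_L; j++)
      acc += (int64_t)u->vec[j].coeffs[i] * v->vec[j].coeffs[i];
    w->coeffs[i] = montgomery_reduce(acc);
  }
}
```

With every input coefficient set to 65536, each product is exactly 2^32, so the pointwise result is 1 per coefficient and the accumulated result is `MLD_L` per coefficient.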
cdf39ee to a2e40c1
poly_pointwise_montgomery assembly #337
Working towards: #303
Testing CI - CBMC is not happy with me today.
I have seen #346 and #334 -- I just want to forecast out some proofs.
The optimization seems beneficial, here are the numbers:
Generic C Implementation (no native AVX2):
Native AVX2 Implementation:
Overall: