Avoid calling keccak_absorb with partial lanes #450
Please provide a description for this PR. What is the point of this refactoring? What benefit does it bring? Please provide CBMC proof harness and Makefile for any new functions.
@bremoran, sorry for the long wait for the review on this. Could you please rebase this on top of the changes in main, so we can benchmark and review it?
aa57a15 to 2cd2d61
Signed-off-by: Brendan Moran <[email protected]>
Signed-off-by: Brendan Moran <[email protected]>
Signed-off-by: Matthias J. Kannwischer <[email protected]>
2cd2d61 to a8d2d6a
This gets inlined into the proof of mld_H - no need for a separate contract if the proofs go through. Signed-off-by: Matthias J. Kannwischer <[email protected]>
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 47837 cycles | 47836 cycles | 1.00 |
| ML-DSA-44 sign | 156325 cycles | 156334 cycles | 1.00 |
| ML-DSA-44 verify | 52453 cycles | 52450 cycles | 1.00 |
| ML-DSA-65 keypair | 83684 cycles | 83701 cycles | 1.00 |
| ML-DSA-65 sign | 255488 cycles | 255371 cycles | 1.00 |
| ML-DSA-65 verify | 85590 cycles | 85601 cycles | 1.00 |
| ML-DSA-87 keypair | 136128 cycles | 136113 cycles | 1.00 |
| ML-DSA-87 sign | 320962 cycles | 321312 cycles | 1.00 |
| ML-DSA-87 verify | 137899 cycles | 138009 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115074 cycles | 115039 cycles | 1.00 |
| ML-DSA-44 sign | 430931 cycles | 430787 cycles | 1.00 |
| ML-DSA-44 verify | 122238 cycles | 122176 cycles | 1.00 |
| ML-DSA-65 keypair | 197047 cycles | 196905 cycles | 1.00 |
| ML-DSA-65 sign | 701023 cycles | 701285 cycles | 1.00 |
| ML-DSA-65 verify | 197670 cycles | 197656 cycles | 1.00 |
| ML-DSA-87 keypair | 334759 cycles | 335149 cycles | 1.00 |
| ML-DSA-87 sign | 884276 cycles | 884767 cycles | 1.00 |
| ML-DSA-87 verify | 328610 cycles | 329046 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 281441 cycles | 288008 cycles | 0.98 |
| ML-DSA-44 sign | 971200 cycles | 972295 cycles | 1.00 |
| ML-DSA-44 verify | 301117 cycles | 306786 cycles | 0.98 |
| ML-DSA-65 keypair | 482405 cycles | 492097 cycles | 0.98 |
| ML-DSA-65 sign | 1584980 cycles | 1609911 cycles | 0.98 |
| ML-DSA-65 verify | 487166 cycles | 493789 cycles | 0.99 |
| ML-DSA-87 keypair | 817778 cycles | 830114 cycles | 0.99 |
| ML-DSA-87 sign | 2103778 cycles | 2168352 cycles | 0.97 |
| ML-DSA-87 verify | 823572 cycles | 838050 cycles | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 35501 cycles | 35660 cycles | 1.00 |
| ML-DSA-44 sign | 132302 cycles | 132372 cycles | 1.00 |
| ML-DSA-44 verify | 41006 cycles | 40941 cycles | 1.00 |
| ML-DSA-65 keypair | 63922 cycles | 63906 cycles | 1.00 |
| ML-DSA-65 sign | 220917 cycles | 220391 cycles | 1.00 |
| ML-DSA-65 verify | 66232 cycles | 66307 cycles | 1.00 |
| ML-DSA-87 keypair | 95630 cycles | 96815 cycles | 0.99 |
| ML-DSA-87 sign | 259768 cycles | 265102 cycles | 0.98 |
| ML-DSA-87 verify | 99879 cycles | 100242 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 95541 cycles | 95838 cycles | 1.00 |
| ML-DSA-44 sign | 343758 cycles | 345726 cycles | 0.99 |
| ML-DSA-44 verify | 101480 cycles | 101478 cycles | 1.00 |
| ML-DSA-65 keypair | 164662 cycles | 164854 cycles | 1.00 |
| ML-DSA-65 sign | 571713 cycles | 568786 cycles | 1.01 |
| ML-DSA-65 verify | 166031 cycles | 165621 cycles | 1.00 |
| ML-DSA-87 keypair | 271224 cycles | 270260 cycles | 1.00 |
| ML-DSA-87 sign | 725476 cycles | 724985 cycles | 1.00 |
| ML-DSA-87 verify | 273047 cycles | 273226 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 41585 cycles | 45299 cycles | 0.92 |
| ML-DSA-44 sign | 143200 cycles | 154336 cycles | 0.93 |
| ML-DSA-44 verify | 46943 cycles | 49529 cycles | 0.95 |
| ML-DSA-65 keypair | 73940 cycles | 74392 cycles | 0.99 |
| ML-DSA-65 sign | 236322 cycles | 237019 cycles | 1.00 |
| ML-DSA-65 verify | 77313 cycles | 78423 cycles | 0.99 |
| ML-DSA-87 keypair | 111858 cycles | 112104 cycles | 1.00 |
| ML-DSA-87 sign | 279992 cycles | 279301 cycles | 1.00 |
| ML-DSA-87 verify | 117273 cycles | 116800 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57678 cycles | 57941 cycles | 1.00 |
| ML-DSA-44 sign | 201328 cycles | 201248 cycles | 1.00 |
| ML-DSA-44 verify | 66243 cycles | 65669 cycles | 1.01 |
| ML-DSA-65 keypair | 102316 cycles | 101945 cycles | 1.00 |
| ML-DSA-65 sign | 332994 cycles | 333057 cycles | 1.00 |
| ML-DSA-65 verify | 107021 cycles | 107115 cycles | 1.00 |
| ML-DSA-87 keypair | 157063 cycles | 157562 cycles | 1.00 |
| ML-DSA-87 sign | 399257 cycles | 399500 cycles | 1.00 |
| ML-DSA-87 verify | 162886 cycles | 162176 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71344 cycles | 71956 cycles | 0.99 |
| ML-DSA-44 sign | 212929 cycles | 214221 cycles | 0.99 |
| ML-DSA-44 verify | 74779 cycles | 75325 cycles | 0.99 |
| ML-DSA-65 keypair | 123608 cycles | 123638 cycles | 1.00 |
| ML-DSA-65 sign | 345402 cycles | 346781 cycles | 1.00 |
| ML-DSA-65 verify | 124084 cycles | 123918 cycles | 1.00 |
| ML-DSA-87 keypair | 206533 cycles | 208833 cycles | 0.99 |
| ML-DSA-87 sign | 447608 cycles | 447509 cycles | 1.00 |
| ML-DSA-87 verify | 205360 cycles | 204748 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69348 cycles | 69498 cycles | 1.00 |
| ML-DSA-44 sign | 222588 cycles | 222917 cycles | 1.00 |
| ML-DSA-44 verify | 74645 cycles | 74589 cycles | 1.00 |
| ML-DSA-65 keypair | 123409 cycles | 123347 cycles | 1.00 |
| ML-DSA-65 sign | 365960 cycles | 366381 cycles | 1.00 |
| ML-DSA-65 verify | 123609 cycles | 123483 cycles | 1.00 |
| ML-DSA-87 keypair | 201689 cycles | 200598 cycles | 1.01 |
| ML-DSA-87 sign | 467807 cycles | 466978 cycles | 1.00 |
| ML-DSA-87 verify | 201993 cycles | 201918 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120674 cycles | 120817 cycles | 1.00 |
| ML-DSA-44 sign | 452232 cycles | 453984 cycles | 1.00 |
| ML-DSA-44 verify | 131541 cycles | 131897 cycles | 1.00 |
| ML-DSA-65 keypair | 204081 cycles | 205210 cycles | 0.99 |
| ML-DSA-65 sign | 739495 cycles | 738619 cycles | 1.00 |
| ML-DSA-65 verify | 209598 cycles | 210495 cycles | 1.00 |
| ML-DSA-87 keypair | 339929 cycles | 343513 cycles | 0.99 |
| ML-DSA-87 sign | 942376 cycles | 952408 cycles | 0.99 |
| ML-DSA-87 verify | 350063 cycles | 353724 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135463 cycles | 136469 cycles | 0.99 |
| ML-DSA-44 sign | 542488 cycles | 545358 cycles | 0.99 |
| ML-DSA-44 verify | 148719 cycles | 149472 cycles | 0.99 |
| ML-DSA-65 keypair | 227337 cycles | 229684 cycles | 0.99 |
| ML-DSA-65 sign | 880524 cycles | 888847 cycles | 0.99 |
| ML-DSA-65 verify | 236252 cycles | 237595 cycles | 0.99 |
| ML-DSA-87 keypair | 375243 cycles | 375230 cycles | 1.00 |
| ML-DSA-87 sign | 1102759 cycles | 1101253 cycles | 1.00 |
| ML-DSA-87 verify | 387967 cycles | 389206 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157565 cycles | 158047 cycles | 1.00 |
| ML-DSA-44 sign | 563855 cycles | 566208 cycles | 1.00 |
| ML-DSA-44 verify | 169337 cycles | 169650 cycles | 1.00 |
| ML-DSA-65 keypair | 270050 cycles | 269850 cycles | 1.00 |
| ML-DSA-65 sign | 928714 cycles | 928430 cycles | 1.00 |
| ML-DSA-65 verify | 275259 cycles | 275016 cycles | 1.00 |
| ML-DSA-87 keypair | 450252 cycles | 450841 cycles | 1.00 |
| ML-DSA-87 sign | 1180577 cycles | 1179105 cycles | 1.00 |
| ML-DSA-87 verify | 460070 cycles | 459184 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 73954 cycles | 73980 cycles | 1.00 |
| ML-DSA-44 sign | 236004 cycles | 236034 cycles | 1.00 |
| ML-DSA-44 verify | 80304 cycles | 79930 cycles | 1.00 |
| ML-DSA-65 keypair | 129494 cycles | 129578 cycles | 1.00 |
| ML-DSA-65 sign | 388474 cycles | 388294 cycles | 1.00 |
| ML-DSA-65 verify | 131006 cycles | 130908 cycles | 1.00 |
| ML-DSA-87 keypair | 210035 cycles | 210041 cycles | 1.00 |
| ML-DSA-87 sign | 491914 cycles | 492267 cycles | 1.00 |
| ML-DSA-87 verify | 212663 cycles | 212589 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 462159 cycles | 466373 cycles | 0.99 |
| ML-DSA-44 sign | 2216904 cycles | 2214442 cycles | 1.00 |
| ML-DSA-44 verify | 547750 cycles | 550635 cycles | 0.99 |
| ML-DSA-65 keypair | 778716 cycles | 777523 cycles | 1.00 |
| ML-DSA-65 sign | 3628400 cycles | 3643249 cycles | 1.00 |
| ML-DSA-65 verify | 853665 cycles | 849541 cycles | 1.00 |
| ML-DSA-87 keypair | 1250941 cycles | 1269297 cycles | 0.99 |
| ML-DSA-87 sign | 4442690 cycles | 4513601 cycles | 0.98 |
| ML-DSA-87 verify | 1364598 cycles | 1373707 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115550 cycles | 115640 cycles | 1.00 |
| ML-DSA-44 sign | 392162 cycles | 392538 cycles | 1.00 |
| ML-DSA-44 verify | 123972 cycles | 123749 cycles | 1.00 |
| ML-DSA-65 keypair | 200210 cycles | 200190 cycles | 1.00 |
| ML-DSA-65 sign | 648965 cycles | 648572 cycles | 1.00 |
| ML-DSA-65 verify | 203087 cycles | 202921 cycles | 1.00 |
| ML-DSA-87 keypair | 328316 cycles | 327699 cycles | 1.00 |
| ML-DSA-87 sign | 822365 cycles | 820887 cycles | 1.00 |
| ML-DSA-87 verify | 332366 cycles | 331384 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton4 (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 132701 cycles | 132744 cycles | 1.00 |
| ML-DSA-44 sign | 498674 cycles | 498324 cycles | 1.00 |
| ML-DSA-44 verify | 145009 cycles | 144951 cycles | 1.00 |
| ML-DSA-65 keypair | 226922 cycles | 227315 cycles | 1.00 |
| ML-DSA-65 sign | 814244 cycles | 813246 cycles | 1.00 |
| ML-DSA-65 verify | 231594 cycles | 231619 cycles | 1.00 |
| ML-DSA-87 keypair | 374429 cycles | 374603 cycles | 1.00 |
| ML-DSA-87 sign | 1021798 cycles | 1021441 cycles | 1.00 |
| ML-DSA-87 verify | 384208 cycles | 383659 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton3 (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138585 cycles | 138628 cycles | 1.00 |
| ML-DSA-44 sign | 495158 cycles | 495579 cycles | 1.00 |
| ML-DSA-44 verify | 148937 cycles | 148792 cycles | 1.00 |
| ML-DSA-65 keypair | 241460 cycles | 241330 cycles | 1.00 |
| ML-DSA-65 sign | 810228 cycles | 809886 cycles | 1.00 |
| ML-DSA-65 verify | 241222 cycles | 240937 cycles | 1.00 |
| ML-DSA-87 keypair | 396305 cycles | 396441 cycles | 1.00 |
| ML-DSA-87 sign | 1031970 cycles | 1031506 cycles | 1.00 |
| ML-DSA-87 verify | 402475 cycles | 402272 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213442 cycles | 213493 cycles | 1.00 |
| ML-DSA-44 sign | 781132 cycles | 794089 cycles | 0.98 |
| ML-DSA-44 verify | 230277 cycles | 230005 cycles | 1.00 |
| ML-DSA-65 keypair | 380712 cycles | 381674 cycles | 1.00 |
| ML-DSA-65 sign | 1287339 cycles | 1285921 cycles | 1.00 |
| ML-DSA-65 verify | 373222 cycles | 373670 cycles | 1.00 |
| ML-DSA-87 keypair | 609594 cycles | 609555 cycles | 1.00 |
| ML-DSA-87 sign | 1644483 cycles | 1645486 cycles | 1.00 |
| ML-DSA-87 verify | 621636 cycles | 621588 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115380 cycles | 115390 cycles | 1.00 |
| ML-DSA-44 sign | 392034 cycles | 392115 cycles | 1.00 |
| ML-DSA-44 verify | 123904 cycles | 123546 cycles | 1.00 |
| ML-DSA-65 keypair | 200071 cycles | 199986 cycles | 1.00 |
| ML-DSA-65 sign | 648490 cycles | 647905 cycles | 1.00 |
| ML-DSA-65 verify | 203071 cycles | 202802 cycles | 1.00 |
| ML-DSA-87 keypair | 327348 cycles | 327077 cycles | 1.00 |
| ML-DSA-87 sign | 819919 cycles | 819688 cycles | 1.00 |
| ML-DSA-87 verify | 331865 cycles | 331074 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 822021 cycles | 823286 cycles | 1.00 |
| ML-DSA-44 sign | 3332036 cycles | 3327209 cycles | 1.00 |
| ML-DSA-44 verify | 920516 cycles | 918657 cycles | 1.00 |
| ML-DSA-65 keypair | 1395987 cycles | 1400241 cycles | 1.00 |
| ML-DSA-65 sign | 5415850 cycles | 5443356 cycles | 0.99 |
| ML-DSA-65 verify | 1464876 cycles | 1464467 cycles | 1.00 |
| ML-DSA-87 keypair | 2296738 cycles | 2298732 cycles | 1.00 |
| ML-DSA-87 sign | 6800722 cycles | 6822286 cycles | 1.00 |
| ML-DSA-87 verify | 2402751 cycles | 2403402 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213144 cycles | 213012 cycles | 1.00 |
| ML-DSA-44 sign | 780665 cycles | 781249 cycles | 1.00 |
| ML-DSA-44 verify | 230117 cycles | 230192 cycles | 1.00 |
| ML-DSA-65 keypair | 380413 cycles | 380850 cycles | 1.00 |
| ML-DSA-65 sign | 1304248 cycles | 1291535 cycles | 1.01 |
| ML-DSA-65 verify | 372936 cycles | 372768 cycles | 1.00 |
| ML-DSA-87 keypair | 609458 cycles | 609112 cycles | 1.00 |
| ML-DSA-87 sign | 1641897 cycles | 1642387 cycles | 1.00 |
| ML-DSA-87 verify | 621885 cycles | 621381 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 239303 cycles | 231907 cycles | 1.03 |
| ML-DSA-44 sign | 701852 cycles | 692048 cycles | 1.01 |
| ML-DSA-44 verify | 238239 cycles | 234215 cycles | 1.02 |
| ML-DSA-65 keypair | 395898 cycles | 397168 cycles | 1.00 |
| ML-DSA-65 sign | 1112619 cycles | 1103780 cycles | 1.01 |
| ML-DSA-65 verify | 392007 cycles | 380128 cycles | 1.03 |
| ML-DSA-87 keypair | 662188 cycles | 660299 cycles | 1.00 |
| ML-DSA-87 sign | 1484409 cycles | 1454152 cycles | 1.02 |
| ML-DSA-87 verify | 645366 cycles | 650049 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 239303 cycles | 231907 cycles | 1.03 |
| ML-DSA-65 verify | 392007 cycles | 380128 cycles | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 315776 cycles | 311426 cycles | 1.01 |
| ML-DSA-44 sign | 1230641 cycles | 1214729 cycles | 1.01 |
| ML-DSA-44 verify | 353493 cycles | 338228 cycles | 1.05 |
| ML-DSA-65 keypair | 562601 cycles | 572363 cycles | 0.98 |
| ML-DSA-65 sign | 2009516 cycles | 1992144 cycles | 1.01 |
| ML-DSA-65 verify | 541825 cycles | 547811 cycles | 0.99 |
| ML-DSA-87 keypair | 884415 cycles | 884798 cycles | 1.00 |
| ML-DSA-87 sign | 2488138 cycles | 2501693 cycles | 0.99 |
| ML-DSA-87 verify | 912836 cycles | 901676 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f82f729 | Previous: efe03f9 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 353493 cycles | 338228 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
Performance-wise, there is no reason not to merge this. There is even a small improvement of 1-3% on Cortex-A55 and (for reasons that are beyond me) on 4th gen AMD EPYC (c7a). CBMC proofs are failing, but we can fix that at a later point. Fundamentally, I believe such caching does not belong in WDYT @hanno-becker?
I see one proof failure in mld_H. Let me take a look...
Signed-off-by: Rod Chapman <[email protected]>
Thanks @bremoran! I can definitely see this being useful for 32-bit platforms.
A few requests:
- I don't think this needs an API extension: Instead, the buffering of state prior to XOR'ing should be an implementation detail (add a buffer for the incomplete lane) of the existing absorb API.
- We should have documentation and CBMC proofs for new functionality.
- The new logic belongs to FIPS-202.
Could you adjust the PR accordingly?
I agree. Marking this as draft for now. @bremoran, please mark it as ready when you have updated the PR.
On 32-bit architectures, each call to `mld_keccakf1600_xor_bytes` incurs an overhead. For example, on Armv7-M and Armv8-M, using the optimised bit interleaving from XKCP, XORing a lane into the state incurs an overhead of 37 instructions. Any time an incomplete lane is XORed into the state, this penalty is paid twice: once for the partial lane and again when the remaining bytes of that lane arrive. This PR ensures that only full lanes are XORed into the state.
Fixes #445
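The buffering idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `sketch_ctx`, `sketch_absorb`, `xor_full_lane` and the `lane_xors` counter are hypothetical names, `sketch_permute` is a stand-in mixing function rather than Keccak-f[1600], the rate is assumed to be the SHAKE256 rate of 17 lanes, and padding/finalisation and the 32-bit bit-interleaved lane layout are left out. It only shows how incomplete lanes can be held in a small buffer so that the state is touched once per lane, however the input is split across calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANE_BYTES 8u  /* a Keccak-f[1600] lane is 64 bits */
#define RATE_LANES 17u /* assumed rate: SHAKE256, 136 bytes = 17 lanes */

/* Hypothetical context; field names are illustrative only. */
typedef struct
{
  uint64_t state[25];      /* sponge state as 64-bit lanes */
  uint8_t buf[LANE_BYTES]; /* pending bytes of an incomplete lane */
  unsigned buf_len;        /* valid bytes in buf, always < LANE_BYTES */
  unsigned lane;           /* next lane to fill, always < RATE_LANES */
  unsigned long lane_xors; /* instrumentation: lane XORs performed */
} sketch_ctx;

/* Stand-in for Keccak-f[1600]. NOT the real permutation: just a
 * deterministic mixing step so that the sketch is self-contained. */
static void sketch_permute(uint64_t s[25])
{
  unsigned i;
  for (i = 0; i < 25; i++)
  {
    s[i] = (s[i] ^ s[(i + 1) % 25]) * 0x9E3779B97F4A7C15ULL + i;
  }
}

/* XOR one complete little-endian lane into the state and permute once
 * the rate block is full. This is the only place the state is touched. */
static void xor_full_lane(sketch_ctx *c, const uint8_t *p)
{
  uint64_t v = 0;
  unsigned i;
  for (i = 0; i < LANE_BYTES; i++)
  {
    v |= (uint64_t)p[i] << (8 * i);
  }
  c->state[c->lane] ^= v;
  c->lane_xors++;
  if (++c->lane == RATE_LANES)
  {
    sketch_permute(c->state);
    c->lane = 0;
  }
}

static void sketch_init(sketch_ctx *c) { memset(c, 0, sizeof *c); }

static void sketch_absorb(sketch_ctx *c, const uint8_t *in, size_t len)
{
  assert(c->buf_len < LANE_BYTES);
  /* 1. Top up a pending partial lane; XOR it once it is complete. */
  while (c->buf_len > 0 && len > 0)
  {
    c->buf[c->buf_len++] = *in++;
    len--;
    if (c->buf_len == LANE_BYTES)
    {
      xor_full_lane(c, c->buf);
      c->buf_len = 0;
    }
  }
  /* 2. XOR complete lanes straight from the input. */
  while (len >= LANE_BYTES)
  {
    xor_full_lane(c, in);
    in += LANE_BYTES;
    len -= LANE_BYTES;
  }
  /* 3. Stash the tail; it is XORed only when it becomes a full lane. */
  while (len > 0)
  {
    c->buf[c->buf_len++] = *in++;
    len--;
  }
}
```

In this sketch, absorbing 3 bytes and then 5 bytes results in a single lane XOR, whereas XORing each chunk directly into the state would touch the same lane twice. Keeping the buffer inside the context also means the caller-facing absorb signature does not need to change.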