Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the internal usage of Cstruct.t by the bytes type #146

Merged
merged 25 commits into from
Feb 13, 2024

Conversation

dinosaure
Copy link
Member

This PR wants to delete more and more an usage of Cstruct.t. The goal of it is to give an opportunity to use bytes instead of Cstruct.t. Indeed, in a certain usage, mirage-crypto-ec allocates too many Cstruct.t which put a huge pressure on the GC. In some of my stress test, I finally get an Out_of_memory exception and an introspection of the memory consumption shows me that mirage-crypto-ec allocates 26 % of what I used and most of these objects:

  1. are a small bigarray
  2. come from this code

This PR is a draft to internally uses bytes and keep the same interface with a systematic derivation of functions with bytes. Then, C calls use only bytes instead of Cstruct.t.

I will come back with some statistics to see if this PR change something about allocations or not.

ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/native/np224_stubs.c Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Oct 28, 2021

I went through the code, this looks mostly good. Some questions remain, though:

  • why bytes and not string?
  • why retain the Cstruct.t based interface?
  • did you measure the performance with this change, and do you see a difference?

ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Sep 13, 2022

Any update on this? Any performance metrics supporting this change? (I'm all in for such a change).

@palainp
Copy link
Member

palainp commented Jan 15, 2024

Hello, thanks for this PR. I currently don't really know how to stresstest it but I found this PR interesting (untested yet, planed if the next weeks).

TLDR: I came to the same conclusions, Cstruct.t here put too much pressure on the GC and I think it's better to remove most possible.

So far I'm on the following journey:

If interested, one should be able to reproduce with:

opam pin ocaml-solo5 https://github.com/palainp/ocaml-solo5.git#no-dlmalloc -y
mirage configure -t spt  && make depend && dune build
solo5-spt --net:service=service --mem=16 dist/resolver.spt > log.txt
grep requested log.txt  | sort | uniq -c # plus maybe some shell magic to unify the allocation of the same size

@hannesm hannesm force-pushed the bytes-instead-of-cstruct branch from 7f3fc49 to f9938f2 Compare February 9, 2024 11:22
@hannesm hannesm marked this pull request as ready for review February 9, 2024 11:22
@hannesm
Copy link
Member

hannesm commented Feb 9, 2024

An update: I rebased this on top of main (well, taking @palainp work here) //cc @Firobe

I also marked it as ready for review, and would like to merge this rather sooner than later.

@hannesm hannesm force-pushed the bytes-instead-of-cstruct branch from f9938f2 to bf89829 Compare February 9, 2024 11:24
@hannesm
Copy link
Member

hannesm commented Feb 9, 2024

I reverted the API changes in 65dbfae -- so the _bytes functions aren't exposed.

The reasoning behind this is that I do not like the new names, and would like to avoid introducing them. If necessary, I'm fine to just move everything to string/bytes (with a breaking API change), but introducing new things in the interface makes more work (at some point we have to deprecate and switch the existing ones to bytes)...

@palainp
Copy link
Member

palainp commented Feb 9, 2024

FWIW I'm ok with the deletion of new bytes variants from the API. This was maily focused on the internal usage of CS vs bytes :)

@hannesm hannesm force-pushed the bytes-instead-of-cstruct branch from 38199d9 to f6b3613 Compare February 9, 2024 12:46
@hannesm
Copy link
Member

hannesm commented Feb 9, 2024

Some time later, I force-pushed the commits again - sorry about that.

I also removed the offset from the 25519 code (which was always 0), reducing the arguments passed to C.

Also, since #192 got merged, I managed to gather some benchmarks on my laptop (Lenovo X250, Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz) using bench/speed.exe ecdsa-generate ecdsa-sign ecdsa-verify ecdh-secret ecdh-share (if you can reproduce similar numbers, let me know. if you have different numbers, also let me know).

main branch:

  • [ecdsa-generate]
    P224: 1031.907 ops per second (10248 iters in 9.931)
    P256: 912.084 ops per second (9416 iters in 10.324)
    P384: 429.517 ops per second (4041 iters in 9.408)
    P521: 164.375 ops per second (1873 iters in 11.395)
    Ed25519: 18128.142 ops per second (208333 iters in 11.492)

  • [ecdsa-sign]
    P224: 915.555 ops per second (10277 iters in 11.225)
    P256: 747.909 ops per second (9228 iters in 12.338)
    P384: 401.500 ops per second (3343 iters in 8.326)
    P521: 139.636 ops per second (1379 iters in 9.876)
    Ed25519: 10712.649 ops per second (60459 iters in 5.644)

  • [ecdsa-verify]
    P224: 317.027 ops per second (3238 iters in 10.214)
    P256: 284.644 ops per second (2118 iters in 7.441)
    P384: 136.597 ops per second (1396 iters in 10.220)
    P521: 66.025 ops per second (590 iters in 8.936)
    Ed25519: 7599.516 ops per second (72674 iters in 9.563)

  • [ecdh-secret]
    P224: 820.940 ops per second (5955 iters in 7.254)
    P256: 792.998 ops per second (9704 iters in 12.237)
    P384: 391.008 ops per second (4041 iters in 10.335)
    P521: 176.086 ops per second (1708 iters in 9.700)
    X25519: 12821.927 ops per second (140845 iters in 10.985)

  • [ecdh-share]
    P224: 871.088 ops per second (10557 iters in 12.119)
    P256: 857.838 ops per second (5867 iters in 6.839)
    P384: 355.362 ops per second (4085 iters in 11.495)
    P521: 164.099 ops per second (1674 iters in 10.201)
    X25519: 12143.960 ops per second (121359 iters in 9.993)

this PR (8bdd655):

  • [ecdsa-generate]
    P224: 2391.019 ops per second (22321 iters in 9.335)
    P256: 2057.796 ops per second (21777 iters in 10.583)
    P384: 661.070 ops per second (8082 iters in 12.226)
    P521: 262.890 ops per second (1860 iters in 7.075)
    Ed25519: 21555.721 ops per second (222222 iters in 10.309)

  • [ecdsa-sign]
    P224: 1849.314 ops per second (16134 iters in 8.724)
    P256: 1725.753 ops per second (18102 iters in 10.489)
    P384: 594.245 ops per second (6502 iters in 10.942)
    P521: 185.539 ops per second (1796 iters in 9.680)
    Ed25519: 11071.041 ops per second (113636 iters in 10.264)

  • [ecdsa-verify]
    P224: 659.605 ops per second (6129 iters in 9.292)
    P256: 593.558 ops per second (6026 iters in 10.152)
    P384: 224.130 ops per second (2376 iters in 10.601)
    P521: 87.659 ops per second (836 iters in 9.537)
    Ed25519: 7095.430 ops per second (75987 iters in 10.709)

  • [ecdh-secret]
    P224: 1924.144 ops per second (20088 iters in 10.440)
    P256: 1817.788 ops per second (18670 iters in 10.271)
    P384: 681.528 ops per second (7148 iters in 10.488)
    P521: 261.183 ops per second (2619 iters in 10.027)
    X25519: 13279.026 ops per second (65876 iters in 4.961)

  • [ecdh-share]
    P224: 1858.835 ops per second (20145 iters in 10.837)
    P256: 1788.042 ops per second (19091 iters in 10.677)
    P384: 656.035 ops per second (6715 iters in 10.236)
    P521: 255.536 ops per second (2626 iters in 10.276)
    X25519: 13593.562 ops per second (133333 iters in 9.809)

Tl;DR: around a factor of 2. I'm fine to merge this in the current state (again cc @Firobe @dinosaure @palainp) -- there could be further improvements:

  • use string where applicable
  • expose bytes/string in the API

I'm fine to leave those as separate PR(s).

@hannesm hannesm force-pushed the bytes-instead-of-cstruct branch from f6b3613 to 8bdd655 Compare February 9, 2024 12:51
@dinosaure
Copy link
Member Author

I can confirm on AMD Ryzen 9 7500X a factor of 2 as you on the benchmark.

ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
ec/mirage_crypto_ec.ml Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Feb 9, 2024

Thanks for your comments, @reynir.

I guess it makes sense to include the "use string instead of bytes" in here!? (And convert the curve values to strings directly as well.) Especially since we're not in a rush.

@hannesm
Copy link
Member

hannesm commented Feb 10, 2024

Using string in 25519 (everything only for 25519, numbers in op/s)

baseline (OpenSSL): ecdsa-sign 21942.7 ecdsa-verify 7080.6 ecdh: 26200.7 (openssl speed ed25519 and openssl speed ecdhx25519)

op main 8bdd655 string 0c04144 string / main
ecdsa-generate 18128.142 21555.721 24610.722 1.36
ecdsa-sign 10712.649 11071.041 12159.242 1.14
ecdsa-verify 7599.516 7095.430 8953.500 1.18
ecdh-secret 12821.927 13279.026 15661.647 1.22
ecdh-share 12143.960 13593.562 15655.237 1.29

@hannesm
Copy link
Member

hannesm commented Feb 10, 2024

I also moved NIST to use string -- likely there's still some fine tuning to do (to minimize the calls to out_fe_to_fe) - a full review is likely a good idea.

As example, Params.a and Params.b could directly be called with Fe.from_be_bytes

Statistics f4991c2:

  • [ecdsa-generate]
    P224: 2189.999 ops per second (23730 iters in 10.836)
    P256: 1964.838 ops per second (20868 iters in 10.621)
    P384: 762.843 ops per second (8061 iters in 10.567)
    P521: 267.764 ops per second (2867 iters in 10.707)
    Ed25519: 21990.043 ops per second (209205 iters in 9.514)

  • [ecdsa-sign]
    P224: 1845.506 ops per second (18341 iters in 9.938)
    P256: 1623.831 ops per second (16852 iters in 10.378)
    P384: 613.392 ops per second (6647 iters in 10.836)
    P521: 168.077 ops per second (1667 iters in 9.918)
    Ed25519: 10326.991 ops per second (104166 iters in 10.087)

  • [ecdsa-verify]
    P224: 626.230 ops per second (6056 iters in 9.671)
    P256: 569.914 ops per second (5625 iters in 9.870)
    P384: 212.195 ops per second (2388 iters in 11.254)
    P521: 84.984 ops per second (839 iters in 9.872)
    Ed25519: 7526.737 ops per second (61728 iters in 8.201)

  • [ecdh-secret]
    P224: 1900.241 ops per second (19076 iters in 10.039)
    P256: 1723.747 ops per second (17946 iters in 10.411)
    P384: 636.704 ops per second (6851 iters in 10.760)
    P521: 230.729 ops per second (2459 iters in 10.658)
    X25519: 13194.622 ops per second (130548 iters in 9.894)

  • [ecdh-share]
    P224: 1869.236 ops per second (19584 iters in 10.477)
    P256: 1702.685 ops per second (17699 iters in 10.395)
    P384: 615.655 ops per second (5303 iters in 8.614)
    P521: 246.461 ops per second (2536 iters in 10.290)
    X25519: 13019.884 ops per second (70323 iters in 5.401)

@hannesm
Copy link
Member

hannesm commented Feb 11, 2024

My gut feeling is that our Make_field_element should only expose field_element`, and do the allocations. If that turns out to be too expensive (i.e. better to re-use some field elements), we should find a better way... but this

let f = Fe.create () in
Fe.mul f (out_fe_to_fe f) (out_fe_to_fe f);

Where mul returns unit, and all arguments are the same piece of memory drives me crazy. More reasonable would be (inside Field_element):

let mul a b =
  let res = create () in
  mul res a b;
  out_fe_to_fe res

WDYT?

@hannesm
Copy link
Member

hannesm commented Feb 11, 2024

now there are more sane interfaces... benchmark for 1d81a2a

I've no clue about the variation / improvement in performance, maybe we should investigate?

  • [ecdsa-generate]
    P224: 2294.171 ops per second (23266 iters in 10.141)
    P256: 2026.484 ops per second (20669 iters in 10.199)
    P384: 755.492 ops per second (7756 iters in 10.266)
    P521: 285.283 ops per second (2899 iters in 10.162)
    Ed25519: 23014.022 ops per second (190114 iters in 8.261)

  • [ecdsa-sign]
    P224: 1938.880 ops per second (18925 iters in 9.761)
    P256: 1737.033 ops per second (17680 iters in 10.178)
    P384: 663.206 ops per second (6689 iters in 10.086)
    P521: 184.815 ops per second (1854 iters in 10.032)
    Ed25519: 10896.942 ops per second (96899 iters in 8.892)

  • [ecdsa-verify]
    P224: 710.888 ops per second (7076 iters in 9.954)
    P256: 655.990 ops per second (6420 iters in 9.787)
    P384: 239.028 ops per second (2449 iters in 10.246)
    P521: 93.360 ops per second (933 iters in 9.994)
    Ed25519: 8171.446 ops per second (72358 iters in 8.855)

  • [ecdh-secret]
    P224: 2080.923 ops per second (19215 iters in 9.234)
    P256: 1900.534 ops per second (18968 iters in 9.980)
    P384: 695.386 ops per second (7128 iters in 10.250)
    P521: 265.458 ops per second (2699 iters in 10.167)
    X25519: 14254.679 ops per second (144092 iters in 10.108)

  • [ecdh-share]
    P224: 2078.910 ops per second (19531 iters in 9.395)
    P256: 1901.205 ops per second (18463 iters in 9.711)
    P384: 698.909 ops per second (7274 iters in 10.408)
    P521: 272.144 ops per second (2800 iters in 10.289)
    X25519: 12961.836 ops per second (141643 iters in 10.928)

and with 28805b7

  • [ecdsa-generate]
    P224: 2369.614 ops per second (22716 iters in 9.586)
    P256: 2168.933 ops per second (21088 iters in 9.723)
    P384: 806.145 ops per second (7898 iters in 9.797)
    P521: 298.438 ops per second (3016 iters in 10.106)
    Ed25519: 23871.970 ops per second (208333 iters in 8.727)

  • [ecdsa-sign]
    P224: 1974.317 ops per second (19538 iters in 9.896)
    P256: 1805.985 ops per second (17946 iters in 9.937)
    P384: 682.772 ops per second (6913 iters in 10.125)
    P521: 195.217 ops per second (2031 iters in 10.404)
    Ed25519: 11560.283 ops per second (113122 iters in 9.785)

  • [ecdsa-verify]
    P224: 718.947 ops per second (7352 iters in 10.226)
    P256: 654.560 ops per second (6639 iters in 10.143)
    P384: 238.433 ops per second (2472 iters in 10.368)
    P521: 93.353 ops per second (984 iters in 10.541)
    Ed25519: 8314.278 ops per second (71225 iters in 8.567)

  • [ecdh-secret]
    P224: 2084.383 ops per second (19669 iters in 9.436)
    P256: 1906.947 ops per second (18946 iters in 9.935)
    P384: 701.574 ops per second (7166 iters in 10.214)
    P521: 266.106 ops per second (2584 iters in 9.710)
    X25519: 14418.451 ops per second (135135 iters in 9.372)

  • [ecdh-share]
    P224: 2019.919 ops per second (20161 iters in 9.981)
    P256: 1785.231 ops per second (19319 iters in 10.822)
    P384: 598.504 ops per second (7066 iters in 11.806)
    P521: 237.029 ops per second (2504 iters in 10.564)
    X25519: 13457.188 ops per second (139664 iters in 10.378)

@palainp
Copy link
Member

palainp commented Feb 11, 2024

Whoah thank you @hannesm for that huge amount of work here. I'm a bit late for the benchmark task, here are my results (current head / latest commit on this pr):

accel: XOR AES GHASH 	accel: XOR AES GHASH 
	
* [ecdsa-generate]	* [ecdsa-generate]
    P224:  833.833 ops per second (17661 iters in 21.180)	    P224:  1269.163 ops per second (12540 iters in 9.881)
    P256:  657.138 ops per second (6961 iters in 10.593)	    P256:  1187.079 ops per second (11803 iters in 9.943)
    P384:  261.465 ops per second (2673 iters in 10.223)	    P384:  354.774 ops per second (3545 iters in 9.992)
    P521:  103.006 ops per second (1031 iters in 10.009)	    P521:  122.708 ops per second (1228 iters in 10.007)
    Ed25519:  16956.513 ops per second (161290 iters in 9.512)	    Ed25519:  17064.384 ops per second (164473 iters in 9.638)
	
* [ecdsa-sign]	* [ecdsa-sign]
    P224:  661.410 ops per second (7065 iters in 10.682)	    P224:  1100.266 ops per second (10972 iters in 9.972)
    P256:  584.803 ops per second (6276 iters in 10.732)	    P256:  1021.587 ops per second (10208 iters in 9.992)
    P384:  236.447 ops per second (2458 iters in 10.396)	    P384:  320.053 ops per second (3193 iters in 9.976)
    P521:  83.519 ops per second (849 iters in 10.165)	    P521:  97.893 ops per second (963 iters in 9.837)
    Ed25519:  8372.911 ops per second (74962 iters in 8.953)	    Ed25519:  8376.877 ops per second (79491 iters in 9.489)
	
* [ecdsa-verify]	* [ecdsa-verify]
    P224:  228.349 ops per second (2319 iters in 10.156)	    P224:  405.440 ops per second (4029 iters in 9.937)
    P256:  205.493 ops per second (2024 iters in 9.849)	    P256:  378.991 ops per second (3773 iters in 9.955)
    P384:  82.141 ops per second (785 iters in 9.557)	    P384:  114.158 ops per second (1140 iters in 9.986)
    P521:  33.805 ops per second (337 iters in 9.969)	    P521:  40.311 ops per second (403 iters in 9.997)
    Ed25519:  5527.163 ops per second (56116 iters in 10.153)	    Ed25519:  5551.972 ops per second (54585 iters in 9.832)
	
* [ecdh-secret]	* [ecdh-secret]
    P224:  705.115 ops per second (7616 iters in 10.801)	    P224:  1188.758 ops per second (11901 iters in 10.011)
    P256:  622.850 ops per second (6560 iters in 10.532)	    P256:  1109.657 ops per second (11020 iters in 9.931)
    P384:  250.909 ops per second (2325 iters in 9.266)	    P384:  337.473 ops per second (3379 iters in 10.013)
    P521:  101.285 ops per second (988 iters in 9.755)	    P521:  118.583 ops per second (1185 iters in 9.993)
    X25519:  9195.340 ops per second (90252 iters in 9.815)	    X25519:  9332.203 ops per second (93984 iters in 10.071)
	
* [ecdh-share]	* [ecdh-share]
    P224:  699.653 ops per second (7411 iters in 10.592)	    P224:  1192.453 ops per second (11778 iters in 9.877)
    P256:  621.211 ops per second (6260 iters in 10.077)	    P256:  1110.044 ops per second (11069 iters in 9.972)
    P384:  247.632 ops per second (2247 iters in 9.074)	    P384:  337.365 ops per second (3365 iters in 9.974)
    P521:  101.020 ops per second (1014 iters in 10.038)	    P521:  119.869 ops per second (1195 iters in 9.969)
    X25519:  9277.979 ops per second (87565 iters in 9.438)	    X25519:  9400.095 ops per second (91407 iters in 9.724)

I'm less than 2x performance increase (1.5-2) with P* and almost no gain with 25519 ec (my CPU is a recent i7, but that was done in a Qubes AppVM, maybe I'm lacking the super asm instructions :) )

@hannesm
Copy link
Member

hannesm commented Feb 11, 2024

I'm done here. Feedback welcome. Plan to squash & merge on Tuesday Feb 13th.

Huge thanks to @dinosaure for initial work, @palainp for rebase & bugfixes, @reynir for review -- and everyone for offline discussions and comments that motivated me to put some more effort into this PR.

@Firobe
Copy link
Member

Firobe commented Feb 12, 2024

Thanks a lot for putting the work into this PR! (I do also get a factor 2 improvement on the HEAD commit compared to main).
I feel the internal interfaces are much saner now, with memory sharing/separation considerations clearer.

Copy link
Member

@palainp palainp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the huge amount of work you've done here! The quest for more performance will be for another PR here x2 is super already!
I've just noticed the miss of one (const WORD*)String_val.
So LGTM!

ec/native/np384_stubs.c Outdated Show resolved Hide resolved
@hannesm hannesm merged commit 36bc72f into mirage:main Feb 13, 2024
26 of 27 checks passed
hannesm added a commit to hannesm/opam-repository that referenced this pull request Feb 26, 2024
CHANGES:

* mirage-crypto, mirage-crypto-rng{,lwt,mirage}: support CL.EXE compiler
  (mirage/mirage-crypto#137 @jonahbeckford) - mirage-crypto-pk not yet due to gmp dependency,
  mirage-crypto-ec doesn't pass testsuite
* mirage-crypto-ec: use simpler square root for ed25519 - saving 3
  multiplications and 2 squarings, details
  https://mailarchive.ietf.org/arch/msg/cfrg/qlKpMBqxXZYmDpXXIx6LO3Oznv4/
  (mirage/mirage-crypto#196 @hannesm)
* mirage-crypto-ec: use sliding window method with pre-computed calues of
  multiples of the generator point for NIST curves, speedup around 4x for P-256
  sign (mirage/mirage-crypto#191 @Firobe, review @palainp @hannesm)
* mirage-crypto-ec: documentation: warn about power timing analysis on `k` in
  Dsa.sign (mirage/mirage-crypto#195 @hannesm, as proposed by @edwintorok)
* mirage-crypto-ec: replace internal Cstruct.t by string (speedup up to 2.5x)
  (mirage/mirage-crypto#146 @dinosaure @hannesm @reynir, review @Firobe @palainp @hannesm @reynir)
* bench/speed: add EC (ECDSA & EdDSA generate/sign/verify, ECDH secret/share)
  operations (mirage/mirage-crypto#192 @hannesm)
* mirage-crypto-rng: use rdtime instead of rdcycle on RISC-V (rdcycle is
  privileged since Linux kernel 6.6) (mirage/mirage-crypto#194 @AdrianBunk, review by @edwintorok)
* mirage-crypto-rng: support Loongarch (mirage/mirage-crypto#190 @fangyaling, review @loongson-zn)
* mirage-crypto-rng: support NetBSD (mirage/mirage-crypto#189 @drchrispinnock)
* mirage-crypto-rng: allocate less in Fortuna when feeding (mirage/mirage-crypto#188 @hannesm,
  reported by @palainp)
* mirage-crypto-ec: avoid mirage-crypto-pk and asn1-combinators test dependency
  (instead, craft our own asn.1 decoder -- mirage/mirage-crypto#200 @hannesm)

### Performance differences between v0.11.2 and v0.11.3 and OpenSSL

The overall result is promising: P-256 sign operation improved 9.4 times, but
is still a 4.9 times slower than OpenSSL.

Numbers in operations per second (apart from speedup, which is a factor
v0.11.3 / v0.11.2), gathered on a Intel i7-5600U CPU 2.60GHz using FreeBSD 14.0,
OCaml 4.14.1, and OpenSSL 3.0.12.

#### P224

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 1160    | 20609   |    17.8 |         |
| sign   | 931     | 8169    |     8.8 | 21319   |
| verify | 328     | 1606    |     4.9 | 10719   |
| dh-sec | 1011    | 12595   |    12.5 |         |
| dh-kex | 992     | 2021    |     2.0 | 16691   |

#### P256

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 990     | 19365   |    19.6 |         |
| sign   | 792     | 7436    |     9.4 | 36182   |
| verify | 303     | 1488    |     4.9 | 13383   |
| dh-sec | 875     | 11508   |    13.2 |         |
| dh-kex | 895     | 1861    |     2.1 | 17742   |

#### P384

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 474     | 6703    |    14.1 |         |
| sign   | 349     | 3061    |     8.8 | 900     |
| verify | 147     | 544     |     3.7 | 1062    |
| dh-sec | 378     | 4405    |    11.7 |         |
| dh-kex | 433     | 673     |     1.6 | 973     |

#### P521

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 185     | 1996    |    10.8 |         |
| sign   | 137     | 438     |     3.2 | 2737    |
| verify | 66      | 211     |     3.2 | 1354    |
| dh-sec | 180     | 1535    |     8.5 |         |
| dh-kex | 201     | 268     |     1.3 | 2207    |

#### 25519

| op     | v0.11.2 | v0.11.3 | speedup | OpenSSL |
|--------|---------|---------|---------|---------|
| gen    | 23271   | 22345   |     1.0 |         |
| sign   | 11228   | 10985   |     1.0 | 21794   |
| verify | 8149    | 8029    |     1.0 | 7729    |
| dh-sec | 14075   | 13968   |     1.0 |         |
| dh-kex | 13487   | 14079   |     1.0 | 24824   |
ansiwen pushed a commit to Nitrokey/nethsm-mirage-crypto that referenced this pull request May 6, 2024
Originally, we used Cstruct.t (bigarray) for interfacing. Instead, we use string now.
The benefit is that allocating a string is cheap, and in line with OCaml's GC.

After some years of stalling, we included benchmarks in bench/speed.ml fot the EC
operations in mirage#192  (sign, verify, generate for EC/EdDSA; and ECDH). The result for
thi change is a factor between 2 and 2.5.

The external API (mirage_crypto_ec.mli) does not change at all. There are various
other cleanups in the code, such as providing a layer to isolate the C calls (which
receive a bytes buffer for the result value, and thus mutate this buffer) to be
immutable.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Hannes Mehnert <[email protected]>
Co-authored-by: Reynir Björnsson <[email protected]>
Reviewed-by: Virgile Robles <[email protected]>
Reviewed-by: Pierre Alain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants