Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import mlkem-native #2176

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Feb 7, 2025

Import mlkem-native

This imports mlkem-native (https://github.com/pq-code-package/mlkem-native, maintained by myself and @mkannwischer) into AWS-LC, replacing the reference implementation.

This PR focuses on the minimal configuration of mlkem-native: No assembly and no FIPS-202 code are imported.

mlkem-native is a high-performance, high-assurance C90 implementation of ML-KEM developed under the Post-Quantum Cryptography Alliance (PQCA) and the Linux Foundation. It is a fork of the reference implementation that AWS-LC previously relied on, and remains close to it. mlkem-native is the default ML-KEM implementation in libOQS.

Goal

The goal is for this PR to be a driver for discussion and flushing out of technical issues, and that it be merged as a first step towards integrating mlkem-native, followed by the integration of (hopefully then-verified) native backends in AArch64 and AVX2 later in 2025.

Todos

  • Make CI happy
  • Pairwise Consistency Test (PCT)
  • Stack zeroization (in progress Add option for destruction of intermediate values pq-code-package/mlkem-native#763)
  • Consider addition of a compatibility layer mapping OpenSSL helper functions/macros to mlkem-native
  • Document why the FIPS-202 glue code can ignore return values (cc @manastasova)
  • Re-introduce self-tests in top-level APIs in ml_kem.c
  • Understand how namespacing in AWS-LC works and whether any adjustments are needed

Import Mechanism

The mlkem-native source code is unmodified and imported using the importer script crypto/fipsmodule/ml_kem/importer.sh; the details of the import are in META.yml.

Future updates to the source tree would ideally happen through a re-import of a different version of mlkem-native, though a temporary change-log is conceivable, similar to how the changes from the reference implementation were documented so far.

Import Scope

mlkem-native has a C-only version as well as native 'backends' in AVX2 and Neon for high performance. This commit only imports the C-only version. Integration of native backends will be done separately.

mlkem-native offers its own FIPS-202 implementation, including fast versions of batched FIPS-202. However, this commit does not import those, but instead provides glue-code around AWS-LC's own FIPS-202 implementation. The path to leveraging the FIPS-202 performance improvements in mlkem-native would be to integrate them directly
into crypto/fipsmodule/sha.

Side-channels

mlkem-native's CI uses a patched version of valgrind to check for various compilers and compile flags that there are no secret-dependent memory accesses, branches, or divisions. The relevant assertions have been kept but are unused unless ENABLE_CT_TESTING is set.

Similar to AWS-LC, mlkem-native uses value barriers to block potentially harmful compiler reasoning and optimization. Where standard gcc/clang inline assembly is not available, mlkem-native falls back to a slower 'opt blocker' based on a volatile global (an idea we picked up from DjB) -- both is described in verify.h. It will be interesting to see if the opt-blocker variant works on all platforms that AWS-LC cares about.

Formal Verification

All C-code imported in this commit is formally verified using the C Bounded Model Checker (CBMC) to be free of various classes of undefined behaviour, including out-of-bounds memory accesses and arithmetic overflow; the latter is of particular interest for ML-KEM because of the use of lazy modular reduction for improved performance.

The heart of the CBMC proofs are function contract and loop annotations to the C-code. Function contracts are denoted __contract__(...) clauses and occur at the time of declaration, while loop contracts are denoted __loop__ and follow the for statement.

The function contract and loop statements are kept in the source, but removed by the preprocessor so long as the CBMC macro is undefined. Keeping them simplifies the import, and care has been taken to make them readable to the non-expert, and thereby serve as precise documentation of assumptions and guarantees upheld by the code.

The CBMC proofs are automatic and don't require further proofs scripts; yet, they come with their own build system and toolchain dependencies, which this commit does not attempt to import. See proofs/cbmc in the mlkem-native repository. Mid-term, however, CI infrastructure should be setup that allows to import and check the CBMC proofs as part of the AWS-LC CI.

Performance

It is expected -- but should be checked! -- that the ML-KEM performance with this PR is comparable to that of the reference implementation. This is because the mlkem-native's fast backends are not yet imported, the FIPS-202 code remains that of AWS-LC, and mlkem-native is otherwise close to the reference implementation.

Technical aspects

At the core, mlkem-native is currently a 'single-level' implementation of ML-KEM: A build of the main source tree provides an implementation of exactly one of ML-KEM-512/768/1024, depending on the MLKEM_K parameter. This property is inherited from the ML-KEM reference implementation, while AWS-LC's fork of the reference implementation has changed this behaviour and passes the security level as a runtime parameter.

To build all security levels, level-specific sources are built 3 times, once per security level, and linked with a single build of the level-independent code. The single-compilation-unit approach pursued by AWS-LC makes this process fairly simple since one merely needs to include the single-compilation-unit file provided by mlkem-native three times, and configure it so that the level-independent code is included only once. The final include moreover #undef'ines all macros defined by mlkem-native, reducing the risk of name clashes with other parts of crypto/fipsmodule/bcm.c.

This commit removes the reference implementation of ML-KEM from
the source tree, in preparation for the integration of mlkem-native.

Signed-off-by: Hanno Becker <[email protected]>
This imports mlkem-native (https://github.com/pq-code-package/mlkem-native)
as the ML-KEM implementation used in AWS-LC, replacing the reference
implementation.

mlkem-native is a high-performance, high-assurance C90 implementation
of ML-KEM developed under the Post-Quantum Cryptography Alliance (PQCA)
and the Linux Foundation. It is a fork of the reference implementation
that AWS-LC previously relied on, and remains close to it. mlkem-native
is the default ML-KEM implementation in
[libOQS](https://github.com/open-quantum-safe/liboqs).

**Import Mechanism**

The mlkem-native source code is unmodified and imported using the
importer script crypto/fipsmodule/ml_kem/importer.sh; the details
of the import are in META.yml.

Future updates to the source tree would ideally happen through
a re-import of a different version of mlkem-native, though a temporary
change-log is conceivable, similar to how the changes from the
reference implementation were documented so far.

**Import Scope**

mlkem-native has a C-only version as well as native 'backends' in
AVX2 and Neon for high performance. This commit only imports the
C-only version. Integration of native backends will be done separately.

mlkem-native offers its own FIPS-202 implementation, including
fast versions of batched FIPS-202. However, this commit does not
import those, but instead provides glue-code around AWS-LC's own
FIPS-202 implementation. The path to leveraging the FIPS-202 performance
improvements in mlkem-native would be to integrate them directly
into [crypto/fipsmodule/sha](crytpo/fipsmodule/sha).

**Formal Verification**

All C-code imported in this commit is formally verified using the
C Bounded Model Checker ([CBMC](https://github.com/diffblue/cbmc/))
to be free of various classes of undefined behaviour, including
out-of-bounds memory accesses and arithmetic overflow; the latter
is of particular interest for ML-KEM because of the use of lazy
modular reduction for improved performance.

The heart of the CBMC proofs are function contract and loop annotations to
the C-code. Function contracts are denoted `__contract__(...)` clauses
and occur at the time of declaration, while loop contracts are denoted
`__loop__` and follow the `for` statement.

The function contract and loop statements are kept in the source,
but removed by the preprocessor so long as the CBMC macro is undefined.
Keeping them simplifies the import, and care has been taken to make
them readable to the non-expert, and thereby serve as precise
documentation of assumptions and guarantees upheld by the code.

The CBMC proofs are automatic and don't require further proofs scripts;
yet, they come with their own build system and toolchain dependencies,
which this commit does not attempt to import. See
[proofs/cbmc](https://github.com/pq-code-package/mlkem-native/tree/main/proofs/cbmc)
in the mlkem-native repository. Mid-term, however, CI infrastructure
should be setup that allows to import and check the CBMC proofs
as part of the AWS-LC CI.

**Performance**

It is expected that the ML-KEM performance with this commit
is comparable to that of the reference implementation. This is
because the mlkem-native's fast backends are not yet imported,
the FIPS-202 code remains that of AWS-LC, and mlkem-native is
otherwise close to the reference implementation.

**Technical aspects**

At the core, mlkem-native is currently a 'single-level' implementation of
ML-KEM: A build of the main source tree provides an implementation of exactly
one of ML-KEM-512/768/1024, depending on the MLKEM_K parameter.
This property is inherited from the ML-KEM reference implementation, while
AWS-LC's fork of the reference implementation has changed this behaviour
and passes the security level as a runtime parameter.

To build all security levels, level-specific sources are built 3 times, once
per security level, and linked with a single build of the level-independent
code. The single-compilation-unit approach pursued by AWS-LC makes this process
fairly simple since one merely needs to include the single-compilation-unit file
provided by mlkem-native three times, and configure it so that the level-independent
code is included only once. The final include moreover `#undef`'ines all macros
defined by mlkem-native, reducing the risk of name clashes with other parts of
crypto/fipsmodule/bcm.c.

**Outstanding**

mlkem-native does not yet offer the following two FIPS-related features:
- Pairwise Consistency Test (PCT)
- Stack zeroization.
Those will be added to mlkem-native shortly and the import updated.

Signed-off-by: Hanno Becker <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 98.85714% with 8 lines in your changes missing coverage. Please review.

Project coverage is 79.00%. Comparing base (f407534) to head (df5b090).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
crypto/fipsmodule/ml_kem/ml_kem.c 75.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2176      +/-   ##
==========================================
+ Coverage   78.98%   79.00%   +0.01%     
==========================================
  Files         611      613       +2     
  Lines      105904   105984      +80     
  Branches    14982    14978       -4     
==========================================
+ Hits        83651    83735      +84     
+ Misses      21598    21596       -2     
+ Partials      655      653       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hanno-becker hanno-becker marked this pull request as ready for review February 7, 2025 16:37
@hanno-becker hanno-becker requested a review from a team as a code owner February 7, 2025 16:37
@torben-hansen torben-hansen self-requested a review February 7, 2025 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants