Performance Degradation with FIPS Stemcell #379

Open
maxmoehl opened this issue Aug 26, 2024 · 1 comment
Original Issue Description (slightly modified from Slack)

We've recently started adopting the FIPS stemcell in some of our environments. As soon as we reached the first environment with a slightly elevated background load, we ran into issues with HAProxy and OpenSSL. OpenSSL on Ubuntu generates random numbers in an unusual way [1,2]: it boils down to performing expensive checksum calculations over and over to obtain jitter in the execution time, which is then used as true entropy for the random number generator. Our HAProxy uses the OpenSSL shipped with the OS and performs TLS handshakes with both clients and gorouter, so it needs randomness (and therefore entropy). As a result, CPU utilisation jumped and HAProxy started running into timeouts because it could not generate enough entropy with the limited CPU available to it. Our rough estimate is that we would need 10x the resources with FIPS enabled.

We've written down our observations in an upstream issue [3] and would like to know whether any of you have encountered similar issues and what workarounds are available. We already know of AWS CloudHSM, which could be configured to take over the RNG tasks, but that means additional integration effort, and the setup as a whole would probably need to be certified again.

[1] https://www.chronox.de/jent/
[2] https://csrc.nist.gov/CSRC/media/projects/cryptographic-module-validation-program/documents/entropy/E48_PublicUse.pdf
[3] haproxy/haproxy#2588
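
To make the jitter idea from the description more concrete, here is a toy sketch in Python. It is not the algorithm used by the jitter entropy library in [1] and must not be used for anything cryptographic. It only illustrates the principle of timing a fixed CPU workload repeatedly and looking at the noisy low-order bits of the measured durations. The function names, the SHA3-256 workload, and the 4096-byte block size are all assumptions chosen for illustration.

```python
import hashlib
import time


def timing_deltas(samples: int, work_bytes: int = 4096) -> list[int]:
    """Time `samples` rounds of a fixed hashing workload, in nanoseconds."""
    block = bytes(work_bytes)  # constant input, so only the timing varies
    deltas = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        hashlib.sha3_256(block).digest()  # fixed CPU work whose duration jitters
        deltas.append(time.perf_counter_ns() - start)
    return deltas


def low_bits(deltas: list[int], bits: int = 1) -> list[int]:
    """Keep only the least significant `bits` of each timing delta."""
    mask = (1 << bits) - 1
    return [d & mask for d in deltas]


if __name__ == "__main__":
    deltas = timing_deltas(32)
    print("durations (ns):", deltas)
    print("low bits:      ", low_bits(deltas))
```

Every sample costs a full round of hashing and yields only a few usable bits, which is consistent with the observation above that gathering entropy this way is CPU-expensive.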

Details we have discovered in the meantime

We started drilling down into HAProxy and OpenSSL and noticed that the FIPS version of OpenSSL calls getrandom(2) a lot. getrandom(2), in turn, seems to be limited to 150 calls/s, and as a result we were only able to accept ~12 TLS connections per second from clients in our experiments. At that rate, at most one core is saturated. We consulted our local crypto experts, and their suspicion is that the random number generator is re-seeded far too often, which causes this bottleneck because the entropy that can be gained is limited. Since the limit is per machine, our current workaround is to use 2-core VMs, but lots of them. Even so, the per-core conn/s is down from ~125 to ~12 (and that is when counting each 2-core VM as a single core).
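
For anyone who wants to check the getrandom(2) rate on their own stemcell, the following is a minimal sketch of how it could be measured from Python, together with the arithmetic implied by the numbers above. The helper names and the 32-byte read size are assumptions, and `os.getrandom` is only available on Linux. The "calls per handshake" figure is an inference that assumes getrandom(2) is the only bottleneck; it is not something we measured directly.

```python
import os
import time


def calls_per_second(read_fn, duration_s: float = 1.0) -> float:
    """Call `read_fn` in a tight loop for `duration_s` seconds; return calls/s."""
    calls = 0
    start = time.monotonic()
    deadline = start + duration_s
    while time.monotonic() < deadline:
        read_fn()
        calls += 1
    return calls / (time.monotonic() - start)


def calls_per_handshake(getrandom_rate: float, handshakes_per_s: float) -> float:
    """getrandom(2) calls available per TLS handshake at the observed rates."""
    return getrandom_rate / handshakes_per_s


def slowdown(before_per_core: float, after_per_core: float) -> float:
    """Factor by which per-core throughput dropped."""
    return before_per_core / after_per_core


if __name__ == "__main__":
    if hasattr(os, "getrandom"):  # Linux only
        rate = calls_per_second(lambda: os.getrandom(32))
        print(f"getrandom(2): {rate:.0f} calls/s")
    # Figures reported in this issue: 150 calls/s, ~12 conn/s, ~125 conn/s before.
    print(f"calls per handshake: {calls_per_handshake(150, 12):.1f}")
    print(f"per-core slowdown:   {slowdown(125, 12):.1f}x")
```

With the figures from this issue, that works out to roughly 12.5 getrandom(2) calls per accepted connection and a per-core slowdown of about 10.4x, which lines up with the rough 10x resource estimate in the original description.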

We are in contact with Canonical to understand the behaviour better. I'd be interested in looking at the sources, but the kernel source code of the FIPS version seems to be missing. On non-FIPS I can obtain the source by installing linux-source, but for FIPS the package linux-fips-source-5.15.0 is missing, although it's referenced by some other packages:

$ sudo apt install linux-fips-source-5.15.0
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package linux-fips-source-5.15.0 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'linux-fips-source-5.15.0' has no installation candidate
Projects
Status: Waiting for Changes | Open for Contribution