High CPU usage after upgrade to version 1.15.5 #26036
Comments
The same behaviour is observable in version 1.15.6.
We hit the same problem. Did you find the cause? We moved from 1.12 to 1.16.
Unfortunately, no. We are sticking with version 1.15.4. We tested all subsequent versions (1.15.5 and later, including 1.16.x) and observed the same behavior.
Thank you for testing this on 1.16 as well. I'll bring it up to our engineers. :)
Weird stuff: we rotated the transit key, and that solved the issue. We don't understand what the difference could be, since both the old and the new keys work. Only the old one causes high CPU usage.
Yeah, this is really strange behavior for sure, but that's pretty good news and something we will test and report back on.
Yesterday, we performed transit key rotation on all our transit secret engines. Subsequently, we upgraded to the latest version of Vault (1.17.0) and ran our standard load testing. Unfortunately, we encountered the same significant performance degradation that we had previously reported. Specifically:
Interestingly, reverting to Vault 1.15.4 resolved the issue entirely. With this version, CPU load peaks at 50-60% at 1000 RPS. We are keen to understand why this performance discrepancy appears in versions 1.15.5 through 1.17.0. Any insights would be greatly appreciated.
Could you rotate your key again to see if it fixes the problem? That's how we solved it.
No other info? We're about to rotate our key to solve the problem, but that's a pretty odd solution without a clear root cause.
Out of interest, did you rotate your transit keys while running on the latest version of Vault, or did you complete the transit key rotation on a specific version of Vault and then upgrade to the latest version? Please provide more information about what worked for you, so that we can test whether we can replicate the success you've reported.
We upgraded first. Then we realised that there was an issue and decided to rotate the keys (still on the newest version). That solved the problem.
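For anyone who wants to try the same workaround, here is a minimal sketch of a rotation call against Vault's HTTP API (`POST /v1/<mount>/keys/<name>/rotate`). The address, token, mount path `transit`, and key name `my-key` are placeholders and not taken from this thread; this is just one way to script it, not what the commenters above necessarily ran.

```python
import urllib.request


def build_rotate_request(vault_addr, token, key_name, mount="transit"):
    """Build (but do not send) the request that rotates a transit key.

    `mount` and `key_name` are placeholders; use your own engine path and key.
    """
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/keys/{key_name}/rotate"
    return urllib.request.Request(
        url,
        data=b"{}",  # the rotate endpoint needs no parameters
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def rotate_key(vault_addr, token, key_name, mount="transit"):
    """Send the rotation request and return the HTTP status code."""
    req = build_rotate_request(vault_addr, token, key_name, mount)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Example (requires a reachable Vault and a token allowed to rotate the key):
# rotate_key("http://127.0.0.1:8200", "<token>", "my-key")
```

Rotation only adds a new key version; older versions stay available for decryption, so this should be safe to test outside production first.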
Hello all, I've been trying to reproduce this issue, unfortunately without success. Would it be possible to provide additional information such
Please note that in 1.15.6 a locking issue was resolved in #25336, but that doesn't sound like the issue you are reporting (and it should have been resolved in the later versions you have tested). Thanks!
Hello, this is the setup on our side:
After the audit devices were disabled, we managed to reach 800+ RPS, so the audit devices appear to be the cause of the high CPU usage after upgrading to version 1.15.5.
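To repeat this check in a test environment, the audit devices can be listed and disabled through the `sys/audit` HTTP API (`GET /v1/sys/audit`, `DELETE /v1/sys/audit/<path>`). The sketch below assumes a device mounted at `file/`; that path, the address, and the token are placeholders, and the reporter's actual devices are not shown in this thread.

```python
import json
import urllib.request


def _request(vault_addr, token, method, api_path):
    """Build a Vault API request without sending it."""
    url = f"{vault_addr.rstrip('/')}/v1/{api_path}"
    return urllib.request.Request(
        url, method=method, headers={"X-Vault-Token": token}
    )


def build_audit_list_request(vault_addr, token):
    """Request that lists the enabled audit devices."""
    return _request(vault_addr, token, "GET", "sys/audit")


def build_audit_disable_request(vault_addr, token, device_path):
    """Request that disables the audit device at `device_path` (e.g. "file/")."""
    return _request(
        vault_addr, token, "DELETE", f"sys/audit/{device_path.strip('/')}"
    )


def send(req):
    """Send a request and return (status, parsed JSON body or None)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read()
        return resp.status, (json.loads(body) if body else None)


# Example (test environment only; needs a token with sudo on sys/audit):
# print(send(build_audit_list_request("http://127.0.0.1:8200", "<token>")))
# send(build_audit_disable_request("http://127.0.0.1:8200", "<token>", "file/"))
```

Disabling audit devices removes the audit trail, so this is only meant for isolating the CPU regression during load tests, not as a fix.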
Thank you for that! Very helpful to know. I've taken it back to our engineers and they're brainstorming about possible culprits.
Thank you so much! Do you have any logs that we could use to help narrow down the audit issue even further?
This issue might be related: #28170
Describe the bug
After upgrading from Vault version 1.15.4 to 1.15.5, there is high CPU usage on Vault servers when transit operations are called, even with a relatively small number of requests per second (RPS), causing CPU core usage to reach 100%.
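For others trying to reproduce this, a rough way to generate paced transit traffic is sketched below. It is not the reporter's load test: the key name `my-key`, the mount `transit`, and the single-threaded pacing loop are all assumptions for illustration. It calls the transit encrypt endpoint (`POST /v1/<mount>/encrypt/<name>`), which expects base64-encoded plaintext.

```python
import base64
import json
import time
import urllib.request


def build_encrypt_request(vault_addr, token, key_name, plaintext, mount="transit"):
    """Build a transit encrypt request; the plaintext must be base64-encoded."""
    payload = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/{mount}/encrypt/{key_name}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def run_load(send_one, rps, duration_s, sleep=time.sleep):
    """Call `send_one` about `rps` times per second for `duration_s` seconds.

    Single-threaded, so the achieved rate is capped by request latency.
    Returns the number of calls made.
    """
    total = int(rps * duration_s)
    interval = 1.0 / rps
    start = time.monotonic()
    for i in range(total):
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            sleep(delay)  # wait until this request's scheduled send time
        send_one()
    return total


# Example (placeholders; requires a reachable Vault and an existing key):
# req = build_encrypt_request("http://127.0.0.1:8200", "<token>", "my-key", b"hello")
# run_load(lambda: urllib.request.urlopen(req, timeout=10).read(), rps=50, duration_s=60)
```

Reaching hundreds of RPS would need several workers or a dedicated load-testing tool; the point here is only to have a repeatable request shape while watching CPU usage on the Vault servers.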
To Reproduce
Steps to reproduce the behavior:
Expected behavior
After upgrading from Vault version 1.15.4 to 1.15.5, the CPU usage during transit operations should remain within acceptable limits. Specifically, CPU core usage should not spike to 100% at a low RPS.
Environment:
Vault Server Version (retrieve with vault status): Vault v1.15.5
Vault CLI Version (retrieve with vault version): Vault v1.15.5 (0d8b67e), built 2024-01-26T14:53:40Z
Vault server configuration file(s):
Additional context
Vault telemetry for version 1.15.5 with max. 300 RPS to transit backends over a 5-minute test window
CPU usage
Transit usage
Vault telemetry for version 1.15.4 with max. 2000 RPS to transit backends over a 45-minute test window
CPU usage
Transit usage