High CPU usage due to large number of VMRule #1245

Open
lujiajing1126 opened this issue Feb 25, 2025 · 2 comments

Comments

@lujiajing1126
Contributor

lujiajing1126 commented Feb 25, 2025

I am using VM Operator v0.53.0 with a single VMAlert instance that selects about 2k VMRule objects.

We are now facing high CPU usage:

Image

We investigated the issue with pprof:

Image

The hot code path is shown above; it may have been introduced in 9d91c22. As I understand it, for every single VMRule we have to reconcile (select) all VMAlert objects and then all of their selected VMRules (~2k in our case).
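To make the scaling concrete, here is a minimal sketch of the arithmetic behind that reading. It is not the operator's code: `validateCalls` is a hypothetical helper, and it assumes one reconcile per VMRule with every selected rule re-validated each time, which the real operator may batch or rate-limit.

```go
package main

import "fmt"

// validateCalls counts rule validations under the assumption that each of
// numRules VMRule events triggers a reconcile, each reconcile visits every
// one of numAlerts VMAlert objects, and each VMAlert re-validates all
// numRules rules it selects. Hypothetical model, not operator code.
func validateCalls(numAlerts, numRules int) int {
	calls := 0
	for event := 0; event < numRules; event++ { // one reconcile per VMRule
		for alert := 0; alert < numAlerts; alert++ { // every VMAlert is re-selected
			calls += numRules // every selected VMRule is validated again
		}
	}
	return calls
}

func main() {
	// One VMAlert selecting 2000 VMRules, as described in this issue.
	fmt.Println(validateCalls(1, 2000)) // prints 4000000
}
```

Under these assumptions the work grows quadratically with the number of rules, which would be consistent with validation dominating the profile.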

@f41gh7
Collaborator

f41gh7 commented Feb 25, 2025

It looks like most of the CPU time is spent in the VMRule.Validate function call:

func (r *VMRule) Validate() error {

It can be bypassed if the VMRule has the following annotation:

kind: VMRule
metadata:
  annotations:
    "operator.victoriametrics.com/skip-validation": "true"

The operator tries to generate a valid configuration file for vmalert and skips invalid VMRule objects.
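The behaviour described above can be sketched roughly as follows. This is only an illustration, not the operator's implementation: the `rule` type, its `validate` method, and `selectValid` are hypothetical stand-ins, and only the annotation key is taken from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key quoted in this thread.
const skipValidationAnnotation = "operator.victoriametrics.com/skip-validation"

// rule is a hypothetical stand-in for a VMRule object.
type rule struct {
	Name        string
	Annotations map[string]string
	Expr        string
}

// validate is a placeholder for the expensive syntax check.
// Here it only rejects empty expressions.
func (r rule) validate() error {
	if r.Expr == "" {
		return errors.New("empty expression")
	}
	return nil
}

// selectValid returns the names of the rules that would be written to the
// generated config: annotated rules bypass validation entirely, and any
// other rule that fails validation is skipped.
func selectValid(rules []rule) []string {
	var kept []string
	for _, r := range rules {
		if r.Annotations[skipValidationAnnotation] != "true" {
			if err := r.validate(); err != nil {
				continue // invalid rule: leave it out of the config
			}
		}
		kept = append(kept, r.Name)
	}
	return kept
}

func main() {
	rules := []rule{
		{Name: "checked-ok", Expr: "up == 0"},
		{Name: "checked-bad", Expr: ""},
		{Name: "unchecked", Expr: "", Annotations: map[string]string{skipValidationAnnotation: "true"}},
	}
	fmt.Println(selectValid(rules)) // prints [checked-ok unchecked]
}
```

Note the trade-off this sketch makes visible: a rule that carries the annotation is passed through even if it is broken, so the syntax check has to happen somewhere else.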

Maybe it would be better to skip all runtime syntax checks and enforce them only through validation webhooks, or at least to make the runtime checks optional.

@f41gh7 added the enhancement (New feature or request) label Feb 25, 2025
@ebensom
Contributor

ebensom commented Feb 26, 2025

@lujiajing1126 just another idea for mitigation: you could shard the VMRules by deploying multiple vmalerts whose selectors match distinct sets of VMRules. For example, you could deploy 4 vmalerts, each selecting a different value of a ruleset label:

ruleset: alerts-1
ruleset: alerts-2
ruleset: alerts-3
ruleset: alerts-4

Then you would distribute these label values uniformly (or randomly) across your 2000+ VMRules.

@f41gh7 self-assigned this Feb 26, 2025