chore(manifests): disable --auto-gomemlimit for Prometheus on SNO unt… #2549

Draft
wants to merge 1 commit into base: master

Contributor

@machine424 machine424 commented Jan 6, 2025

…il we can ensure it won't result in excessive CPU usage

requires openshift/prometheus#227

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci openshift-ci bot requested review from marioferh and rexagod January 6, 2025 13:24
Contributor

openshift-ci bot commented Jan 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2025
@machine424
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 6, 2025
Contributor

openshift-ci bot commented Jan 6, 2025

@machine424: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-conformance | 8e147f7 | link | true | /test e2e-hypershift-conformance |
| ci/prow/versions | 8e147f7 | link | false | /test versions |
| ci/prow/e2e-aws-ovn-single-node | 8e147f7 | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/okd-scos-e2e-aws-ovn | 8e147f7 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Member

@rexagod rexagod left a comment

Since there is no ticket linked to this, I was wondering whether we have seen any instances of the 10% memory reduction bottlenecking the CPU on SNO?

@@ -1491,7 +1491,7 @@ func (f *Factory) PrometheusK8s(grpcTLS *v1.Secret, telemetrySecret *v1.Secret)
return p, nil
}

-func (f *Factory) setupGoGC(p *monv1.Prometheus) {
+func (f *Factory) adjustGoGCConfig(p *monv1.Prometheus) {
Member

Maybe something like:

Suggested change
-func (f *Factory) adjustGoGCConfig(p *monv1.Prometheus) {
+func (f *Factory) adjustGoSettings(p *monv1.Prometheus) {

Since this affects the GOMEMLIMIT too now.
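
To illustrate why a broader name fits, here is a minimal sketch of a single helper that touches both the GOGC environment variable and the GOMEMLIMIT flag. The `container` and `envVar` types, the function parameters, and the value "75" are hypothetical stand-ins chosen for this example; they are not taken from the PR, which operates on `*monv1.Prometheus`.

```go
package main

import "fmt"

// envVar and container are hypothetical stand-ins for the Kubernetes
// types used by the real code; they exist only to keep this sketch
// self-contained.
type envVar struct {
	Name  string
	Value string
}

type container struct {
	Args []string
	Env  []envVar
}

// adjustGoSettings applies both Go runtime knobs in one place:
// the GOGC environment variable and the opt-out from automatic
// GOMEMLIMIT. An empty gogc leaves the environment untouched.
func adjustGoSettings(c *container, gogc string, disableAutoGOMEMLIMIT bool) {
	if gogc != "" {
		c.Env = append(c.Env, envVar{Name: "GOGC", Value: gogc})
	}
	if disableAutoGOMEMLIMIT {
		c.Args = append(c.Args, "--no-auto-gomemlimit")
	}
}

func main() {
	c := &container{}
	// "75" is a placeholder value, not the one used by the PR.
	adjustGoSettings(c, "75", true)
	fmt.Println(c.Env, c.Args)
}
```

Because the helper now changes a command-line argument as well as an environment variable, a name scoped to GOGC alone would undersell what it does.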

for _, env := range c.Env {
require.NotEqual(t, env.Name, "GOGC")
}
return
}

-require.Contains(t, c.Env, v1.EnvVar{Name: "GOGC", Value: tc.exp})
+require.Contains(t, c.Env, v1.EnvVar{Name: "GOGC", Value: tc.expectedGOGC})
Member

+100!

-require.Contains(t, c.Env, v1.EnvVar{Name: "GOGC", Value: tc.exp})
+require.Contains(t, c.Env, v1.EnvVar{Name: "GOGC", Value: tc.expectedGOGC})

+require.Equal(t, tc.autoGOMEMLIMITDisabled, argumentPresent(*c, "--no-auto-gomemlimit"))
Member

We could drop the tc.autoGOMEMLIMITDisabled field as this could be safely derived from ir.HighlyAvailableInfrastructure(), as that's the only case where this is disabled for now (else enabled)?

Suggested change
-require.Equal(t, tc.autoGOMEMLIMITDisabled, argumentPresent(*c, "--no-auto-gomemlimit"))
+require.Equal(t, tc.ir.HighlyAvailableInfrastructure(), argumentPresent(*c, "--no-auto-gomemlimit"))
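
A rough sketch of deriving the expectation from the infrastructure reader instead of a dedicated test-case field follows. Everything here is hypothetical: the `container` type, the `infraReader` interface, the `fakeInfra` implementation, and this body of `argumentPresent` are illustrative guesses, not the repository's code. The sketch also assumes that `HighlyAvailableInfrastructure()` returns false on SNO; under that assumption the flag is expected exactly when the method returns false, so the derived value is negated here.

```go
package main

import "fmt"

// container is a hypothetical stand-in for the container spec.
type container struct {
	Args []string
}

// argumentPresent reports whether arg appears verbatim in the
// container's arguments (illustrative implementation only).
func argumentPresent(c container, arg string) bool {
	for _, a := range c.Args {
		if a == arg {
			return true
		}
	}
	return false
}

// infraReader is a hypothetical interface exposing only the method
// mentioned in the review comment.
type infraReader interface {
	HighlyAvailableInfrastructure() bool
}

// fakeInfra is a test double for infraReader.
type fakeInfra struct{ ha bool }

func (f fakeInfra) HighlyAvailableInfrastructure() bool { return f.ha }

// expectAutoGOMEMLIMITDisabled derives the expectation from the
// topology. Assumption: single-node clusters report "not highly
// available", and only they get --no-auto-gomemlimit.
func expectAutoGOMEMLIMITDisabled(ir infraReader) bool {
	return !ir.HighlyAvailableInfrastructure()
}

func main() {
	sno := container{Args: []string{"--no-auto-gomemlimit"}}
	ha := container{}
	fmt.Println(expectAutoGOMEMLIMITDisabled(fakeInfra{ha: false}) == argumentPresent(sno, "--no-auto-gomemlimit"))
	fmt.Println(expectAutoGOMEMLIMITDisabled(fakeInfra{ha: true}) == argumentPresent(ha, "--no-auto-gomemlimit"))
}
```

If the method's meaning on SNO differs from the assumption above, the negation would need to be dropped.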

@rexagod
Member

rexagod commented Jan 7, 2025

I'm asking #2549 (review) as any observed insight should help me set a more meaningful buffer threshold in kubernetes-monitoring/kubernetes-mixin#1010 (comment).

@machine424 machine424 marked this pull request as draft January 7, 2025 10:45
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 7, 2025
@machine424
Contributor Author

Thanks for the review. This is actually still WIP and requires openshift/prometheus#227, so I've marked it as such. I'll get back to you later.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
2 participants