Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NO-JIRA: Drop TuneD profiles with the same name and different content #1216

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jmencak
Copy link
Contributor

@jmencak jmencak commented Nov 14, 2024

It is possible to create Tuned CRs with TuneD profiles of the same name.
This is problematic when the duplicate TuneD profiles have different
content. This causes periodic extraction of TuneD profiles and can lead
to TuneD restarts.

This change drops all instances of duplicate TuneD profiles and is a partial
fix for OCPBUGS-44559.

Copy link
Contributor

openshift-ci bot commented Nov 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2024
MCLabels map[string]string
NodePoolName string
Operand tunedv1.OperandConfig
TuneDProfileConflict []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is storing this really necessary? The AllProfiles list already has this information and it is enough to just report the duplicate names in the status I'd say?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd have to go through the AllProfiles list again when calculating the status and check their content. Same content is not a problem and I don't think we should make co/node-tuning degraded. The "cache" is an optimization. Data vs. CPU cycles. I'm for saving CPU cycles.

@jmencak
Copy link
Contributor Author

jmencak commented Nov 14, 2024

An example of creating duplicate profiles with conflicting content:
Profile 1

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-dup
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile
      include=openshift-node
      [sysctl]
      kernel.shmmni=8192
    name: openshift-profile
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile

Profile 2

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-profile-dup2
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom OpenShift profile
      include=openshift-node
      [sysctl]
      kernel.shmmni=8191
    name: openshift-profile
  recommend:
  - match:
    - label: profile
    priority: 20
    profile: openshift-profile

@@ -308,6 +308,13 @@ func (c *Controller) computeStatus(tuned *tunedv1.Tuned, conditions []configv1.C
availableCondition.Message = fmt.Sprintf("%v/%v Profiles failed to be applied", numDegradedProfiles, len(profileList))
}

if len(c.tunedProfileConflict) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is cheaper than doing the count over AllProfiles here, but using the direct value instead of a precomputed cache might be easier to maintain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I said above, it is cheaper.

@jmencak jmencak changed the title Set co/node-tuning Degraded when conflicting TuneD profiles exist WiP: Set co/node-tuning Degraded when conflicting TuneD profiles exist Nov 14, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2024
@jmencak jmencak force-pushed the 4.18-duplicate-profiles-diff-content branch 2 times, most recently from 8658666 to fad99f9 Compare November 14, 2024 14:42
@jmencak jmencak changed the title WiP: Set co/node-tuning Degraded when conflicting TuneD profiles exist Drop TuneD profiles with the same name and different content Nov 14, 2024
@jmencak jmencak changed the title Drop TuneD profiles with the same name and different content NO-JIRA: Drop TuneD profiles with the same name and different content Nov 14, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 14, 2024
@openshift-ci-robot
Copy link
Contributor

@jmencak: This pull request explicitly references no jira issue.

In response to this:

It is possible to create Tuned CRs with TuneD profiles of the same name. This is problematic when the duplicate TuneD profiles have different content. While the problem is obvious when inspecting the operator logs, this configuration issue is serious enough it should be more visible.

This change makes the node-tuning ClusterOperator object Degraded when conflicting TuneD profiles are created. This in turn creates an alert for cluster administrators to deal with this misconfiguration.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2024
@jmencak
Copy link
Contributor Author

jmencak commented Nov 14, 2024

/hold
Unfortunately, as is, this will not fix the issue, not even partly. The Tuned CRs can be retrieved in random order.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2024
It is possible to create Tuned CRs with TuneD profiles of the same name.
This is problematic when the duplicate TuneD profiles have different
content.  This causes periodic extraction of TuneD profiles and can lead
to TuneD restarts.

This change drops all instances of duplicate TuneD profiles and is a
partial fix for OCPBUGS-44559.
@jmencak jmencak force-pushed the 4.18-duplicate-profiles-diff-content branch from fad99f9 to 47bacc5 Compare November 14, 2024 16:50
@jmencak
Copy link
Contributor Author

jmencak commented Nov 14, 2024

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2024
Copy link
Contributor

openshift-ci bot commented Nov 14, 2024

@jmencak: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants