-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-37982: Bug fix: Reduce Frequency of Update Requests for Copied CSVs #3497
base: master
Are you sure you want to change the base?
Conversation
01343d6
to
67c82b9
Compare
by adding annotations to copied CSVs that are populated with hashes of the non-status fields and the status fields. This seems to be how this was intended to work, but was not actually working this way because the annotations never actually existed on the copied CSV. This resulted in a hot loop of update requests being made on all copied CSVs. Signed-off-by: everettraven <[email protected]>
Signed-off-by: everettraven <[email protected]>
Signed-off-by: everettraven <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Code Changes: • Annotation Consistency: We unconditionally set the non-status-hash annotation on the in-memory CSV object, ensuring that prototype.Annotations["olm.operatorframework.io/nonStatusCopyHash"] always matches the final state—even if the existing CSV already matched. • Multi-step Updates: We now issue a separate “normal” Update call after an UpdateStatus call if the CSV’s status hash differs. This keeps the statusCopyHashAnnotation in sync with the actual .status and avoids stale annotation data. Test Changes: • Expected Actions: Each test case now expects the exact create/update/updateStatus calls (and any subsequent update) that the refactored copyToNamespace emits. This includes: 1. Creating a CSV if none exists, 2. Updating the non-status annotation if it changed, 3. Updating the .status subresource if the status hash changed, and 4. Issuing a follow-up metadata update for the new status-hash annotation. • Fake Lister: • Even though the code already called copiedCSVLister.Namespace(ns), our old tests didn’t exercise or strictly verify that part of the interface. As we expanded and refined the tests — particularly around existing vs. non-existing CSV scenarios — we triggered code paths that call .Namespace(ns).Get(...), exposing the incomplete fake. • Better Coverage: By adding a fully implemented fake lister (including List, Get, and Namespace(...)), the new tests accurately reflect the real OLM flow and properly simulate how the operator queries for existing CSVs in a specific namespace. Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
4bbdc20
to
4b2c88b
Compare
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
} else { | ||
// Even if they're the same, ensure the returned prototype is annotated. | ||
prototype.Annotations[statusCopyHashAnnotation] = status | ||
updated = prototype | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the code implemented in this PR to the current state, the main addition seems to be this else block (beyond tests).
I’m not entirely sure I fully understand—are we also looking to implement what’s outlined in the Proposed Fixes
section of this document? How/where are we addressing the concerns raised in the: Why don’t we just merge the [fix PR](https://github.com/operator-framework/operator-lifecycle-manager/pull/3411) as-is?
section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is first pass, basically, just merge the old PR. With this PR we're taking path of #4
in the scoping doc: merge the PR with some possible problems, they should be a minor use case: users changing the copied CSVs
but the else is not the only thing done here, the main thing added is the tracking hashes so we can tell what's in need of update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be some simplification that can be done with the setting of the status/nonstatus annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming all the changes here are due to lint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, and I just ran make lint
locally to make sure nothing changed. nothing changed.
@@ -803,6 +808,7 @@ func (a *Operator) copyToNamespace(prototype *v1alpha1.ClusterServiceVersion, ns | |||
|
|||
existing, err := a.copiedCSVLister.Namespace(nsTo).Get(prototype.GetName()) | |||
if apierrors.IsNotFound(err) { | |||
prototype.Annotations[nonStatusCopyHashAnnotation] = nonstatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because copyToNamespace
is called in a loop, prototype
, being a pointer, is reused multiple times. Which means that these annotations may already be set. Is there any reason why these annotations simply aren't set in ensureCSVsInNamesapces()
where the hashes are calculcated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point possibly. checking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So looking at it closer it seems like we shouldn't change it, here's my reasoning:
Keeping the annotation logic here, in copyToNamespace()
, encapsulate the update semantics so each call handles its own CSV's state reliably.
We're reusing prototype and accounting for possibly set annotations. If we move the logic to ensureCSVsInNamesapces()
, we'll have to duplicate the annotation checking logic because the logic for handling those annotations is tightly coupled with the CSV’s create/update lifecycle.
In copyToNamespace()
we need to:
• Distinguish between a new creation (where the annotations don’t exist yet) and an update (where the annotations might already be set but could be outdated).
• Apply the updates in a specific order (first updating the non-status hash, then the status hash, including a status update to avoid mismatches).
• Ensure that each target CSV reflects the current state as expected for that specific namespace.
Aside from the hash handling we'd still need to be doing the above work in copyToNamespace()
Description of the change:
Please checkout this doc on scoping out this change: https://docs.google.com/document/d/1P4cSYEP05vDyuhBfilyuWgL5d5OOD5z7JlAyOxpPqps
In this PR we are resurrecting #3411 with the intent to fix what that PR was originally going to fix. Follow on work will address the then revealed problem with
metadata.generation|resourceVersion
as per the doc comment by @tmshortMotivation for the change:
[from the linked doc, "How Did We Get Here:]
Architectural changes:
Testing remarks:
Except from the expected changes around the inability to track copied CSV changes made by a user, we should be careful to test the following:
Reviewer Checklist
/doc
[FLAKE]
are truly flaky and have an issue