
OCPBUGS-57348: release_notes/ocp-4-19-release-notes: Add a boot-image-clobber known issue #94987

Open
wking wants to merge 2 commits into enterprise-4.19 from boot-image-clobber-known-issue

Conversation

wking
Member

@wking wking commented Jun 18, 2025

OCPBUGS-57348 is the 4.20 bug, and we don't have a 4.19.z clone yet, but we can update this entry once that 4.19.z OCPBUGS-... exists.

Version(s): 4.19

Issue: OCPBUGS-57348

Docs preview.

QE review:

  • QE has approved this change.

@openshift-ci-robot added the jira/severity-moderate (Referenced Jira bug's severity is moderate for the branch this PR is targeting.) and jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) labels Jun 18, 2025
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-57348, which is invalid:

  • expected Jira Issue OCPBUGS-57348 to depend on a bug in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

OCPBUGS-57348 is the 4.20 bug, and we don't have a 4.19.z clone yet, but we can update this entry once that 4.19.z OCPBUGS-... exists.

Version(s): 4.19

Issue: [OCPBUGS-57348](https://issues.redhat.com/browse/OCPBUGS-57348)

Link to docs preview:

QE review:

  • QE has approved this change.

@openshift-ci-robot added the jira/invalid-bug label (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) Jun 18, 2025
@openshift-ci bot added the size/XS label (Denotes a PR that changes 0-9 lines, ignoring generated files.) Jun 18, 2025
@wking
Member Author

wking commented Jun 18, 2025

@openshift/team-documentation, I'm not sure what the add-a-known-issue process is; can you help me get this ingested?

@ocpdocs-previewbot

ocpdocs-previewbot commented Jun 18, 2025

🤖 Tue Jun 24 23:07:26 - Prow CI generated the docs preview:

https://94987--ocpdocs-pr.netlify.app/openshift-enterprise/latest/release_notes/ocp-4-19-release-notes.html

@wking wking force-pushed the boot-image-clobber-known-issue branch from df43ab0 to 72d6586 Compare June 18, 2025 23:27
@@ -2741,6 +2741,8 @@ In the following tables, features are marked with the following statuses:

* In {product-title} {product-version}, clusters using IPsec for network encryption might experience intermittent loss of pod-to-pod connectivity. This prevents some pods on certain nodes from reaching services on other nodes, resulting in connection timeouts. Internal testing could not reproduce this issue on clusters with 120 nodes or less. There is no workaround for this issue. (link:https://issues.redhat.com/browse/OCPBUGS-55453[OCPBUGS-55453])

* {product-title} clusters that are installed on {aws-short} with custom AMIs or on {gcp-short} with custom disk images will have those customizations overridden by boot image management. To recover, you must xref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[disable boot image management], restore your MachineSet boot images, and delete any Machines created with an undesired boot image.(link:https://issues.redhat.com/browse/OCPBUGS-57348[OCPBUGS-57348])
Contributor

@dfitzmau dfitzmau Jun 19, 2025

Suggested change
* If you install a cluster on {aws-short} that has Amazon Machine Images (AMI) enabled or on {gcp-short} that has custom disk images enabled, the boot image management overrides these customization images with boot images. As a workaround, you can disable the boot image management feature, restore the boot images for the machine sets to their original location, and delete any machines that were incorrectly generated by the overriding boot images. To disable boot image management, see ref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[disable boot image management]. (link:https://issues.redhat.com/browse/OCPBUGS-57348[OCPBUGS-57348])

Member Author

I have some concerns with that suggestion, but in the interest of getting something declared as a known-issue, I've adopted your wording (and pivoted from the 4.20 OCPBUGS-57348 to the 4.19.z OCPBUGS-57796, now that that backport tracker exists) with 72d6586 -> cf256d9. See the cf256d9 commit message for more on my concerns. I'll leave it up to docs folks if any of them are worth follow-up pull requests.

/label merge-review-needed

@dfitzmau added the peer-review-done label (Signifies that the peer review team has reviewed this PR) Jun 19, 2025
@dfitzmau
Contributor

Thanks for raising this PR, @wking. I added a comment that revises the note.

@dfitzmau
Contributor

Once you have incorporated the suggestion, either exactly as is or in some variant, you can add a /label merge-review-needed comment to this PR to place it in the merge queue.

wking added 2 commits June 24, 2025 15:40
…obber rewording

Pulling in the suggestion from [1], word for word (although I am
bumping the bug link to OCPBUGS-57796, now that that 4.19.z backport
tracker exists), to try to get something declared.  We can wordsmith
later if we want.  Personally, I have concerns, including:

> ... the boot image management overrides these customization images
> with boot images.

but "with boot images" doesn't make sense to me, because MachineSets
are going to reference boot images regardless; there's no "without
boot images" option.  The distinction is that sometimes the boot
images are specifically selected by the cluster admin, and the MCO's
boot image management would override those cluster-admin preferences.

> ...restore the boot images for the machine sets to their original
> location...

I'd prefer "previous value" or something to "original location".  I
haven't heard folks say "location" for an AMI ID or other MachineSet
property value, while I have heard "value" for that.  And it seems
like folks might confuse "original" as "what the MachineSet used when
it was created" when what we mean was "what the MachineSet used just
before the MCO clobbered its boot image value".

> ...delete any machines that were incorrectly generated by the
> overriding boot images.

I'd prefer "overridden boot images" or my "undesired boot images",
because the timeline there is:

1. MCO overrides the MachineSet's boot image configuration, inserting
   a stock boot image ID instead of the admin-preferred boot image ID.
2. Admin updates MachineConfiguration to disable MCO boot image
   management for that MachineSet.  After this, all overriding will be
   past-tense.  Cluster-admin preference for the admin-preferred boot
   image over the stock boot image can continue in the present-tense.
3. Admin restores MachineSet boot image ID configuration.
4. Admin deletes any Machines which launched from the stock boot
   image.

So I don't like the present-tense "overriding" in wording about step
(4).

It also feels weird to me to have:

> ...you can disable the boot image management feature...

as a non-link, with a later:

> To disable boot image management, see
> ref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[disable
> boot image management].

giving the link.  I'd have expected some of the folks reading the
initial words to think "huh, how do I do this step?", and having the
words they were wondering about be the link to the docs that would
clear them up seems more usable than requiring them to skim to the end
of the paragraph to find the link they need.  But maybe there are doc
conventions around this whose motivation I don't understand?

Anyhow, none of my concerns seem large enough to be worth delaying a
known-issue declaration over, so I'm adopting Darragh's wording in
this commit, pointing out the places I don't agree with in this commit
message, and leaving it up to the docs folks to decide if any of my
concerns are worth further word-smithing in future pull requests.

[1]: openshift#94987 (comment)
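
The recovery timeline in the commit message above (steps 2 through 4) can be sketched in plain Python against an in-memory model rather than a real cluster. This is only an illustration under stated assumptions: the `MachineSet` and `Machine` classes, the `managed` flag, and the `recover` helper are hypothetical names introduced here, not OpenShift APIs, and the image IDs are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    name: str
    boot_image: str  # image ID this machine actually launched from


@dataclass
class MachineSet:
    name: str
    boot_image: str       # value currently stored in the machine set
    managed: bool = True  # hypothetical flag: boot image management enabled
    machines: List[Machine] = field(default_factory=list)


def recover(machine_set: MachineSet, preferred_image: str) -> List[str]:
    """Apply steps 2-4 of the timeline; return names of deleted machines."""
    # Step 2: disable management first, so the restored value is not
    # overridden again.
    machine_set.managed = False
    # Step 3: restore the value the machine set held just before it was
    # overridden (the admin-preferred image).
    machine_set.boot_image = preferred_image
    # Step 4: delete machines that launched from any other (stock) image.
    deleted = [m.name for m in machine_set.machines
               if m.boot_image != preferred_image]
    machine_set.machines = [m for m in machine_set.machines
                            if m.boot_image == preferred_image]
    return deleted


# Hypothetical state after step 1: the machine set was overridden with a
# stock image, and one machine launched from it.
workers = MachineSet(
    name="workers",
    boot_image="ami-stock",
    machines=[Machine("worker-a", "ami-custom"),
              Machine("worker-b", "ami-stock")],
)
deleted_names = recover(workers, preferred_image="ami-custom")
print(deleted_names)
```

The ordering is the point of the sketch: disabling management before restoring the value is what makes the override past-tense by the time machines are deleted in step 4.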
@wking wking force-pushed the boot-image-clobber-known-issue branch from 72d6586 to cf256d9 Compare June 24, 2025 22:56
@openshift-ci bot added the merge-review-needed label (Signifies that the merge review team needs to review this PR) Jun 24, 2025

openshift-ci bot commented Jun 24, 2025

@wking: all tests passed!

@maxwelldb added the merge-review-in-progress label (Signifies that the merge review team is reviewing this PR) Jun 25, 2025
@maxwelldb maxwelldb added this to the Continuous Release milestone Jun 25, 2025
@maxwelldb maxwelldb self-requested a review June 25, 2025 14:30
@@ -2741,6 +2741,8 @@ In the following tables, features are marked with the following statuses:

* In {product-title} {product-version}, clusters using IPsec for network encryption might experience intermittent loss of pod-to-pod connectivity. This prevents some pods on certain nodes from reaching services on other nodes, resulting in connection timeouts. Internal testing could not reproduce this issue on clusters with 120 nodes or less. There is no workaround for this issue. (link:https://issues.redhat.com/browse/OCPBUGS-55453[OCPBUGS-55453])

* If you install a cluster on {aws-short} that has Amazon Machine Images (AMI) enabled or on {gcp-short} that has custom disk images enabled, the boot image management overrides these customization images with boot images. As a workaround, you can disable the boot image management feature, restore the boot images for the machine sets to their original location, and delete any machines that were incorrectly generated by the overriding boot images. To disable boot image management, see ref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[disable boot image management]. (link:https://issues.redhat.com/browse/OCPBUGS-57796[OCPBUGS-57796])
Contributor

Suggested change
If you install a cluster on {aws-short} with Amazon Machine Images (AMI) enabled, or on {gcp-short} with custom disk images enabled, the boot image management feature overrides these custom images with default boot images.
+
As a workaround, you can disable the boot image management feature, restore the original boot images for the machine sets, and delete any machines that were incorrectly created by the overriding boot images.
+
To disable boot image management, see ref:../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images-disable_machine-configs-configure[Disable boot image management].
+
(link:https://issues.redhat.com/browse/OCPBUGS-57796[OCPBUGS-57796])

I'm thinking something like this? Be sure to remove doubled spaces and clarify the first clause, regardless.

Contributor

@maxwelldb maxwelldb left a comment

One suggestion left.

Because this impacts published release notes, it will need to go through the change management process. Be sure to indicate QE approval in the checkbox in the PR description, too. Thanks!

@maxwelldb removed the merge-review-in-progress (Signifies that the merge review team is reviewing this PR) and merge-review-needed (Signifies that the merge review team needs to review this PR) labels Jun 25, 2025
Labels
branch/enterprise-4.19 jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. peer-review-done Signifies that the peer review team has reviewed this PR size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
5 participants