TELCODOCS-2306: KMM 2.4 Release Notes #95248

Merged
merged 1 commit into from
Jun 27, 2025

Contributor

@StephenJamesSmith StephenJamesSmith commented Jun 25, 2025

KMM 2.4 release notes

Version(s):
openshift-4.19, openshift-4.20
KMM 2.4

Issue:
https://issues.redhat.com/browse/TELCODOCS-2306

Link to docs preview:
https://95248--ocpdocs-pr.netlify.app/openshift-enterprise/latest/hardware_enablement/kmm-release-notes.html

Dev: @ybettan
QE: @cdvultur
QE review:
[x] QE has approved this change.

Additional information:
Note to Reviewers: You have already reviewed jiras for the sections Bug fixes and Known issues. Please review the New features section.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2025
openshift-ci-robot commented Jun 25, 2025

@StephenJamesSmith: This pull request references TELCODOCS-2306 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

Version(s):

Issue:

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2025

ocpdocs-previewbot commented Jun 25, 2025

🤖 Fri Jun 27 20:43:23 - Prow CI generated the docs preview:

https://95248--ocpdocs-pr.netlify.app/openshift-enterprise/latest/hardware_enablement/kmm-release-notes.html

=== New features
// TELCODOCS-2343
* A new feature in this release is the addition of a preflight validation resource in the cluster that you can use to verify kernel modules to be installed on the nodes after cluster upgrades and possible kernel upgrades. Preflight validation also reports on the status and progress of each module in the cluster that it attempts or has attempted to validate. For more information, see xref:../updating/preparing_for_updates/kmm-preflight-validation.adoc#kmm-validation-kickoff_kmm-preflight-validation[Preflight validation for Kernel Module Management (KMM) Modules].
Member

This is not a new feature. It was just updated - its fields were changed.

Contributor Author

fixed.


// TELCODOCS-2344
* In this release, a new requirement when creating a kmod image is that both the `.ko` kernel module files and the `cp` binary must be included, which is required for copying files during the image loading process.
Member

This requirement was added in KMM 2.3. We only extended the documentation to mention the requirement in this release.

Contributor Author

fixed.

@ybettan
Member

ybettan commented Jun 26, 2025

Where are the rest of the new features? They all show up in the Jira board. Features are epics and sometimes a single task (change the filter accordingly).

  • Using in-tree kmod with device plugin
  • Operator capability was changed to "seamless upgrades" in the Red Hat catalog
  • New init-container for the device-plugin pods.
  • Use the container runtime (CRI-O) for checking if the images exist, allowing ~100% compatibility with OpenShift and the cluster-wide image-configuration settings.
  • KMM and KMM-hub now have the Meets Best Practices label in the Red Hat catalog.
  • Updated KMM to install on worker nodes if the control-plane nodes aren't available, without the need to artificially label the nodes as "control-plane".
  • Significantly reduced the number of events for one of the controllers (node heartbeat filter for NMC)
  • Removed a duplication of the webhook-service for KMM (this is actually a "bug fix" and not a feature).

** *Cause*: Generating files with `controller-gen` generated a service called `webhook-service` that is not configurable. And, when deploying KMM with OLM, OLM deploys a service for the webhook called `-service`.

** *Consequence*: Two services were generated for the same deployment. One generated by `controller-gen` and added to the bundle manifests and the other one created "on the fly" by OLM.
Collaborator

🤖 [error] RedHat.TermsErrors: Use 'dynamically', 'as needed', 'in real time', or 'immediately' rather than 'on the fly'. For more information, see RedHat.TermsErrors.

Contributor Author

deleted "on the fly".

@StephenJamesSmith
Contributor Author

@cdvultur @ybettan Please review and /lgtm if all is good.

@ybettan
Member

ybettan commented Jun 26, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2025
@StephenJamesSmith
Contributor Author

/label telco

@openshift-ci openshift-ci bot added the telco Label for all Telco PRs label Jun 26, 2025
@StephenJamesSmith
Contributor Author

/label peer-review-needed

@openshift-ci openshift-ci bot added the peer-review-needed Signifies that the peer review team needs to review this PR label Jun 26, 2025
@cdvultur left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 27, 2025
@abhatt-rh
Contributor

/label peer-review-in-progress

@openshift-ci openshift-ci bot added the peer-review-in-progress Signifies that the peer review team is reviewing this PR label Jun 27, 2025
Contributor

@abhatt-rh left a comment

Hi @StephenJamesSmith,
Nice work; big release this one! :)
I have added a couple of comments related to style and syntax for your consideration.

I would also like to note that I spotted inconsistencies in the use of the Operator name, Kernel Module Management Operator (KMM): it is expanded in some places and abbreviated in others. Considering that the release is already out, fixing this issue can be out of scope for this PR, but do consider tracking it in a separate issue.

/remove-label peer-review-in-progress
/remove-label peer-review-needed
/label peer-review-done

=== New features and enhancements
// TELCODOCS-2311
* In this release, you now have the option to configure KMM Module to not load an out-of-tree kernel driver and use the in-tree driver instead and run only the device plugin. For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules].
Contributor

Just noting that the general convention across most release notes in the OpenShift docs is to use "With this release," but I am not suggesting it now, in favor of consistency with the previous KMM release notes.

Suggested change
* In this release, you now have the option to configure KMM Module to not load an out-of-tree kernel driver and use the in-tree driver instead and run only the device plugin. For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules].
* In this release, you now have the option to configure KMM module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information, see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules].

Contributor Author

changed.

* In this release, KMM configurations are now persistent following cluster and KMM Operator upgrades and redeployments of KMM.
+
In earlier releases, a cluster or KMM upgrade, or any other intentional or unintentional action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of these actions.
Contributor

Suggested change
In earlier releases, a cluster or KMM upgrade, or any other intentional or unintentional action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of these actions.
In earlier releases, a cluster or KMM upgrade, or any other action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of such actions.

Contributor Author

"intentional or unintentional" is important. Leaving it in. Made other change.

In earlier releases, a cluster or KMM upgrade, or any other intentional or unintentional action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of these actions.
+
For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-configuring-kmmo_kernel-module-management-operator[Configuring the Kernel Module Management Operator].
Contributor

Suggested change
For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-configuring-kmmo_kernel-module-management-operator[Configuring the Kernel Module Management Operator].
For more information, see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-configuring-kmmo_kernel-module-management-operator[Configuring the Kernel Module Management Operator].

Contributor Author

added.


// MGMT-19735
* Improvements have been added to KMM so that GPU Operator vendors will not need to replicate KMM functionality in their code, but instead use KMM as is. This will greatly improve Operators' code size, tests, and reliability.
Contributor

Consider providing an example of one such improvement to avoid ambiguity.
Also, see Tenses in release notes

Suggested change
* Improvements have been added to KMM so that GPU Operator vendors will not need to replicate KMM functionality in their code, but instead use KMM as is. This will greatly improve Operators' code size, tests, and reliability.
* Improvements have been added to KMM so that GPU Operator vendors will not need to replicate KMM functionality in their code, but instead use KMM as is. This change greatly improves Operators' code size, tests, and reliability.

Contributor Author

changed.


// MGMT-18966
* In this release, KMM no longer uses HTTP[s] direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP[s] requests and manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining our own cache in Hub & Spoke.
Contributor

Sorry, I am not sure that I understand what the phrase "maintaining our own cache in Hub & Spoke" means. Whose cache? KMM's? In a hub-and-spoke cluster or in KMM-hub? Kindly rephrase to clarify the meaning.

Suggested change
* In this release, KMM no longer uses HTTP[s] direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP[s] requests and manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining our own cache in Hub & Spoke.
* In this release, KMM no longer uses HTTP(S) direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP(S) requests and manually handle tasks, such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining our own cache in Hub & Spoke.

Contributor Author

Made these changes.

** *Cause*: When a module is Loaded (using the create a 'ModuleLoad' event) or Unloaded (using the create a 'ModuleUnloaded' event) the events may not appear. This happens when you load and unload the kernel module in a quick succession.

** *Consequence*: The 'ModuleLoad' and 'ModuleUnloaded' events may not appear in {product-title}.
Contributor

Suggested change
** *Consequence*: The 'ModuleLoad' and 'ModuleUnloaded' events may not appear in {product-title}.
** *Consequence*: The `ModuleLoad` and the `ModuleUnloaded` events might not appear in {product-title}.

Contributor Author

changed

** *Consequence*: The 'ModuleLoad' and 'ModuleUnloaded' events may not appear in {product-title}.

** *Fix*: Alerting the user of this potential behavior and for awareness when working with modules.
Contributor

Suggested change
** *Fix*: Alerting the user of this potential behavior and for awareness when working with modules.
** *Fix*: Introduce an alerting mechanism for this potential behavior and for awareness when working with modules.

Contributor Author

changed

** *Cause*: The Kernel Module Management (KMM) Operator does not reload the kernel module in case the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.

** *Consequence*: When the reboot happens quickly, in less time than the grace period, the node state will not change. After the node reboots, KMM will not load the kernel module again.
Contributor

Suggested change
** *Consequence*: When the reboot happens quickly, in less time than the grace period, the node state will not change. After the node reboots, KMM will not load the kernel module again.
** *Consequence*: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.

Contributor Author

changed.

* Using imageRepoSecret in conjunction with DTK as imagestream gets authorization required error.

** *Cause*: On the Kernel Module Management (KMM) Operator, when you set the `imageRepoSecret` in the KMM module, and the build's resulting container image is defined to be stored in the cluster's internal registry, the build will fail to push the final image and generate an `authorization required` error.
Contributor

Suggested change
** *Cause*: On the Kernel Module Management (KMM) Operator, when you set the `imageRepoSecret` in the KMM module, and the build's resulting container image is defined to be stored in the cluster's internal registry, the build will fail to push the final image and generate an `authorization required` error.
** *Cause*: On the Kernel Module Management (KMM) Operator, when you set the `imageRepoSecret` in the KMM module, and the build's resulting container image is defined to be stored in the cluster's internal registry, the build fails to push the final image and generates an `authorization required` error.

Contributor Author

changed


// MGMT-19383
The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the
Contributor

Suggested change
The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the
* The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the

@openshift-ci openshift-ci bot added peer-review-done Signifies that the peer review team has reviewed this PR and removed peer-review-in-progress Signifies that the peer review team is reviewing this PR peer-review-needed Signifies that the peer review team needs to review this PR labels Jun 27, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 27, 2025
openshift-ci bot commented Jun 27, 2025

New changes are detected. LGTM label has been removed.

@StephenJamesSmith
Contributor Author

/label merge-review-needed

@openshift-ci openshift-ci bot added the merge-review-needed Signifies that the merge review team needs to review this PR label Jun 27, 2025
@lahinson lahinson added merge-review-in-progress Signifies that the merge review team is reviewing this PR and removed merge-review-needed Signifies that the merge review team needs to review this PR labels Jun 27, 2025
Contributor

@lahinson left a comment

@StephenJamesSmith As you requested, I focused my review on the new features section. PTAL at my comments and after you have made any changes, I can merge this PR.

=== New features and enhancements
// TELCODOCS-2311
* In this release, you now have the option to configure the KMM module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules].
Contributor

Suggested change
* In this release, you now have the option to configure the KMM module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules].
* In this release, you now have the option to configure the Kernel Module Management (KMM) module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information, see xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-using-intree-modules_kernel-module-management-operator[Using in-tree modules with the device plugin].

Contributor Author

Made these changes.


// TELCODOCS-2304
* In this release, KMM configurations are now persistent following cluster and KMM Operator upgrades and redeployments of KMM.
Contributor

Suggested change
* In this release, KMM configurations are now persistent following cluster and KMM Operator upgrades and redeployments of KMM.
* In this release, KMM configurations are now persistent after cluster and KMM Operator upgrades and redeployments of KMM.

Contributor Author

changed

* In this release, KMM configurations are now persistent following cluster and KMM Operator upgrades and redeployments of KMM.
+
In earlier releases, a cluster or KMM upgrade, or any other intentional or unintentional action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of such actions.
Contributor

Suggested change
In earlier releases, a cluster or KMM upgrade, or any other intentional or unintentional action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of such actions.
In earlier releases, a cluster or KMM upgrade or any other action, such as upgrading a non-default configuration like the firmware path that redeploys KMM, could create the need to reconfigure KMM. In this release, KMM configurations now remain persistent regardless of any of such actions.

If any intentional or unintentional action could create this problem, then I think it's safe to simply say, "any action".

Contributor Author

Good point. changed.


// MGMT-19735
* Improvements have been added to KMM so that GPU Operator vendors will not need to replicate KMM functionality in their code, but instead use KMM as is. This change greatly improves Operators' code size, tests, and reliability.
Contributor

Suggested change
* Improvements have been added to KMM so that GPU Operator vendors will not need to replicate KMM functionality in their code, but instead use KMM as is. This change greatly improves Operators' code size, tests, and reliability.
* Improvements have been added to KMM so that GPU Operator vendors do not need to replicate KMM functionality in their code, but instead use KMM as is. This change greatly improves Operators' code size, tests, and reliability.

Tiny change to avoid future tense, per IBM Style.

Contributor Author

changed


// MGMT-18966
* In this release, KMM no longer uses HTTP(S) direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP(S) requests and manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining our own cache in Hub & Spoke.
Contributor

  • Add a noun after /etc/containers/registries.conf to clarify what that is.
  • Do not use first person in technical documentation, per IBM Style and the Supplementary Style Guide
Suggested change
- * In this release, KMM no longer uses HTTP(S) direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP(S) requests and manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining our own cache in Hub & Spoke.
+ * In this release, KMM no longer uses HTTP(S) direct requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This mitigates the need to access container image registries directly from HTTP(S) requests and manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining your own cache in Hub & Spoke.

Contributor Author

changed


// MGMT-19383
The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the
Contributor

Suggested change
- The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the
+ * The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the

I think the statement on line 52 should be an item in the bulleted list of enhancements. I'm not sure why it wouldn't be included in that list.

Contributor Author

changed


// MGMT20613
* You can now install KMM on worker nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the worker nodes do not have the `node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` labels, the Kernel Module Management Operator might need further configurations. An internal code change has resolved this issue.
Contributor

I believe the preference in the OCP docs is "compute nodes" instead of "worker nodes".

Suggested change
- * You can now install KMM on worker nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the worker nodes do not have the `node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` labels, the Kernel Module Management Operator might need further configurations. An internal code change has resolved this issue.
+ * You can now install KMM on compute nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the compute nodes do not have the `node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` labels, the Kernel Module Management Operator might need further configurations. An internal code change has resolved this issue.

Contributor Author

changed
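
For readers following the MGMT20613 item above, the label distinction it describes can be sketched roughly as follows. This is an illustration only, not KMM code: the function name `has_control_plane_label` and the constant `CONTROL_PLANE_LABELS` are hypothetical, and the sketch assumes node labels are available as a plain dictionary.

```python
# Hypothetical helper, not part of KMM: checks whether a node's labels
# include either of the two control-plane role labels named in the note.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def has_control_plane_label(node_labels: dict) -> bool:
    """Return True if any control-plane role label key is present."""
    return any(label in node_labels for label in CONTROL_PLANE_LABELS)
```

A compute node that carries only a label such as `node-role.kubernetes.io/worker` would return `False` here, which is the case the release note describes.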


// MGMT-20248
* Filtering out node heartbeats events for the NMC controller.
Contributor

I agree with @abhatt-rh here.
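
As background for the MGMT-20248 item, one common way to filter out node heartbeat events is to compare the old and new Node objects after dropping the fields that a heartbeat touches. The sketch below is an assumption about the general technique, not the NMC controller's actual implementation (KMM itself is written in Go). The function names are hypothetical, and the sketch treats only `lastHeartbeatTime` and `resourceVersion` as heartbeat-related fields.

```python
import copy


def strip_heartbeat_fields(node: dict) -> dict:
    """Return a copy of a Node object without heartbeat-related fields."""
    stripped = copy.deepcopy(node)
    # Each node condition carries a lastHeartbeatTime that the kubelet refreshes.
    for condition in stripped.get("status", {}).get("conditions", []):
        condition.pop("lastHeartbeatTime", None)
    # resourceVersion changes on every write, including heartbeat updates.
    stripped.get("metadata", {}).pop("resourceVersion", None)
    return stripped


def is_heartbeat_only_update(old_node: dict, new_node: dict) -> bool:
    """Return True if the two objects differ only in heartbeat-related fields."""
    return strip_heartbeat_fields(old_node) == strip_heartbeat_fields(new_node)
```

A controller that skips updates for which `is_heartbeat_only_update` returns `True` reconciles only when something other than the heartbeat has changed on the node.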


openshift-ci bot commented Jun 27, 2025

@StephenJamesSmith: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@lahinson lahinson merged commit 566bbe4 into openshift:main Jun 27, 2025
2 checks passed
@lahinson
Contributor

/cherrypick enterprise-4.19

@lahinson
Contributor

/cherrypick enterprise-4.20

@openshift-cherrypick-robot

@lahinson: new pull request created: #95391

In response to this:

/cherrypick enterprise-4.19


@openshift-cherrypick-robot

@lahinson: new pull request created: #95392

In response to this:

/cherrypick enterprise-4.20


Labels
  • branch/enterprise-4.19
  • branch/enterprise-4.20
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • merge-review-in-progress: Signifies that the merge review team is reviewing this PR
  • peer-review-done: Signifies that the peer review team has reviewed this PR
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • telco: Label for all Telco PRs

9 participants