Skip to content

Conversation

@marquiz
Copy link
Contributor

@marquiz marquiz commented Aug 26, 2025

This PR adds support for directly adjusting the intelRdt object of the container config.

Previously, we have the concept of RDT class, which provides indirect manipulation of the intelRdt.closID field. There the higher level runtime (containerd, cri-o) registers a resolver function which the NRI adaptation uses to translate the provided "RDT class" to actual closID (which is basically the resctrl group in the Linux resctrl filesystem).

The API extensions in this PR provides direct control of the relevant fields of the intelRdt object. The PR also includes a sample plugin to test/demonstrate the functionality (including deployment files and CI integration to build/publish images)

REFS:

NOTES:

OPEN QUESTION/TODO:

In this PR there is no mechanism for removing the intelRdt object from the config. That is probably a desirable feature.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

@klihub
Copy link
Member

klihub commented Aug 26, 2025

nit: typo in the body of commit message for 'api: add rdt': repeated 'the' in "... the the IntelRdt message is sepate from LinuxResources. This is to ..."

a.Linux.Resources.Unified = make(map[string]string)
}
}
func (a *ContainerAdjustment) initLinuxRdt() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: #118 has a bunch of queued similar changes, updating the result-handling code so that there is no need to always a priory create a fully constructed but empty adjustment. It would be nice to get that finally in, so I'll rebase it that on the latest main/HEAD then update with any missing init*() bits necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll review when that's updated. And then rebase when that's merged

@klihub
Copy link
Member

klihub commented Aug 26, 2025

@marquiz @mikebrow @chrishenzie This overall LGTM. My biggest question is the timeline for tagging a new opencontainers/runtime-spec so that bits we rely on here become available via a tagged release.

@klihub klihub requested review from chrishenzie and mikebrow August 26, 2025 13:35
@marquiz marquiz force-pushed the devel/rdt branch 2 times, most recently from b2a5d93 to c089e9c Compare August 26, 2025 14:05
@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

My biggest question is the timeline for tagging a new opencontainers/runtime-spec so that bits we rely on here become available via a tagged release.

I'm on the same side. I think there's no immediate hurry with this, especially because even the runc/crun implementation(s) are not there, yet. I've been working on this area so I wanted to get this one off my shoulders. That said, I think this is ready for review. I could've submitted this as a draft, but decided not to because from my perspective this is complete.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 26, 2025

I just recalled that there's actually one aspect/feature missing from the new extensions: ability to remove the linux.intelRdt from the container config (set it to nil/null). Suggestions for how to implement this are warmly welcome.

I couldn't come up with any nice solutions, I played with two alternatives but didn't commit either of them:

  1. Have a separate field (like drop) in the NRI API
  2. Have a well documented, otherwise invalid value for the closID field in the NRI API (e.g. -/) that causes the config to be dropped

Copy link

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@klihub
Copy link
Member

klihub commented Aug 27, 2025

I just recalled that there's actually one aspect/feature missing from the new extensions: ability to remove the linux.intelRdt from the container config (set it to nil/null). Suggestions for how to implement this are warmly welcome.

I couldn't come up with any nice solutions, I played with two alternatives but didn't commit either of them:

  1. Have a separate field (like drop) in the NRI API
  2. Have a well documented, otherwise invalid value for the closID field in the NRI API (e.g. -/) that causes the config to be dropped

@marquiz I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt, with something along the lines of this:

func (a *ContainerAdjustment) RemoveLinuxRDT() {
    a.initLinux()
    a.Linux.Rdt = &LinuxRdt{
        ClosId = MarkForRemoval("") // or MarkForRemoval("/") if you think that's better
    }
}

And then have the corresponding handling for this in intermediate result-calculation and final Spec generation.

@marquiz
Copy link
Contributor Author

marquiz commented Aug 27, 2025

@klihub thanks for the feedback. The MarkForRemoval thingie was a good hint, I didn't have that in my early prototype. I'll cook up something along the lines you suggested and then we can mull over that.

@marquiz marquiz force-pushed the devel/rdt branch 2 times, most recently from ec11b58 to d9d155a Compare August 28, 2025 07:20
@marquiz
Copy link
Contributor Author

marquiz commented Aug 28, 2025

Updated:

  • rebased (on top of the containerized-protobuild commit)
  • added unit test for pkg/runtime-tools
  • small change in pkg/adaptation/result.go (rely on initAdjustRdt() in initializing Container.Linux.Rdt)
  • fixed commit message
  • added support for removing RDT config (as a separate commit)

nit: typo in the body of commit message for 'api: add rdt': repeated 'the' in "... the the IntelRdt message is sepate from LinuxResources. This is to ..."

Thx for spotting this. Fixed

@marquiz
Copy link
Contributor Author

marquiz commented Aug 28, 2025

I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt,

I ended up with adding a separate field. I implemented both variants and the "prefix closid with -/" variant was an ugly fragile mess tbh. So much cleaner to just have a separate field for this purpose. For envs and other guys it's a bit different when we're updating a map or slice and you selectively remove some items. But here we're managing the whole object. (if interested, I have the still lying around at https://github.com/marquiz/nri/commits/devel/rdt-remove-2/)

@klihub
Copy link
Member

klihub commented Aug 28, 2025

I'd vote with a mild bias for 2 above (because it's how we handle other similar cases where removal is supported). But with the addition that on the wire-protocol this is the convention we use, but in the user-facing API we provide explicit functions for clearing any LinuxRdt,

I ended up with adding a separate field. I implemented both variants and the "prefix closid with -/" variant was an ugly fragile mess tbh. So much cleaner to just have a separate field for this purpose. For envs and other guys it's a bit different when we're updating a map or slice and you selectively remove some items. But here we're managing the whole object.

@marquiz Yes, and sorry about the long sidetrack in our offline discussion earlier. I was in the totally wrong mental model, associated with most of the existing bits (slices and maps). You were absolutely right that we can't reasonably handle removal with an inline notation, because then it would not be possible to indicate a removal prior to an addition to avoiding conflict with an earlier plugin. I agree a separate dedicated field in the wire-protocol is the way to go here.

@klihub klihub marked this pull request as draft September 22, 2025 18:29
@marquiz marquiz force-pushed the devel/rdt branch 2 times, most recently from cdcd76e to aa36c34 Compare November 24, 2025 15:13
@marquiz marquiz marked this pull request as ready for review November 24, 2025 15:15
@marquiz
Copy link
Contributor Author

marquiz commented Nov 24, 2025

Rebased, marked ready-for-review. Stacked on top of #212
@klihub @mikebrow @chrishenzie

@klihub
Copy link
Member

klihub commented Nov 24, 2025

nit: typo/missing 'be' in commit message of 'api: add rdt': "Also note that the Rdt object cannot modified on container"

@klihub
Copy link
Member

klihub commented Nov 24, 2025

As a purely theoretical question related to your notes in the commit log of 'api: add rdt'.

In the (I am adraid highly unlikely) situation where runc/crun/OCI compatible runtimes would gain the ability to update RDT for a running container, what would be the best way for us to model/handle that. Would we add a corresponding &LinuxRdt{} to LinuxContainerUpdate{} ?

Add new message for directly managing the linux.intelRdt object of the
runtime-spec.

NOTE: the old RdtClass field resides under LinuxResources message but
the IntelRdt message is sepate from LinuxResources. This is to maintain
parity with the OCI runtime-spec. It is also much more logical because
the LinuxResources message is used in many places (like pod overhead,
update resources etc) where the RDT fields don't make sense or cannot be
applied. Also note that the Rdt object cannot be modified on container
update (missing from UpdateContainerRequest) as this functionality is
missing from the runtime spec and runtimes (runc/crun).

Signed-off-by: Markus Lehtonen <[email protected]>
Add support for direct manipulation of the linuxRdt object in the
container config.

Signed-off-by: Markus Lehtonen <[email protected]>
Add support for direct manipulation of the intelRdt object of the
container config.

Signed-off-by: Markus Lehtonen <[email protected]>
Signed-off-by: Markus Lehtonen <[email protected]>
Makes it possible to remove/override the linux.intelRdt object from the
container configuration. Extend the API by adding new 'Remove' field to
the LinuxRdt message which is used as a marker to fully delete/override
the IntelRdt configuration. This patch also updates the adaptation
and runtime-tools correspondingly.

Signed-off-by: Markus Lehtonen <[email protected]>
@marquiz
Copy link
Contributor Author

marquiz commented Nov 24, 2025

nit: typo/missing 'be' in commit message of 'api: add rdt': "Also note that the Rdt object cannot modified on container"

Thx for spotting this. Fixed

@marquiz
Copy link
Contributor Author

marquiz commented Nov 24, 2025

In the (I am adraid highly unlikely) situation where runc/crun/OCI compatible runtimes would gain the ability to update RDT for a running container, what would be the best way for us to model/handle that. Would we add a corresponding &LinuxRdt{} to LinuxContainerUpdate{}

Yes, I think that would likely be the place (with the usual reservations of predicting the future, how this would be implemented in runc).

@marquiz
Copy link
Contributor Author

marquiz commented Nov 24, 2025

@klihub @mikebrow @chrishenzie it would be nice to get #118 in before this. I'll then rebase

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants