add kep for DRA Attributes Downward API #5606
Conversation
Skipping CI for Draft Pull Request.
/wg-device-management
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: alaypatel07. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Please combine PRR into this PR. We usually review as one PR.
@kannon92 This PR already has the PRR. If a single PR is to be reviewed, all I need to do is close the other PR.
// DRADeviceFieldRef selects a DRA-resolved device attribute for a given claim+request.
// +featureGate=DRADownwardDeviceAttributes
// +structType=atomic
type DRADeviceFieldRef struct {
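For orientation, here is a hedged sketch of what the rest of this type might look like; the field names below (ClaimName, RequestName, Attribute, DeviceIndex) are inferred from the YAML example and the multi-device discussion later in this thread, and are not the final API shape:

```go
// Sketch only -- fields inferred from the examples and discussion in this PR,
// not the proposed API verbatim.
type DRADeviceFieldRef struct {
	// ClaimName matches an entry in pod.spec.resourceClaims[*].name.
	ClaimName string `json:"claimName"`
	// RequestName names a request within the referenced ResourceClaim.
	RequestName string `json:"requestName"`
	// Attribute is the fully qualified device attribute to resolve,
	// e.g. "resource.kubernetes.io/pcieRoot".
	Attribute string `json:"attribute"`
	// DeviceIndex optionally selects a single device (zero-based, in
	// allocation order) when the request resolved to multiple devices.
	// +optional
	DeviceIndex *int32 `json:"deviceIndex,omitempty"`
}
```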
This can map to more than one device, since each request might ask for multiple devices. How is the data surfaced in the env variables or the volume? The DRA device names don't necessarily map to any identifier that is known to the consumer of this information.
Added a section, Multi-device requests, in Kubelet Implementation:
Multi-device requests:
When deviceIndex is unset, kubelet resolves the attribute across all allocated devices for the request, preserving allocation order, and joins values with a comma (",") into a single string. Devices that do not report the attribute are skipped. If no devices provide the attribute, the value is considered not ready.
When deviceIndex is set, kubelet selects the device at that zero-based index from the allocation results and resolves the attribute for that device only. If the index is out of range or the attribute is missing on that device, the value is considered not ready.
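A minimal sketch of that rule in Go, using a hypothetical allocatedDevice stand-in type rather than the actual kubelet internals:

```go
package main

import (
	"fmt"
	"strings"
)

// allocatedDevice is a hypothetical stand-in for however the kubelet tracks
// the devices allocated to a single claim request.
type allocatedDevice struct {
	name       string
	attributes map[string]string
}

// resolveAttribute follows the rule quoted above: with deviceIndex unset,
// join the attribute across all allocated devices in allocation order and
// skip devices that don't report it; with deviceIndex set, resolve only that
// device. ok=false means "not ready".
func resolveAttribute(devices []allocatedDevice, attr string, deviceIndex *int) (value string, ok bool) {
	if deviceIndex != nil {
		if *deviceIndex < 0 || *deviceIndex >= len(devices) {
			return "", false // out-of-range index -> not ready
		}
		v, found := devices[*deviceIndex].attributes[attr]
		return v, found // missing attribute on that device -> not ready
	}
	var values []string
	for _, d := range devices { // allocation order preserved
		if v, found := d.attributes[attr]; found {
			values = append(values, v)
		}
	}
	if len(values) == 0 {
		return "", false // no device reported the attribute -> not ready
	}
	return strings.Join(values, ","), true
}

func main() {
	devs := []allocatedDevice{
		{name: "gpu-0", attributes: map[string]string{"resource.kubernetes.io/pcieRoot": "pci0000:00"}},
		{name: "gpu-1", attributes: map[string]string{}}, // does not report the attribute
	}
	v, ok := resolveAttribute(devs, "resource.kubernetes.io/pcieRoot", nil)
	fmt.Println(v, ok) // "pci0000:00 true"
}
```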
3. Watches ResourceSlices: Resolves standardized attributes from `spec.devices[*].attributes` for the matching device name
4. Maintains Cache: Keeps a per-Pod map of `(claimName, requestName) -> {attribute: value}` with a readiness flag

Resolution Semantics:
What about error handling here? I'm wondering what happens if:
- A claim or request can't be found?
- The requested attribute is not available for one or more of the allocated devices?
See this: #5606 (comment)
See the section Failure on missing data
Just a few comments, but overall this looks good to me.
a few minor things to resolve, otherwise LGTM including PRR
- Older kubelet without the feature will ignore `resourceSliceAttributeRef` (it is dropped during decoding)
- Containers still start; env vars/volumes referencing `resourceSliceAttributeRef` will not be populated
- **Risk**: Workloads relying on these values may misbehave
- **Mitigation**: Avoid relying on the field until all kubelets are upgraded; gate scheduling to upgraded nodes (e.g., using node labels/taints) or keep the feature gate disabled on the API server until nodes are updated
consider integrating with #5328
+1, does this need any work in the KEP or should this be an implementation issue?
If you plan to do it, I would note it here. Non-blocking though.
PRR is OK now, just need approval from SIG Node. cc @klueska @mrunalp @dchen1107 @SergeyKanzhelev
@klueska is marked as a primary approver in kep.yaml and I would really like to hear if this is what is needed.
### Goals

- Provide a stable Downward API path for device attributes associated with `pod.spec.resourceClaims[*]` requests
- Support device attributes from `ResourceSlice` that are requested by the user in the pod spec
Do we need to provide the DRA driver name beyond attributes? Are there fungibility scenarios where the pod doesn't even know which DRA driver gave it a resource?
Pods will always know which DRA driver gave them a resource through the combination of resource claim name + request name + device name.
However, with the prioritized list feature, it is possible that the user asks for a device via a oneOf(set of requests) mechanism. In that case the user might have to make sure all the requests have the attribute. This is a user/UX problem, however.
I don't think it is a UX problem. If we envision some kind of one-of, then the user will need to know which one was picked. So the information about the DRA plugin needs to be exposed via the Downward API as well. At least this is how I understand the scenario here.
Interesting points @SergeyKanzhelev. Yes, with the prioritized list, if you get different devices, some will have the attribute and some might not. If one is a GPU and one is a CPU, for example.
On the one hand, it's not clear that passing via the downward API makes sense in those use cases. So, should that block doing it in a use case where it does make sense? On the other hand, these kinds of "works with this but not that" situations create real friction for users. Hmm.
In the fungibility use cases with prioritized list, there are two current strategies for the containers: we can run a single container that is able to look at which devices it got and adjust appropriately, or we can run multiple containers and have one sleep if it's the wrong one. If we go with "missing attribute causes pod failure at container start time", these two features won't work together, because pods will fail if the selected device doesn't have the attribute.
We have also considered adding binding conditions in the resource claim and then allowing a controller to mutate the container image in PreBind. If we could also mutate the downward API spec, we could make them work together with that. But I don't think we can mutate that (I suspect it's not mutable).
So, two possible solutions:
- Somehow tie the downward API spec to the Device Request rather than the container config (seems totally wrong)
- Don't fail if attributes are missing
There is another solution to this: the attribute mentioned in the pod spec downward API could be added as a required attribute to the resource claim request. This would make sure the scheduler only picks devices which have this attribute. However, we still have to manage runtime issues like "resource slices going missing during pod creation".
But that doesn't work in the fungibility case. It should be legit for us to pick a device that only works for one of two containers, based on the "two container" strategy above. And with the "one container" strategy, we would need to know which attributes to publish from which devices, based on the request choice.
I.e., if we put it in the selectors of the subrequest, then we would only want to run the container that will make use of that subrequest, but we have no way to specify such a thing today.
### Non-Goals

- Expose the entirety of `ResourceClaim`/`ResourceSlice` objects in the Downward API
So realistically a Pod can only rely on attributes that were specified in a CEL expression? Other attributes may not exist.
Sorry, I don't follow the question. The pod has the link to the resource claim which provided the device. The resource claim status has the request name + device name, and the device name in the resource slice has the attribute.
No, the question is kind of similar to #5606 (comment).
If one wants to follow defensive programming and ensure that all attributes it asks for in the Pod are present on the devices that DRA gave it, the only way to do so is to specify those attributes somehow in a CEL expression.
One can rely on the "knowledge of devices". But that is not very reliable and makes containers crash long after scheduling, making it harder to investigate.
### Notes/Constraints/Caveats (Optional)

- Environment variables are set at container start time: Once a container starts, its environment variables are immutable. If device attributes change after container start, env vars will not reflect the change.
- Resolution timing: Attributes are resolved at container start time (not at allocation time). There is no scheduler-side copying of attributes into `ResourceClaim`.
Please note the kubelet restart or container crash scenarios. The behavior must be declared in those cases. Ideally something aligned with #3721.
We need to clearly articulate that the crash/restart of a container may lead to its inability to start if the attribute or resource has disappeared.
> We need to clearly articulate that the crash/restart of a container may lead to its inability to start if the attribute or resource has disappeared.

I agree, I will add this.
- Environment variables are set at container start time: Once a container starts, its environment variables are immutable. If device attributes change after container start, env vars will not reflect the change.
- Resolution timing: Attributes are resolved at container start time (not at allocation time). There is no scheduler-side copying of attributes into `ResourceClaim`.
- ResourceSlice churn: Resolution uses the contents of the matching `ResourceSlice` at container start. If the `ResourceSlice` (or the requested attribute) is missing at that time, kubelet records an event and fails the pod start.
Why does the Pod fail to start? Do you mean the container fails to start and the pod follows whatever the restart policy is? Or do you explicitly want to change the Pod error handling for this case?
> Do you mean the container fails to start and the pod follows whatever the restart policy is?

I mean this, yes. I will clarify. There are two factors here: the resource claim is shared with all containers, but the downward API is for specific containers inside the pod. So this needs to be clearly stated.
## Motivation

Workloads that need to interact with DRA-allocated devices (like KubeVirt virtual machines) require access to device-specific identifiers such as PCIe bus addresses or mediated device UUIDs. To fetch the attributes of an allocated device, users first have to go to the ResourceClaim status, find the request and device name, and then look up the ResourceSlice with the device name to get the attribute value. Ecosystem projects like KubeVirt must resort to custom controllers that watch these objects and inject attributes via annotations/labels or other custom mechanisms, often leading to fragile, error-prone, and racy designs.
How do people implement it today with Device Plugins, and why are the DRA requirements new?
This may need to be filled in under the Alternatives section of this KEP.
> How do people implement it today with Device Plugins, and why are the DRA requirements new?

With device plugins, the drivers are expected to populate this environment variable based on the name of the device.
However, in the case of DRA, since there is an indirection in the API, it takes three steps to get to the name of the device: look up the Pod spec to find the resource claim name, look up the resource claim status to find the device name, and look up the resource slice to find the attribute value. So there are two options:
- Ask the drivers to implement an env variable PCI_CLAIMNAME_REQUESTNAME_DEVICENAME=<attribute_value>. As you can see, the env API is very constrained; with three levels of indirection it is very hard for drivers and workloads to generate and find this information.
- Write custom controllers to infer this value and populate the env variable. This is how it is implemented in KubeVirt now, as an alpha feature; however, it requires setting the attribute value in the KubeVirt CR status. This creates problems when KubeVirt tries to migrate a VM from one node to another, where the attribute values have to change and be coordinated. It is desired that if the pod has the env variable, it can just come up on the new node and find its device metadata.
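To make the three-level indirection concrete, here is a rough sketch of the chained lookups a custom controller has to do today; the types below are simplified stand-ins for the real resource.k8s.io objects, not their actual Go definitions:

```go
// Package lookup sketches the pod -> ResourceClaim -> ResourceSlice chain
// described above (illustration only, simplified stand-in types).
package lookup

type podResourceClaimStatus struct {
	name              string // entry name from pod.spec.resourceClaims
	resourceClaimName string // generated ResourceClaim object name from pod.status
}

type allocationResult struct{ request, driver, pool, device string }

type resourceClaim struct{ results []allocationResult } // status.allocation.devices.results

type sliceDevice struct {
	name       string
	attributes map[string]string
}

// lookupAttribute chains the three lookups: pod status -> ResourceClaim
// allocation -> ResourceSlice attribute. ok=false means the value could not
// be resolved.
func lookupAttribute(
	statuses []podResourceClaimStatus,
	claims map[string]resourceClaim, // keyed by ResourceClaim object name
	slices map[string][]sliceDevice, // keyed by driver+"/"+pool
	claimName, requestName, attr string,
) (string, bool) {
	for _, s := range statuses {
		if s.name != claimName {
			continue
		}
		// Step 2: find the allocation result for the request.
		for _, r := range claims[s.resourceClaimName].results {
			if r.request != requestName {
				continue
			}
			// Step 3: resolve the attribute from the matching slice device.
			for _, d := range slices[r.driver+"/"+r.pool] {
				if d.name == r.device {
					v, ok := d.attributes[attr]
					return v, ok
				}
			}
		}
	}
	return "", false
}
```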
> Ask the drivers to implement an env variable PCI_CLAIMNAME_REQUESTNAME_DEVICENAME=<attribute_value>. As you can see, the env API is very constrained; with three levels of indirection it is very hard for drivers and workloads to generate and find this information.

The DRA driver is publishing those attributes in the first place, so it should know what to inject. Or is this for scenarios where the attributes are not controlled by the DRA plugin?
This is for the driver that is publishing those attributes, correct, but the issue is with fulfilling the contract between the driver and KubeVirt. For KubeVirt to generate the right domxml for the device, it needs to know the GPU name that is configured, see this: https://github.com/kubevirt/kubevirt/blob/559fae099c734c7ba61332caef06567e9f572ddf/pkg/virt-launcher/virtwrap/device/hostdevice/dra/gpu_hostdev.go#L78-L83
However, this name is purely in the KubeVirt workload spec; it is not available to the driver. So once the consumer discovers the PCI_CLAIMNAME_REQUESTNAME_DEVICENAME env variable set by the driver, it has to reverse-lookup the VMI spec to find the device name for it. If this is instead implemented as a contract between the pod spec and KubeVirt, then it is much easier to discover the device attributes from inside the pod.
So the issue is that one of the attributes is not injected as an env variable today, and the assumption is that it is better to declare which attributes are needed in the pod spec rather than update the DRA plugin to inject more attributes into all containers?
If this is the main scenario, it may be interesting to explore whether all env vars should be declared this way as the best practice. Having a mix of auto-injected and declared sounds like trouble.
Yes, it will make the contract between the consumers and producers of device metadata information much simpler.
Why not use the NodePrepareResources hook to plumb the information down to the driver? We are already passing the Claim there to the driver.
#### Story 2

As a DRA driver author, I want my driver to remain unchanged while allowing applications to consume device attributes (like `resource.kubernetes.io/pcieRoot` or `dra.kubervirt.io/mdevUUID`) through the native Kubernetes Downward API.
The versioning story between the DRA plugin and Pods will be interesting here. A new DRA driver needs to fully roll out before new attributes can be consumed. However, a Pod has no means to check the DRA driver version on the node when scheduling.
Is this something we want to handle in this KEP? Could some kind of CEL statement solve this problem? Like a semver check of the DRA version?
> The versioning story between the DRA plugin and Pods will be interesting here. A new DRA driver needs to fully roll out before new attributes can be consumed. However, a Pod has no means to check the DRA driver version on the node when scheduling.

I agree with this; however, this is a separate problem that will surface in other areas, like the usage of attributes in CEL expressions etc. IMHO this should be a separate effort.
Usage in CEL should not affect whether a pod can start. Or are you saying that a CEL expression can be used to determine the DRA version for proper allocation of the Pod on nodes with the "fresh" DRA version? If so, it is worth mentioning in the KEP.
> Usage in CEL should not affect whether a pod can start.

CEL usage in the ResourceClaim does affect pod startup, in the sense that if a user requests an attribute through a CEL expression that is not present due to a driver upgrade, the pod will be stuck in the scheduling state forever. So what I am saying is that versioning of attributes is a completely separate, unsolved problem; we have discussed this in the #wg-device-management meeting, but unfortunately it isn't tracked anywhere.
Yes, we are saying the same thing. I just prefer things not being scheduled over occupying space and crashlooping. So the solution here may be that the Pod is not scheduled before the scheduler sees these attributes on a device. Do you see a scenario where it's best to schedule the Pod and let it wait for the attribute to appear on a device?
### Kubelet Implementation
Do we want the presence of attributes to be a runtime failure as described? Or make it a pod admission failure and proactively check all of the Pod's containers ahead of time when the pod sandbox is being created?
The bad thing about a runtime failure: for pods with restart policy Always, a runtime failure means the Pod will get stuck in crash loop backoff.
> Do we want the presence of attributes to be a runtime failure as described? Or make it a pod admission failure and proactively check all of the Pod's containers ahead of time when the pod sandbox is being created?

Do these proactive checks get retried, or do they drive the pod into a terminal state? I am worried about the slow-informers case, where the attributes arrive later than the pre-sandbox check.
It should very much be the odd and exceptional case that the RS is late or gone. Remember, it had to already exist to be selected. I am not worried about races where somehow the driver runs, publishes the resource slice, the pod gets scheduled, the kubelet picks it up, but it hasn't yet seen the RS. That seems highly unlikely. The RS informer would have to be way, way behind the Pod informer on the same kubelet and apiserver connection, which AFAIK seems unrealistic.
@johnbelamaric for the attribute resolution to happen, we need both the RS informer and the ResourceClaim (RC) informer to be caught up. While I agree with you that the RS being slow seems unrealistic (and probably not worth solving for, assuming a lot of other things will fail at that point), the RC informer, which provides the latest RC that was created for this pod, could be behind as well, leading to the issues I mentioned above.
So imagine you have a large training job with restartPolicy=Never. Could this slowdown of informers, which can happen due to the scale of a cluster, affect the job such that some pods simply do not start?
Since the restartPolicy is Never, the container will not try to restart after the first failure, so the whole job would be jeopardized.
- Failure on missing data: If the `ResourceSlice` is not found, or the attribute is absent on any allocated device at container start, kubelet records a warning event and returns an error to the sync loop. The pod start fails per standard semantics (e.g., `restartPolicy` governs restarts; Jobs will fail the pod).
- Multi-device requests: Kubelet resolves the attribute across all allocated devices for the request, preserving allocation order, and joins values with a comma (",") into a single string. If any allocated device does not report the attribute, resolution fails (pod start error).

Security & RBAC:
Are all attributes that DRA exposes OK for a Pod to consume? Is there any sensitive information that cluster administrators may need to keep away from users in multi-tenant environments?
See this: #5606 (comment)
Can you please add this to the non-goals then, to avoid confusion.
- Name: `DRADownwardDeviceAttributes`
- Stage: Alpha (v1.35)
- Components: kube-apiserver, kubelet, kube-scheduler
How is kube-scheduler affected?
kube-scheduler is not affected here; I have the feature gate just for consistency with other DRA features. I can remove it, however. Extra flags are of no use.
OK, this is aligned with my understanding then. Please remove it from here. I was very confused about what changes the scheduler would need here.
participating-sigs: []
status: implementable
creation-date: 2025-10-02
reviewers:
Need a SIG Node reviewer here. You can use my name.
updated.
I added a review on KEP mechanics. Clarifying those will be great before merging. When clarifying behaviors, please add a mention of the corresponding tests in the tests section. On the semantics side I do not have a strong understanding. I added a couple of comments on this, but @klueska's feedback will be very useful.
I tried to summarize the decisions needed in this KEP: This KEP introduces a new failure mode (missing attribute) at the "late" stage of container start. This bubbles the complexity of implementing fungibility scenarios and ensuring a version match between the DRA plugin and the Pod up to the control plane. At the same time, this decision opens opportunities for future scenarios like late "discovering" of attributes. However, DRA today is in a state where all attributes are static and known to the DRA plugin. This will likely force developers to write additional CEL conditions for each attribute they use in the downward API to make sure their Pods will never be scheduled on a node where they will be crashing continuously.

Moreover, the KEP is not attempting to eliminate or discourage the automatic env var injection by plugins that works today, which makes the state of things more confusing. If we believe that long term we will need the flexibility of attributes being sourced from different controllers, and the ability to schedule a Pod so it will wait for attribute availability, this KEP will enable those. If we believe that attributes will more or less stay static, then moving failure to earlier stages, all the way to scheduling, would make the most sense.

Lastly, the scenario driver today is the fact that the DRA plugin is not injecting all attributes into the Pod. If we believe DRA plugins will continue increasing the number of attributes and most of them will only be meaningful to a subset of workloads, this KEP makes sense. If we see that there are a handful of attributes any workload ever needs, and the DRA plugin is OK to inject them all, this KEP is not bringing much value.
2. Watches `ResourceClaim` objects in the Pod's namespace to retrieve allocation information
3. Watches `ResourceSlice` objects for the node and driver to resolve device attributes
4. Maintains a per-Pod cache of `(claimName, requestName) -> {attribute: value}` mappings
This has a serious scalability impact
I don't think we want the kubelet to be in the business of watching all ResourceSlices. We would need a way of allowing the kubelet to quickly look up the device in a specific resource slice on demand (possibly caching it upon first lookup). Either that, or make sure that it only opens a watch for ResourceSlices matching the current node.
The kubelet runs a local DRA attributes controller that:

1. Watches Pods: Identifies Pods on the node with `pod.spec.resourceClaims` and tracks their `pod.status.resourceClaimStatuses` to discover generated ResourceClaim names
2. Watches ResourceClaims: For each relevant claim, reads `status.allocation.devices.results[*]` and maps entries by request name
3. Watches ResourceSlices: Resolves standardized attributes from `spec.devices[*].attributes` for the matching device name
4. Maintains Cache: Keeps a per-Pod map of `(claimName, requestName) -> {attribute: value}` with a readiness flag
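A rough sketch of the per-Pod cache shape implied by step 4, using hypothetical types (the real controller would populate it from the informer event handlers described above):

```go
// Package dracache sketches the per-Pod cache from step 4
// (hypothetical types; not the actual kubelet implementation).
package dracache

import "sync"

// key identifies one (claimName, requestName) pair within a Pod.
type key struct{ claimName, requestName string }

// entry holds resolved attribute values plus the readiness flag from step 4.
type entry struct {
	ready      bool
	attributes map[string]string // attribute name -> resolved value
}

// podAttrCache keeps a per-Pod map of (claimName, requestName) -> {attribute: value}.
type podAttrCache struct {
	mu   sync.RWMutex
	pods map[string]map[key]entry // outer key: pod UID
}

// set records resolved attributes for a claim/request pair of a Pod.
func (c *podAttrCache) set(podUID, claim, request string, e entry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pods == nil {
		c.pods = map[string]map[key]entry{}
	}
	if c.pods[podUID] == nil {
		c.pods[podUID] = map[key]entry{}
	}
	c.pods[podUID][key{claim, request}] = e
}

// get returns the cached entry; ok=false or ready=false means "not ready yet".
func (c *podAttrCache) get(podUID, claim, request string) (entry, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.pods[podUID][key{claim, request}]
	return e, ok
}
```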
Can you please elaborate on the implementation?
Does it open a new watch per ResourceClaim and ResourceSlice?
The Kubelet already does a Get on the NodePrepareResources hook and passes the claim to the driver
claimName: pgpu-claim
requestName: pgpu-request
attribute: resource.kubernetes.io/pcieRoot
# If multiple devices are allocated for this request, values are joined with "," in allocation order.
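On the consumer side of this comma-joined encoding, the container has to split the value itself. A hedged sketch, assuming the resulting env var was named PCIE_ROOT (any attribute value that itself contains a comma would break this, which is part of the concern below):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// PCIE_ROOT is an assumed env var name populated from a fieldRef like the
	// one above; with multiple allocated devices its value is comma-joined in
	// allocation order. Note: this naive split breaks if an attribute value
	// ever contains a comma.
	raw := os.Getenv("PCIE_ROOT")
	if raw == "" {
		fmt.Println("attribute not available (value not ready or not populated)")
		return
	}
	for i, v := range strings.Split(raw, ",") {
		fmt.Printf("device %d: pcieRoot=%s\n", i, v)
	}
}
```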
This feels fragile