test: use custom gpu node config for processor tests #8607
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis The full list of commands accepted by this bot can be found here. The pull request process is described here
@elmiko this should be the solution to (finally) unblock #8583. Don't worry about grok'ing all of the test foo. cc @towca @BigDarkClown for that part (🙏). cc @sbueringer in case you're tracking #8583.
i don't know this test overly well but this is making sense to me.
expectedReadiness[nodeDirectXReady.Name] = true

nodeDirectXUnready := &apiv1.Node{
// Here we add a vanilla NotReady node (no GPU or other device labels or status conditions)
This is more to ensure that our filter function doesn't affect no-GPU Nodes at all (same for the "ready vanilla" case).
},
// TestFilterOutNodesWithUnreadyResourcesDRA tests that FilterOutNodesWithUnreadyResources
// does the right thing based on DRA configuration present in the node.
func TestFilterOutNodesWithUnreadyResourcesDRA(t *testing.T) {
Oof, this grew quite complex 😅. Your intuitions about the test behavior are mostly correct, but if we're refactoring, IMO rewriting this into the table-based approach would be much clearer.
For example, we could have a test case like this:
type testCase struct {
    // Maps keyed by node name
    allNodes                     map[string]*apiv1.Node
    readyNodes                   map[string]*apiv1.Node
    wantNodesWithUnreadyOverride map[string]*apiv1.Node
}
Test code would be something like:
gotAllNodes, gotReadyNodes := processor.FilterOutNodesWithUnreadyResources(ctx, toList(tc.allNodes), toList(tc.readyNodes), nil)
gotAllNodesSet, gotReadyNodesSet := toSet(gotAllNodes), toSet(gotReadyNodes) // Keyed by node name
assert that gotAllNodesSet has the same keys as tc.allNodes
assert that gotReadyNodesSet has the same keys as (tc.readyNodes - tc.wantNodesWithUnreadyOverride)
for each node in gotReadyNodesSet:
    assert that the node is identical to tc.readyNodes[node.Name]
for each node in gotAllNodesSet:
    if the node is not in tc.wantNodesWithUnreadyOverride, assert that it's identical to tc.allNodes[node.Name]
    else assert that the node has the expected not-Ready condition (and that there's only 1 condition for readiness), and is otherwise identical to tc.allNodes[node.Name]
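The checks above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `node` is a hypothetical, flattened stand-in for `*apiv1.Node`, and `fakeFilter` is a hypothetical, simplified stand-in for `processor.FilterOutNodesWithUnreadyResources` (neither is the real autoscaler code). Only the shape of the assertions is meant to carry over.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for *apiv1.Node, reduced to what the checks need.
type node struct {
	Name     string
	GPULabel bool // node carries a GPU label
	GPUCount int  // allocatable GPU resource count
	Ready    bool // collapsed Ready condition
}

type testCase struct {
	// Maps keyed by node name
	allNodes, readyNodes, wantNodesWithUnreadyOverride map[string]node
}

func toList(m map[string]node) []node {
	out := make([]node, 0, len(m))
	for _, n := range m {
		out = append(out, n)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func toSet(nodes []node) map[string]node {
	out := make(map[string]node, len(nodes))
	for _, n := range nodes {
		out[n.Name] = n
	}
	return out
}

// fakeFilter is a simplified stand-in (an assumption, not the real function):
// a ready node with the GPU label but zero GPUs is dropped from the ready list
// and overwritten to not-Ready in the all-nodes list.
func fakeFilter(all, ready []node) (newAll, newReady []node) {
	unready := map[string]bool{}
	for _, n := range ready {
		if n.GPULabel && n.GPUCount == 0 {
			unready[n.Name] = true
			continue
		}
		newReady = append(newReady, n)
	}
	for _, n := range all {
		if unready[n.Name] {
			n.Ready = false
		}
		newAll = append(newAll, n)
	}
	return newAll, newReady
}

// checkCase runs the filter and applies the assertions from the outline above.
func checkCase(tc testCase) error {
	gotAll, gotReady := fakeFilter(toList(tc.allNodes), toList(tc.readyNodes))
	gotAllSet, gotReadySet := toSet(gotAll), toSet(gotReady)
	if len(gotAllSet) != len(tc.allNodes) {
		return fmt.Errorf("all nodes: got %d, want %d", len(gotAllSet), len(tc.allNodes))
	}
	if want := len(tc.readyNodes) - len(tc.wantNodesWithUnreadyOverride); len(gotReadySet) != want {
		return fmt.Errorf("ready nodes: got %d, want %d", len(gotReadySet), want)
	}
	for name, got := range gotReadySet {
		if _, overridden := tc.wantNodesWithUnreadyOverride[name]; overridden {
			return fmt.Errorf("%s: still ready, want overridden", name)
		}
		if want, ok := tc.readyNodes[name]; !ok || got != want {
			return fmt.Errorf("%s: ready node was modified or is unexpected", name)
		}
	}
	for name, got := range gotAllSet {
		want, ok := tc.allNodes[name]
		if !ok {
			return fmt.Errorf("%s: unexpected node in all-nodes list", name)
		}
		if _, overridden := tc.wantNodesWithUnreadyOverride[name]; overridden {
			want.Ready = false // otherwise identical to the input node
		}
		if got != want {
			return fmt.Errorf("%s: got %+v, want %+v", name, got, want)
		}
	}
	return nil
}

func main() {
	zero := node{Name: "gpu-zero", GPULabel: true, Ready: true}
	one := node{Name: "gpu-one", GPULabel: true, GPUCount: 1, Ready: true}
	vanilla := node{Name: "vanilla-unready"}
	tc := testCase{
		allNodes:                     map[string]node{zero.Name: zero, one.Name: one, vanilla.Name: vanilla},
		readyNodes:                   map[string]node{zero.Name: zero, one.Name: one},
		wantNodesWithUnreadyOverride: map[string]node{zero.Name: zero},
	}
	fmt.Println("check error:", checkCase(tc))
}
```

In a real test the equality checks would more likely use a diffing helper over the full Node objects, and the override branch would inspect the Ready condition list rather than a single boolean.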
And we'd have these test cases based on the current ones:
1. GPU label present, condition ready, nvidia.com/gpu resource 0 -> overwritten to unready
2. GPU label present, condition ready, nvidia.com/gpu resource 1 -> no overwrites
3., 4. -> same as 1. and 2. but for the directX resource
5. GPU label present, condition ready, no nvidia/directX resource -> overwritten to unready
6. No GPU label, condition ready, no nvidia/directX resource -> no overwrites
7. No GPU label, condition unready, no nvidia/directX resource -> no overwrites
And then we can just add an additional case to cover the DRA part:
- GPU label present, condition ready, no nvidia/directX resource, GetNodeGpuConfig indicates DRA -> no overwrites
As for the test provider setup, IMO it'd make the most sense to set up just 1 provider object, and have the custom GetNodeGpuConfig we're registering behave differently based on the Node it gets. This will keep the setup code and the test cases simple. E.g., we could have the method return different responses based on whether the passed Node has "dra" in its name.
WDYT?
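A minimal sketch of that single-provider idea, assuming hypothetical stand-in types: `node`, `gpuConfig`, `testProvider`, the `gpuLabel` key, and the driver name below are all invented for illustration and are not the real cloudprovider API. The only point is that one `GetNodeGpuConfig` can branch on the node it receives.

```go
package main

import (
	"fmt"
	"strings"
)

// node and gpuConfig are hypothetical stand-ins, not the real Kubernetes or
// cluster-autoscaler types.
type node struct {
	Name   string
	Labels map[string]string
}

type gpuConfig struct {
	ResourceName string // extended resource name, e.g. "nvidia.com/gpu"
	DRADriver    string // non-empty when the GPU is exposed via DRA
}

// gpuLabel is a made-up label key used only in this sketch.
const gpuLabel = "example.com/gpu"

// testProvider is the single provider object shared by every test case.
type testProvider struct{}

// GetNodeGpuConfig answers differently depending on the node it is given:
// no GPU label -> nil, "dra" in the name -> DRA config, otherwise an
// extended-resource config.
func (testProvider) GetNodeGpuConfig(n *node) *gpuConfig {
	if _, ok := n.Labels[gpuLabel]; !ok {
		return nil
	}
	if strings.Contains(n.Name, "dra") {
		return &gpuConfig{DRADriver: "gpu.example.com"}
	}
	return &gpuConfig{ResourceName: "nvidia.com/gpu"}
}

func main() {
	p := testProvider{}
	labeled := map[string]string{gpuLabel: "true"}
	for _, n := range []*node{
		{Name: "vanilla"},
		{Name: "gpu-node", Labels: labeled},
		{Name: "gpu-dra-node", Labels: labeled},
	} {
		fmt.Printf("%s: %+v\n", n.Name, p.GetNodeGpuConfig(n))
	}
}
```

Keying the behavior off the node name keeps each table entry self-describing, at the cost of coupling test names to provider behavior.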
ACK, I'll re-write, thanks for the detailed notes on a quick initial attempt!
@towca see my commit 2 as an attempt to refactor into a single test case (we can decompose the actual cases after we're confident this is the right structure)
thoughts?
ebf2c74 to 5c9c8cc
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR removes a reference to the GCE cloud provider from the GPU processor test library. As an exercise to better understand the current unit test, I did a "verbose refactor", and then added an additional DRA test that goes through the basic DRA driver filtering outcomes.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: