Fix verify_max_gpu_provision test to use dynamic GPU counts by umfranci · Pull Request #4006 · microsoft/lisa

umfranci · 2025-09-16T13:44:57Z

Updates the verify_max_gpu_provision test to validate the GPU count dynamically, based on the node's actual capabilities, instead of hardcoding it to 8 GPUs. This fixes test failures on VMs with fewer than 8 GPUs and in cases where container policies don't provide the required GPU information.

- Changed the test to use node.capability.gpu_count dynamically.
- Reduced the minimum requirement from 8 to 2 GPUs to cover "multiple GPU" scenarios.
- The test now adapts to the actual hardware capabilities rather than fixed assumptions.
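A minimal sketch of the originally proposed behavior, assuming a node object that exposes `capability.gpu_count` as described above. The `Node` and `Capability` dataclasses and the `expected_gpu_count` helper are hypothetical stand-ins, not LISA's real classes; in the actual test, the returned value would presumably be passed to `_gpu_provision_check` in place of the hardcoded 8.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for LISA's node objects; only the attribute
# path node.capability.gpu_count comes from the PR description.
@dataclass
class Capability:
    gpu_count: int


@dataclass
class Node:
    capability: Capability


# Floor proposed in the PR for the "multiple GPU" scenario.
MIN_GPU_COUNT = 2


def expected_gpu_count(node: Node) -> int:
    """Return the GPU count to validate, taken from the node's capability."""
    actual = node.capability.gpu_count
    if actual < MIN_GPU_COUNT:
        # Fewer than two GPUs is not a "multiple GPU" scenario.
        raise ValueError(
            f"need at least {MIN_GPU_COUNT} GPUs, node reports {actual}"
        )
    return actual


# Usage: a 4-GPU node is validated against 4, not a hardcoded 8.
print(expected_gpu_count(Node(Capability(gpu_count=4))))  # prints 4
```

As the review below points out, this trades away the "maximum configuration" guarantee: any node with two or more GPUs would pass.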

squirrelsc · 2025-09-16T19:05:02Z

microsoft/testsuites/gpu/gpusuite.py

    )
    def verify_max_gpu_provision(self, node: Node, log: Logger) -> None:
-        _gpu_provision_check(8, node, log)
+        actual_gpu_count = node.capability.gpu_count


This test case is meant to validate a VM with the maximum GPU configuration. With this change, it would no longer verify that. If VM sizes with 8 GPUs are unavailable but the test still gets scheduled, please check for a bug in the VM size capability calculation. If a policy is missing GPU information, please set the GPU count to 0 so that this test case is skipped.
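To illustrate the reviewer's point, here is a simplified sketch of requirement-based selection, assuming each VM size reports a GPU count and a size with unknown GPU information reports 0. The size names and both helper functions are hypothetical; LISA's actual capability matching is more involved.

```python
from typing import Dict, List

# The maximum configuration this test is meant to validate.
REQUIRED_GPU_COUNT = 8


def eligible_sizes(
    size_gpu_counts: Dict[str, int], required: int = REQUIRED_GPU_COUNT
) -> List[str]:
    """Return the VM sizes whose reported GPU count meets the requirement.

    A size with unknown GPU information is expected to report 0, which can
    never satisfy a requirement of 8, so it is never selected.
    """
    return sorted(
        name for name, count in size_gpu_counts.items() if count >= required
    )


def should_skip(
    size_gpu_counts: Dict[str, int], required: int = REQUIRED_GPU_COUNT
) -> bool:
    """Skip the test case when no available size meets the requirement."""
    return not eligible_sizes(size_gpu_counts, required)


# Hypothetical sizes: one with 8 GPUs, one with 4, one with unknown GPU info.
sizes = {"size_8gpu": 8, "size_4gpu": 4, "size_unknown": 0}
print(eligible_sizes(sizes))  # prints ['size_8gpu']
print(should_skip({"size_4gpu": 4, "size_unknown": 0}))  # prints True
```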

Sure, thanks @squirrelsc. I'll look into this further, but it seems that if Azure returns no capability for a VM, the value from the requirement is assumed and the test proceeds.
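The fallback described here can be illustrated with a hypothetical helper: when the platform returns no GPU capability (`None`), substituting the requested value makes an under-provisioned VM look as if it satisfies the requirement, whereas treating missing data as 0 (as the reviewer suggests) keeps it from matching. `resolve_gpu_count` and its `missing_as_zero` flag are illustrative only and do not correspond to a real LISA API.

```python
from typing import Optional


def resolve_gpu_count(
    reported: Optional[int], requirement: int, missing_as_zero: bool = False
) -> int:
    """Pick the GPU count used for matching (illustrative only).

    reported is None when the platform returns no GPU capability.
    """
    if reported is not None:
        return reported
    # Falling back to the requirement makes any VM appear to satisfy it;
    # reporting 0 instead keeps the VM from matching an 8-GPU requirement.
    return 0 if missing_as_zero else requirement


print(resolve_gpu_count(None, 8))  # prints 8, even if the VM has fewer GPUs
print(resolve_gpu_count(None, 8, missing_as_zero=True))  # prints 0
print(resolve_gpu_count(4, 8))  # prints 4
```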

In either case, can we add a check at the start of the test to skip execution if the actual GPU count is less than 8?
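A minimal sketch of the requested guard, assuming the actual GPU count has already been obtained from the node. LISA test cases typically signal a skip by raising an exception; a local `SkippedException` stand-in is defined here so the snippet runs on its own, and `ensure_max_gpu` is a hypothetical helper name.

```python
class SkippedException(Exception):
    """Local stand-in for the framework's skip exception (assumption)."""


# The maximum GPU configuration verify_max_gpu_provision targets.
MAX_GPU_COUNT = 8


def ensure_max_gpu(actual_gpu_count: int, required: int = MAX_GPU_COUNT) -> None:
    """Skip early when the node cannot exercise the max-GPU scenario."""
    if actual_gpu_count < required:
        raise SkippedException(
            f"requires {required} GPUs, but the node has {actual_gpu_count}"
        )


try:
    ensure_max_gpu(4)
except SkippedException as e:
    print(f"skipped: {e}")  # prints "skipped: requires 8 GPUs, but the node has 4"
```

Raising a skip rather than failing keeps an under-sized VM from being reported as a GPU provisioning bug.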

I'm not sure I understand the request. If there's no capacity with 8 GPUs, the test case will be skipped. Is that the check you're referring to?

skipping max_gpu_provision tc by checking actual gpu count

5542f39

squirrelsc reviewed Sep 16, 2025

umfranci wants to merge 1 commit into main from umfranci/max-gpu-tc-fix-16092025

umfranci commented Sep 16, 2025

squirrelsc Sep 16, 2025

umfranci Sep 17, 2025

squirrelsc Sep 17, 2025
