[k8s] move filtering to stream time when calculating allocated GPU quantity by each node #7478

SeungjinYang · 2025-10-04T07:02:32Z

The two users of get_all_pods_in_kubernetes_cluster do the same postprocessing of the pods:

filter out pods that are not in PENDING or RUNNING
filter out pods according to should_exclude_pod_from_gpu_allocation
then postprocess the remaining pods to calculate a collections.defaultdict(int) of allocated GPU quantity per node

By moving the filtering and postprocessing logic into the stream processing, we no longer need to hold the full pod list in memory before filtering it. This reduces the memory needed to execute this call, especially in larger k8s contexts with a lot of pods.
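A minimal sketch of the idea, not the code in this PR: the Pod dataclass, stream_pods, allocated_gpu_qty_by_node, and the name-prefix exclusion rule are all hypothetical stand-ins, and only the two filters and the collections.defaultdict(int) result come from the description above. It shows filtering and summing one pod at a time, so only the per-node totals are retained.

```python
import collections
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Kubernetes pod object."""
    name: str
    node_name: str
    phase: str        # Kubernetes pod phase, e.g. 'Pending' or 'Running'
    gpu_request: int  # number of GPUs requested by the pod


def stream_pods(pods: Iterable[Pod]) -> Iterator[Pod]:
    """Stand-in for a streamed pod listing: yields one pod at a time."""
    for pod in pods:
        yield pod


def allocated_gpu_qty_by_node(
        pod_stream: Iterable[Pod],
        should_exclude: Callable[[Pod], bool]) -> Dict[str, int]:
    """Filter and aggregate while consuming the stream.

    No intermediate list of pods is built; each pod is either skipped
    or folded into the per-node total and then dropped.
    """
    allocated: Dict[str, int] = collections.defaultdict(int)
    for pod in pod_stream:
        # Filter 1: keep only pods that are pending or running.
        if pod.phase not in ('Pending', 'Running'):
            continue
        # Filter 2: caller-supplied exclusion predicate.
        if should_exclude(pod):
            continue
        allocated[pod.node_name] += pod.gpu_request
    return allocated


def exclude_low_priority(pod: Pod) -> bool:
    """Assumed exclusion rule, for illustration only."""
    return pod.name.startswith('low-priority-')


sample_pods = [
    Pod('a', 'node-1', 'Running', 2),
    Pod('b', 'node-1', 'Pending', 1),
    Pod('c', 'node-2', 'Succeeded', 4),               # dropped by phase
    Pod('d', 'node-2', 'Running', 8),
    Pod('low-priority-e', 'node-2', 'Running', 1),    # dropped by predicate
]

result = allocated_gpu_qty_by_node(stream_pods(sample_pods),
                                   exclude_low_priority)
print(dict(result))  # {'node-1': 3, 'node-2': 8}
```

Passing the exclusion rule as a callable keeps the aggregation loop independent of how exclusion is decided, which is one way both callers could share the same streaming path.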

Tested (run the relevant ones):

Code formatting: install pre-commit (auto-check on commit) or bash format.sh
Any manual or new tests for this PR (please specify below)
All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

kevinmingtarja · 2025-10-14T06:48:48Z

/quicktest-core --kubernetes
/smoke-test --kubernetes

kevinmingtarja

LGTM overall, thanks @SeungjinYang! Just one question regarding the allocated_qty calculation.

sky/provision/kubernetes/utils.py

SeungjinYang requested a review from kevinmingtarja October 4, 2025 07:02

SeungjinYang mentioned this pull request Oct 6, 2025

[k8s] Optimize _list_accelerators #7475

Merged

5 tasks

Base automatically changed from optimize-k8s-list-accelerators to master October 13, 2025 17:31

SeungjinYang force-pushed the pod-filter-optimization branch 2 times, most recently from bd50169 to 24a4912 October 13, 2025 17:36

SeungjinYang marked this pull request as ready for review October 13, 2025 17:37

SeungjinYang force-pushed the pod-filter-optimization branch from b05eb75 to a66ee81 October 14, 2025 00:18

kevinmingtarja reviewed Oct 14, 2025

sky/provision/kubernetes/utils.py (resolved)

SeungjinYang added 3 commits October 14, 2025 10:55

init

963516a

format, testfix

31d6187

change summation logic

4c54079

SeungjinYang force-pushed the pod-filter-optimization branch from a66ee81 to 4c54079 October 14, 2025 17:57

kevinmingtarja approved these changes Oct 14, 2025

format

49e814b

SeungjinYang enabled auto-merge (squash) October 14, 2025 18:11

SeungjinYang merged commit f0f90c4 into master Oct 14, 2025
20 checks passed

SeungjinYang deleted the pod-filter-optimization branch October 14, 2025 18:19
