Skip to content

Conversation

@alimaazamat
Copy link
Contributor

@alimaazamat alimaazamat commented Oct 30, 2025

Why are these changes needed?

V2 Autoscaling allows for configuring idleTimeoutSeconds per worker group, which needs added unit + e2e testing
EDIT: e2e testing already done in #2725 issue should've been closed

Related issue number

#2561

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@marosset marosset self-requested a review October 30, 2025 21:52
@alimaazamat alimaazamat changed the title test: add unit and e2e tests for idleTimeoutSeconds config per worker… test: add unit tests for idleTimeoutSeconds config per worker… Oct 30, 2025
@alimaazamat alimaazamat changed the title test: add unit tests for idleTimeoutSeconds config per worker… test: unit tests for idleTimeoutSeconds config per worker groups Oct 30, 2025
@marosset
Copy link

LGTM

@andrewsykim
Copy link
Member

@ryanaoleary can you review since you added the feature?

@alimaazamat alimaazamat force-pushed the idleTimeoutSecondsTests branch from 8512527 to a4bc057 Compare November 13, 2025 19:44
@alimaazamat alimaazamat force-pushed the idleTimeoutSecondsTests branch from a4bc057 to ae88086 Compare November 13, 2025 19:45
Copy link
Collaborator

@ryanaoleary ryanaoleary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I tested some of these cases manually and it looks correct.

@andrewsykim
Copy link
Member

andrewsykim commented Nov 13, 2025

@alimaazamat the e2e buildkite failure seems related? Can you take a look https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds/11598#019a7efb-eeee-48fd-8614-6bd5d5311f4e

Triggered a retry incase it's a flake

@alimaazamat
Copy link
Contributor Author

@alimaazamat the e2e buildkite failure seems related? Can you take a look https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds/11598#019a7efb-eeee-48fd-8614-6bd5d5311f4e

@andrewsykim Yeah I reran it already previously and it did fail again. Because this are just unit tests I thought it wouldn't impact e2e?

@alimaazamat
Copy link
Contributor Author

Actually I think I found the issue:
Seems like part2 autoscaler tests are using env var only and not the spec version.

@alimaazamat
Copy link
Contributor Author

alimaazamat commented Nov 14, 2025

@andrewsykim
In buildkite the only errors I'm seeing are reconcillation errors where worker pod status is failed during teardown. https://github.com/alimaazamat/kuberay/blob/idleTimeoutSecondsTests/ray-operator/controllers/ray/raycluster_controller.go#L173 is the only time we call our ValidateRayClusterSpec changes and if it is invalid, the controller would return immediately and wouldn't even hit the errors? Let me know if my thinking is on the wrong track.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants