robbypambudi opened this issue on Jan 16, 2025 · 0 comments
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
jobs
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
Stopping a Ray job with the ray job stop command fails with a 500 error. The traceback shows an aiohttp ServerDisconnectedError ("Server disconnected") raised while the dashboard head forwards the stop request to the job agent.
Command Executed:
ray job submit --runtime-env-json='{"pip": ["requests==2.26.0"]}' --working-dir ./ -- python script.py
ray job stop raysubmit_MFG6KpQRRaGKBawC
Output
Job submission server address: http://**.**.73.122:8265
Attempting to stop job 'raysubmit_VYbBGNJeavjmHdyy'
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2668, in main
return cli()
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
return func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
return f(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 389, in stop
client.stop_job(job_id)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 284, in stop_job
self._raise_error(r)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
raise RuntimeError(
RuntimeError: Request failed with status code 500: Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 390, in stop_job
resp = await job_agent_client.stop_job_internal(job.submission_id)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 88, in stop_job_internal
async with self._session.post(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
self._resp = await self._coro
File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 608, in _request
await resp.start(conn)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 976, in start
message, payload = await protocol.read() # type: ignore[union-attr]
File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/streams.py", line 640, in read
await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Issue Details:
Job Submission Server Address: http://**.**.73.122:8265
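While this is untriaged, a client-side retry may help tell a lost reply apart from a stop that never happened. The sketch below is not a fix for the underlying error and makes several assumptions: `stop_job_with_retry`, `attempts`, and `delay_s` are hypothetical names, and `client` is assumed to expose `stop_job(job_id)` and `get_job_status(job_id)` the way `ray.job_submission.JobSubmissionClient` does. It relies on the HTTP 500 surfacing as `RuntimeError`, as in the traceback above.

```python
import time

# Ray's terminal job states; a job in one of these no longer needs stopping.
TERMINAL_STATES = {"STOPPED", "SUCCEEDED", "FAILED"}


def stop_job_with_retry(client, job_id, attempts=3, delay_s=2.0):
    """Ask `client` to stop `job_id`, retrying when the request fails.

    Returns True once the job reports a terminal status, False if it is
    still not terminal after `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            client.stop_job(job_id)
        except RuntimeError as exc:
            # Assumption: the 500 response is raised as RuntimeError.
            print(f"stop attempt {attempt} failed: {str(exc)[:80]}")

        # The stop request may have reached the agent even though the reply
        # was lost, so check the status whether or not the call raised.
        try:
            status = client.get_job_status(job_id)
        except RuntimeError:
            status = None

        # Accept either an enum-like object with .value or a plain string.
        if getattr(status, "value", status) in TERMINAL_STATES:
            return True

        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

With a real cluster, `client` would be something like `JobSubmissionClient("http://<head-address>:8265")` and `job_id` the submission ID printed by ray job submit. A `False` return means the job was still not terminal after every attempt, so the dashboard and agent logs are the next place to look.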
robbypambudi added the bug and triage labels on Jan 16, 2025
Versions / Dependencies
Ray Version: 2.40.0
Reproduction script
Issue Severity
High: It blocks me from completing my task.