[Core] ray job stop Fails with 500 Error in CLI #49889

Open
robbypambudi opened this issue Jan 16, 2025 · 0 comments
Labels
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
jobs
triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@robbypambudi

What happened + What you expected to happen

Description:

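Attempting to stop a Ray job with the ray job stop command fails with a 500 error. The traceback shows an aiohttp ServerDisconnectedError ("Server disconnected") raised inside the dashboard while job_head.py forwards the stop request to the job agent via stop_job_internal; the CLI then surfaces the 500 response as a RuntimeError. I expected the job to be stopped and the command to exit cleanly.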

Command Executed:

ray job submit --runtime-env-json='{"pip": ["requests==2.26.0"]}' --working-dir ./ -- python script.py
ray job stop raysubmit_MFG6KpQRRaGKBawC

Output

Job submission server address: http://**.**.73.122:8265
Attempting to stop job 'raysubmit_VYbBGNJeavjmHdyy'
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2668, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 389, in stop
    client.stop_job(job_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 284, in stop_job
    self._raise_error(r)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
    raise RuntimeError(
RuntimeError: Request failed with status code 500: Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 390, in stop_job
    resp = await job_agent_client.stop_job_internal(job.submission_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 88, in stop_job_internal
    async with self._session.post(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 608, in _request
    await resp.start(conn)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 976, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/streams.py", line 640, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
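
As a possible stopgap while this is triaged (not a fix, and I have not confirmed it avoids the disconnect), the stop request could be retried through the Python SDK and the job status polled afterwards. The sketch below assumes `JobSubmissionClient` from `ray.job_submission` with its `stop_job` and `get_job_status` methods, and relies on the SDK raising `RuntimeError` on a 500 as in the traceback above. The helper name `stop_with_retry`, its parameters, and the `example_usage` wrapper are hypothetical and only for illustration.

```python
import time

# Job states after which no further transition is expected.
TERMINAL_STATES = {"STOPPED", "SUCCEEDED", "FAILED"}


def stop_with_retry(client, job_id, attempts=3, delay_s=2.0,
                    poll_s=1.0, poll_limit=30):
    """Retry `client.stop_job` on RuntimeError, then poll the job status.

    `client` is any object exposing `stop_job(job_id)` and
    `get_job_status(job_id)`, e.g. a JobSubmissionClient.
    Returns the terminal status name, or None if the job did not reach
    a terminal state within `poll_limit` polls.
    """
    for attempt in range(1, attempts + 1):
        try:
            client.stop_job(job_id)
            break
        except RuntimeError:
            # The SDK reports the HTTP 500 as RuntimeError (see traceback).
            if attempt == attempts:
                raise
            time.sleep(delay_s)

    for _ in range(poll_limit):
        status = client.get_job_status(job_id)
        # Accept either an enum member or a plain string.
        name = getattr(status, "value", status)
        if name in TERMINAL_STATES:
            return name
        time.sleep(poll_s)
    return None


def example_usage(address, job_id):
    """Hypothetical wrapper; `address` is the dashboard URL on port 8265."""
    from ray.job_submission import JobSubmissionClient  # needs ray installed

    client = JobSubmissionClient(address)
    return stop_with_retry(client, job_id)
```

Calling `example_usage` with the job submission server address and the submission ID printed by ray job submit would exercise the same endpoint as the CLI, so if the agent connection keeps dropping the final attempt will still raise.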

Issue Details:

Job Submission Server Address: http://**.**.73.122:8265

Versions / Dependencies

Ray Version: 2.40.0

Reproduction script


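import ray

# Install `emoji` into the job's runtime environment via pip.
runtime_env = {"pip": ["emoji"]}

ray.init(runtime_env=runtime_env)

@ray.remote
def f():
    # Imported inside the task so it resolves from the runtime env on the worker.
    import emoji
    return emoji.emojize('Python is :thumbs_up:')

print(ray.get(f.remote()))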

Issue Severity

High: It blocks me from completing my task.

@robbypambudi robbypambudi added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jan 16, 2025
@jcotant1 jcotant1 added the core Issues that should be addressed in Ray Core label Jan 21, 2025
@jjyao jjyao added jobs and removed core Issues that should be addressed in Ray Core labels Jan 22, 2025
@cszhu cszhu added the core Issues that should be addressed in Ray Core label Jan 23, 2025
@jjyao jjyao removed the core Issues that should be addressed in Ray Core label Jan 27, 2025
@cszhu cszhu added the core Issues that should be addressed in Ray Core label Jan 28, 2025