@elliot-barn elliot-barn commented Oct 9, 2025

Bumps vLLM from 0.10.1 to 0.10.2 for the deployment serve LLM example.

Deployment serve vllm can run on the existing llm-cu128 lock file for the vllm image:
https://buildkite.com/ray-project/release/builds/62672#_

@elliot-barn elliot-barn requested a review from aslonnie October 9, 2025 04:16
@elliot-barn elliot-barn requested review from a team as code owners October 9, 2025 04:16
cursor[bot]

This comment was marked as outdated.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request upgrades vllm from version 0.10.1 to 0.10.2. The changes are mostly in documentation files to reflect this version bump. I've found a few inconsistencies and minor issues in the documentation that should be addressed.

Value error, The checkpoint you are trying to load has model type `gpt_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
```
- Older vLLM and Transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to Transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.

high

The documentation here is inconsistent. The text suggests upgrading to vLLM >= 0.10.1, but this pull request upgrades to vLLM 0.10.2, and the code block below correctly suggests installing vllm>=0.10.2. The source notebook.ipynb has been correctly updated to vLLM >= 0.10.2.

To avoid confusion, please ensure this file is regenerated from the notebook to reflect the correct version.

Suggested change
- Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.2** and let your package resolver such as `pip` handle the other dependencies.
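
One way to confirm that an environment meets the corrected version floor is to compare versions as integer tuples rather than strings, since a string comparison would wrongly rank `0.9.9` above `0.10.2`. This is only a sketch: `MIN_VLLM`, `parse_release`, `meets_minimum`, and `installed_vllm_version` are hypothetical names, and the parser assumes plain dotted release versions such as `0.10.2` with no pre-release or local suffix.

```python
from __future__ import annotations

from importlib import metadata

MIN_VLLM = "0.10.2"  # assumption: the floor this PR documents


def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '0.10.2' into (0, 10, 2)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = MIN_VLLM) -> bool:
    """Return True when `installed` is at least `minimum`."""
    return parse_release(installed) >= parse_release(minimum)


def installed_vllm_version() -> str | None:
    """Return the installed vllm version, or None if vllm isn't installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
```

Calling `meets_minimum(installed_vllm_version() or "0")` in a setup check would flag both a missing install and a version below the floor.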

sudo apt-get install -y --no-install-recommends build-essential

- RUN pip install vllm==0.10.1
+ RUN pip install vllm==0.10.2

medium

It's a good practice to end files with a newline character. Some tools might have issues with files that don't end with a newline. Please add a newline at the end of this file.
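
A check like this can be automated. The following is a minimal sketch with hypothetical helper names (`ends_with_newline`, `with_trailing_newline`); it works on raw bytes so it doesn't depend on the file's encoding.

```python
def ends_with_newline(data: bytes) -> bool:
    """Return True for empty content or content whose last byte is a newline."""
    return not data or data.endswith(b"\n")


def with_trailing_newline(data: bytes) -> bytes:
    """Append a single newline only when the content lacks one."""
    return data if ends_with_newline(data) else data + b"\n"
```

To apply it to a file, read the content with `pathlib.Path(path).read_bytes()` and write the result of `with_trailing_newline` back with `write_bytes`.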

</div>

- *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.
+ *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20 B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120 B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.

medium

This file appears to have been modified directly, but the comment at the top of the file states that notebook.ipynb should be modified instead, and this file should be regenerated. This change (20B to 20 B) is not present in notebook.ipynb, leading to an inconsistency.

Please apply the desired changes to notebook.ipynb and regenerate this file to maintain consistency.
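
Drift of this kind between a notebook and its generated README could be caught with a small script. The sketch below is not the project's regeneration tooling; it assumes the generator copies markdown-cell lines into the README verbatim, and the function names (`load_notebook`, `markdown_lines`, `missing_from_readme`) are hypothetical. It relies only on the standard notebook JSON layout, where each cell has a `cell_type` and a `source` that is either a string or a list of strings.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_notebook(path: str) -> dict:
    """Read a .ipynb file, which is plain JSON, into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def markdown_lines(notebook: dict) -> list[str]:
    """Collect the non-empty, stripped lines of every markdown cell."""
    lines: list[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # `source` may be one string or a list of line strings.
        text = source if isinstance(source, str) else "".join(source)
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return lines


def missing_from_readme(notebook: dict, readme_text: str) -> list[str]:
    """Return notebook markdown lines that don't appear verbatim in the README."""
    readme_lines = {line.strip() for line in readme_text.splitlines()}
    return [line for line in markdown_lines(notebook) if line not in readme_lines]
```

A non-empty result from `missing_from_readme(load_notebook("notebook.ipynb"), readme_text)` would point at lines such as the `20B` versus `20 B` mismatch described above.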


The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:
- Corporate firewall or proxy blocks `openaipublic.blob.core.windows.net`. You may need to whitelist this domain.
The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:

medium

There are a couple of issues here:

  1. The phrase "Common cause includes:" is grammatically incorrect. It should be "Common causes include:".
  2. This change is inconsistent with the corresponding change in notebook.ipynb, which was changed to "Common causes includes:".

Please update the source notebook.ipynb with the correct grammar and regenerate this file.

Suggested change
- The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:
+ The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:

"```\n",
"\n",
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n",
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n",

medium

The change from "Common causes include:" to "Common causes includes:" is grammatically incorrect. Since "causes" is plural, the verb should be "include".

Suggested change
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n",
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n",

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), docs (An issue or change related to documentation), and release-test (release test) labels Oct 9, 2025
cursor[bot]

This comment was marked as outdated.

alanwguo and others added 14 commits October 9, 2025 23:00
…d node_head (#56726)

## Why are these changes needed?

Changes to reporter_agent.py frequently land without the corresponding
updates to node_head and the dashboard UI.

This pydantic model will help maintain compatibility across these
different files.

Note: In the future, we should also update node_head to utilize these
pydantic models so we can guarantee compatibility without forcing
backwards-compatible changes to the schema.

Also fixes test_reporter so that tests no longer share state, and removes
some invalid test cases (e.g., GPUs without a name or index).

Tested with pydantic v1 and v2.

## Related issue number
#56009

## Checks

Manually tested; the Ray dashboard continues to work with GPUs.
<img width="1338" height="787" alt="Screenshot 2025-09-25 at 3 53 23 PM"
src="https://github.com/user-attachments/assets/c68de5b2-fc9f-42ce-a078-a99a2fea2eec"
/>


- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Alan Guo <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
…es to launch_and_validate_cluster.py (#55719)

Signed-off-by: Mark Rossetti <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Adding 3 dependencies needed to authenticate and push/pull Azure blobs:
`azure-storage-file-datalake`, `azure-identity`, and `msal`

---------

Signed-off-by: kevin <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
There were two `list_java_files` in the file. One was used, and a
different one was tested.

---------

Signed-off-by: Kevin H. Luu <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Running into the ECR tag limit again.

Signed-off-by: Lonnie Liu <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
Signed-off-by: elliot-barn <[email protected]>
@elliot-barn elliot-barn force-pushed the elliot-barn/deployment-serve-llm-upgrade branch from b6e2f37 to af9e2eb Compare October 9, 2025 23:01
@elliot-barn elliot-barn requested review from a team as code owners October 9, 2025 23:01
"train/examples/**/README.md",
"serve/tutorials/deployment-serve-llm/README.*",
"serve/tutorials/deployment-serve-llm/*/notebook.ipynb",
"serve/tutorials/deployment-serve-llm/**/README.*",

Sphinx uses the `serve/tutorials/deployment-serve-llm/**/README.*` files to build the toctree, so we shouldn't hide them.

Signed-off-by: elliot-barn <[email protected]>
6 participants