[release][example] deployment serve vllm upgrade #57591
base: master
Conversation
Code Review
This pull request upgrades vllm from version 0.10.1 to 0.10.2. The changes are mostly in documentation files to reflect this version bump. I've found a few inconsistencies and minor issues in the documentation that should be addressed.
Value error, The checkpoint you are trying to load has model type `gpt_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
```
- Older vLLM and Transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to Transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
The documentation here is inconsistent. The text suggests upgrading to vLLM >= 0.10.1, but this pull request upgrades to vLLM 0.10.2, and the code block below correctly suggests installing `vllm>=0.10.2`. The source `notebook.ipynb` has been correctly updated to vLLM >= 0.10.2.
To avoid confusion, please ensure this file is regenerated from the notebook to reflect the correct version.
- Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.2** and let your package resolver such as `pip` handle the other dependencies.
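For anyone verifying the bump locally, here's a minimal runtime check (a sketch; the `packaging` dependency and the `0.10.2` floor come from this review thread, not from the PR itself):

```python
# Minimal sketch: fail fast if the installed vllm is older than the
# version this example documents.
from importlib.metadata import version

from packaging.version import Version

assert Version(version("vllm")) >= Version("0.10.2"), "upgrade vllm to >= 0.10.2"
```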
sudo apt-get install -y --no-install-recommends build-essential
- RUN pip install vllm==0.10.1
+ RUN pip install vllm==0.10.2
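To confirm inside the rebuilt image that the bundled transformers build actually registers `gpt_oss` (a sketch; `CONFIG_MAPPING_NAMES` is transformers' auto-config registry, and relying on it directly is this comment's suggestion, not part of the PR):

```python
# Sketch: the "does not recognize this architecture" error quoted above
# fires when "gpt_oss" is absent from transformers' auto-config registry.
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

if "gpt_oss" not in CONFIG_MAPPING_NAMES:
    raise RuntimeError("transformers is too old for gpt-oss; rebuild with vllm==0.10.2")
```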
</div>

- *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.
+ *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20 B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120 B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.
This file appears to have been modified directly, but the comment at the top of the file states that `notebook.ipynb` should be modified instead, and this file should be regenerated. This change (`20B` to `20 B`) is not present in `notebook.ipynb`, leading to an inconsistency.
Please apply the desired changes to `notebook.ipynb` and regenerate this file to maintain consistency.
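For context on the paragraph under discussion, a minimal offline-inference sketch of the smaller variant (the model ID and generation settings here are illustrative assumptions, not taken from the tutorial):

```python
# Sketch: load the 20B variant with vLLM's offline LLM API; swap in
# "openai/gpt-oss-120b" for production-scale, high-reasoning workloads.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")
outputs = llm.generate(["Summarize gpt-oss in one sentence."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```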
- The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:
+ The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:
  - Corporate firewall or proxy blocks `openaipublic.blob.core.windows.net`. You may need to whitelist this domain.
There are a couple of issues here:
- The phrase "Common cause includes:" is grammatically incorrect. It should be "Common causes include:".
- This change is inconsistent with the corresponding change in `notebook.ipynb`, which was changed to "Common causes includes:".

Please update the source `notebook.ipynb` with the correct grammar and regenerate this file.
- The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:
+ The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:
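A related workaround for the firewall case (a sketch, not part of this PR): tiktoken honors the `TIKTOKEN_CACHE_DIR` override, so the encoding files can be staged from a machine that can reach the host; whether `openai_harmony`'s loader picks up the same cache is an assumption here.

```python
# Sketch: point tiktoken at a pre-populated local cache so nothing is
# fetched from openaipublic.blob.core.windows.net at runtime.
import os

os.environ["TIKTOKEN_CACHE_DIR"] = "/opt/tiktoken_cache"  # must be set before first use

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # resolves from the local cache if present
print(len(enc.encode("hello")))
```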
"```\n", | ||
"\n", | ||
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n", | ||
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n", |
The change from "Common causes include:" to "Common causes includes:" is grammatically incorrect. Since "causes" is plural, the verb should be "include".
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n", | |
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n", |
…d node_head (#56726)

## Why are these changes needed?
Frequently there are changes to reporter_agent.py where the relevant code in node_head and the dashboard UI doesn't also get changed. This pydantic model will help maintain compatibility across these different files. Note: in the future, we should also update node_head to utilize these pydantic models so we can guarantee compatibility without forcing backwards-compatible changes to the schema. Also fixes test_reporter to not share state between tests, and removes some invalid test cases (e.g., GPUs without names or an index). Tested with pydantic v1 and v2.

## Related issue number
#56009

## Checks
Manually tested; the Ray dashboard continues to work with GPUs (screenshot attached).
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Alan Guo <[email protected]> Signed-off-by: elliot-barn <[email protected]>
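To illustrate the approach this commit describes, a hypothetical sketch of a shared GPU record schema (field names here are illustrative, not the actual model):

```python
# Hypothetical sketch: a pydantic schema shared by reporter_agent.py and
# node_head, so a schema change in one place fails validation in the other
# instead of silently breaking the dashboard UI.
from typing import Optional

from pydantic import BaseModel


class GpuRecord(BaseModel):
    index: int  # required: the commit drops test cases with index-less GPUs
    name: str   # required: likewise for unnamed GPUs
    utilization_gpu: Optional[float] = None
    memory_used_mb: Optional[float] = None
```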
…es to launch_and_validate_cluster.py (#55719) Signed-off-by: Mark Rossetti <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: elliot-barn <[email protected]>
Adding 3 dependencies needed to authenticate and push/pull Azure blobs: `azure-storage-file-datalake`, `azure-identity`, and `msal` --------- Signed-off-by: kevin <[email protected]> Signed-off-by: elliot-barn <[email protected]>
there were two `list_java_files` in the file; one was used, and a different one was tested. --------- Signed-off-by: Kevin H. Luu <[email protected]> Signed-off-by: elliot-barn <[email protected]>
running into tag limit of ecr again Signed-off-by: Lonnie Liu <[email protected]> Signed-off-by: elliot-barn <[email protected]>
Force-pushed the …ent-serve-llm-upgrade branch from b6e2f37 to af9e2eb
doc/source/conf.py (Outdated)
"train/examples/**/README.md", | ||
"serve/tutorials/deployment-serve-llm/README.*", | ||
"serve/tutorials/deployment-serve-llm/*/notebook.ipynb", | ||
"serve/tutorials/deployment-serve-llm/**/README.*", |
Sphinx uses the `serve/tutorials/deployment-serve-llm/**/README.*` pattern to build the toctree, so we shouldn't hide them.
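For context, the setting in question (a sketch of the assumed shape of `exclude_patterns` in `doc/source/conf.py`): Sphinx skips any source file matching one of these globs, which is why excluding `**/README.*` would also drop those pages from the toctree.

```python
# Sketch (assumed shape of doc/source/conf.py): excluding the notebooks is
# fine, but the generated READMEs must stay visible because Sphinx builds
# the toctree from them.
exclude_patterns = [
    "train/examples/**/README.md",
    "serve/tutorials/deployment-serve-llm/*/notebook.ipynb",
]
```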
Bumping vLLM from 0.10.1 -> 0.10.2 for the deployment-serve-llm example.
Deployment serve vLLM can run on the existing llm-cu128 lock file for the vLLM image:
https://buildkite.com/ray-project/release/builds/62672#_