[release][example] deployment serve vllm upgrade #57591
base: master
Conversation
Code Review
This pull request upgrades vllm from version 0.10.1 to 0.10.2. The changes are mostly in documentation files to reflect this version bump. I've found a few inconsistencies and minor issues in the documentation that should be addressed.
Value error, The checkpoint you are trying to load has model type `gpt_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
```
- Older vLLM and Transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to Transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
The documentation here is inconsistent. The text suggests upgrading to vLLM >= 0.10.1, but this pull request upgrades to vLLM 0.10.2, and the code block below correctly suggests installing `vllm>=0.10.2`. The source `notebook.ipynb` has been correctly updated to vLLM >= 0.10.2.
To avoid confusion, please ensure this file is regenerated from the notebook to reflect the correct version.
- Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.1** and let your package resolver such as `pip` handle the other dependencies.
+ Older vLLM and transformers versions don't register `gpt_oss`, raising an error when vLLM hands off to transformers. Upgrade **vLLM ≥ 0.10.2** and let your package resolver such as `pip` handle the other dependencies.
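For anyone verifying the bump locally, here's a minimal runtime check (a sketch; the `packaging` dependency and the `0.10.2` floor come from this review thread, not from the PR itself):

```python
# Minimal sketch: fail fast if the installed vllm is older than the
# version this example documents.
from importlib.metadata import version

from packaging.version import Version

assert Version(version("vllm")) >= Version("0.10.2"), "upgrade vllm to >= 0.10.2"
```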
sudo apt-get install -y --no-install-recommends build-essential
- RUN pip install vllm==0.10.1
+ RUN pip install vllm==0.10.2
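To confirm inside the rebuilt image that the bundled transformers build actually registers `gpt_oss` (a sketch; `CONFIG_MAPPING_NAMES` is transformers' auto-config registry, and relying on it directly is this comment's suggestion, not part of the PR):

```python
# Sketch: the "does not recognize this architecture" error quoted above
# fires when "gpt_oss" is absent from transformers' auto-config registry.
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

if "gpt_oss" not in CONFIG_MAPPING_NAMES:
    raise RuntimeError("transformers is too old for gpt-oss; rebuild with vllm==0.10.2")
```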
</div>

- *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.
+ *gpt-oss* is a family of open-source models designed for general-purpose language understanding and generation. The 20 B parameter variant (`gpt-oss-20b`) offers strong reasoning capabilities with lower latency. This makes it well-suited for local or specialized use cases. The larger 120 B parameter variant (`gpt-oss-120b`) is designed for production-scale, high-reasoning workloads.
This file appears to have been modified directly, but the comment at the top of the file states that `notebook.ipynb` should be modified instead, and this file should be regenerated. This change (`20B` to `20 B`) is not present in `notebook.ipynb`, leading to an inconsistency.
Please apply the desired changes to `notebook.ipynb` and regenerate this file to maintain consistency.
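For context on the paragraph under discussion, a minimal offline-inference sketch of the smaller variant (the model ID and generation settings here are illustrative assumptions, not taken from the tutorial):

```python
# Sketch: load the 20B variant with vLLM's offline LLM API; swap in
# "openai/gpt-oss-120b" for production-scale, high-reasoning workloads.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")
outputs = llm.generate(["Summarize gpt-oss in one sentence."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```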
- The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:
+ The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:
  - Corporate firewall or proxy blocks `openaipublic.blob.core.windows.net`. You may need to whitelist this domain.
There are a couple of issues here:
- The phrase "Common cause includes:" is grammatically incorrect. It should be "Common causes include:".
- This change is inconsistent with the corresponding change in `notebook.ipynb`, which was changed to "Common causes includes:".

Please update the source `notebook.ipynb` with the correct grammar and regenerate this file.
- The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common cause includes:
+ The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:
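A related workaround for the firewall case (a sketch, not part of this PR): tiktoken honors the `TIKTOKEN_CACHE_DIR` override, so the encoding files can be staged from a machine that can reach the host; whether `openai_harmony`'s loader picks up the same cache is an assumption here.

```python
# Sketch: point tiktoken at a pre-populated local cache so nothing is
# fetched from openaipublic.blob.core.windows.net at runtime.
import os

os.environ["TIKTOKEN_CACHE_DIR"] = "/opt/tiktoken_cache"  # must be set before first use

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # resolves from the local cache if present
print(len(enc.encode("hello")))
```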
"```\n", | ||
"\n", | ||
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n", | ||
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n", |
The change from "Common causes include:" to "Common causes includes:" is grammatically incorrect. Since "causes" is plural, the verb should be "include".
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes includes:\n", | |
"The `openai_harmony` library needs the *tiktoken* encoding files and tries to fetch them from OpenAI's public host. Common causes include:\n", |
…d node_head (#56726)

## Why are these changes needed?
Frequently there are changes to reporter_agent.py where the relevant code in node_head and the dashboard UI doesn't also get changed. This pydantic model will help maintain compatibility across these different files. Note: in the future, we should also update node_head to utilize these pydantic models so we can guarantee compatibility without forcing backwards-compatible changes to the schema. Also fixes test_reporter to not share state between tests, and removes some invalid test cases (e.g., GPUs without names or an index). Tested with pydantic v1 and v2.

## Related issue number
#56009

## Checks
Manually tested; the Ray dashboard continues to work with GPUs (screenshot attached).
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [ ] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(

Signed-off-by: Alan Guo <[email protected]> Signed-off-by: elliot-barn <[email protected]>
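To illustrate the approach this commit describes, a hypothetical sketch of a shared GPU record schema (field names here are illustrative, not the actual model):

```python
# Hypothetical sketch: a pydantic schema shared by reporter_agent.py and
# node_head, so a schema change in one place fails validation in the other
# instead of silently breaking the dashboard UI.
from typing import Optional

from pydantic import BaseModel


class GpuRecord(BaseModel):
    index: int  # required: the commit drops test cases with index-less GPUs
    name: str   # required: likewise for unnamed GPUs
    utilization_gpu: Optional[float] = None
    memory_used_mb: Optional[float] = None
```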
…es to launch_and_validate_cluster.py (#55719) Signed-off-by: Mark Rossetti <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: elliot-barn <[email protected]>
Adding 3 dependencies needed to authenticate and push/pull Azure blobs: `azure-storage-file-datalake`, `azure-identity`, and `msal` --------- Signed-off-by: kevin <[email protected]> Signed-off-by: elliot-barn <[email protected]>
there were two `list_java_files` in the file; one was used, and a different one was tested. --------- Signed-off-by: Kevin H. Luu <[email protected]> Signed-off-by: elliot-barn <[email protected]>
running into tag limit of ecr again Signed-off-by: Lonnie Liu <[email protected]> Signed-off-by: elliot-barn <[email protected]>
Force-pushed the …ent-serve-llm-upgrade branch from b6e2f37 to af9e2eb
doc/source/conf.py (Outdated)
"train/examples/**/README.md", | ||
"serve/tutorials/deployment-serve-llm/README.*", | ||
"serve/tutorials/deployment-serve-llm/*/notebook.ipynb", | ||
"serve/tutorials/deployment-serve-llm/**/README.*", |
Sphinx uses the `serve/tutorials/deployment-serve-llm/**/README.*` pattern to build the toctree, so we shouldn't hide them.
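For context, the setting in question (a sketch of the assumed shape of `exclude_patterns` in `doc/source/conf.py`): Sphinx skips any source file matching one of these globs, which is why excluding `**/README.*` would also drop those pages from the toctree.

```python
# Sketch (assumed shape of doc/source/conf.py): excluding the notebooks is
# fine, but the generated READMEs must stay visible because Sphinx builds
# the toctree from them.
exclude_patterns = [
    "train/examples/**/README.md",
    "serve/tutorials/deployment-serve-llm/*/notebook.ipynb",
]
```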
Bumping vLLM from 0.10.1 -> 0.10.2 for the deployment-serve-llm example.
Deployment serve vLLM can run on the existing llm-cu128 lock file for the vLLM image:
https://buildkite.com/ray-project/release/builds/62672#_