-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[k8s] Clarified the usage of shared memory. #4341
Conversation
@@ -21,6 +21,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct | |||
``` | |||
|
|||
- See [hyperparameter tuning](hyperparameter_tuning.md) on tuning hyperparameters for better performance. | |||
- For docker and Kubernetes runs, you need to set up shared memory which is used for communication between processes. See `--shm-size` for docker and `/dev/shm` size update for Kubernetes manifests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good summary. but it seem this doc server_arguments.md
is majorly for CLI (python run) use case.
if we aim for "docker"/"k8s", maybe this file is not suitable?
https://github.com/sgl-project/sglang/blob/main/docs/start/install.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, server_arguments.md is the only place where it summarize the parallelism runs, so I believe it should be mentioned thare (maybe among other places?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, as you wrote, in Docker part of https://github.com/sgl-project/sglang/blob/main/docs/start/install.md shared memory is already mentioned. And in Kubernetes section there's just a link to manifest, so I'm suggesting:
- to modify single node manifest
- to update the docs in the place where
--tp
is actually mentioned as server argument
cc @whybeyoung could you take a look? thanks! |
865d476
to
3852603
Compare
@whybeyoung is it approved by you? if so, pass the CI . I can merge it |
I will remind @whybeyoung again. |
/approve |
/lgtm |
da6e5ec
to
546144e
Compare
Added extended shared memory for default kubernetes manigest.
I don't expect I am supposed to merge the PR, right? I keep rebasing it because it's offent out-of-sync, but than the approval status is lost... |
no need to rebase. We will merge this |
Hey, so what do you think about this? It's quite minor change, is there anything wrong with it? |
Added extended shared memory for default kubernetes manigest. Also, point various cache directories to writable /tmp so that pod does not need to be run as root.
Motivation
Added some more clarification for the usage of shared memory increase. It's necessary for single node installation too, if one wants to use some parallelism techniques. See #4259
Also, k8s pod does not need to run with root security context, but it needs some tinkering.Modifications
- additionally added env variables to enforce creating caches in writable volume, so that the pod can be run as non-rootChecklist