
Mistral Nemo -> CUDA out of memory on 2 x 80GB H100 #1437

Closed · Answered by merrymercy
draqos asked this question in Q&A

It turns out that there is something wrong with the model config of mistralai/Mistral-Nemo-Instruct-2407.

SGLang reads the model's context length from max_position_embeddings in the model config. However, this model sets it to 1024k (roughly 1M tokens), while the model was only trained with a 128k context.
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/config.json#L13
This makes SGLang overestimate the memory needed for its KV-cache memory pool, so we need to correct it with either --mem-fraction-static or --context-length.
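The scale of the overestimate can be sketched with a back-of-the-envelope KV-cache calculation. The architecture numbers below (40 layers, 8 KV heads via GQA, head dimension 128, fp16 weights) are assumptions about the Nemo config for illustration, not values confirmed in this thread:

```python
# Rough KV-cache sizing sketch. Layer count, KV-head count, and head
# dimension are ASSUMED values for Mistral Nemo, used only to show why
# a 1024k context blows up the memory-pool estimate.
NUM_LAYERS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
DTYPE_BYTES = 2  # fp16 / bf16

def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache K and V tensors for num_tokens tokens."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES
    return per_token * num_tokens

GiB = 1024 ** 3
print(kv_cache_bytes(1024 * 1024) / GiB)  # sized from max_position_embeddings -> 160.0 GiB
print(kv_cache_bytes(128 * 1024) / GiB)   # sized from the trained 128k context -> 20.0 GiB
```

Under these assumptions, a single full-length sequence at the advertised 1024k context needs on the order of 160 GiB of KV cache, which explains why the pool estimate exceeds even 2 × 80 GB of H100 memory.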
For example, if your use case only needs a 32k context length, you can run:

python3 -m sglang.launch_server --model-path mistralai/Mistral-Nemo-Instruct-2407 --tp-size 2 --en…
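The command above is truncated by the page. A hedged reconstruction using the --context-length flag mentioned earlier (32768 matches the 32k example; the exact flags in the original reply may differ):

```shell
# Sketch of a corrected launch command; --context-length caps the
# context used for memory-pool sizing instead of the config's 1024k.
python3 -m sglang.launch_server \
  --model-path mistralai/Mistral-Nemo-Instruct-2407 \
  --tp-size 2 \
  --context-length 32768
```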
