
Add Qwen2 MoE support #603

Merged: 2 commits into sgl-project:main on Jul 9, 2024
Conversation


@M0gician (Contributor) commented Jul 8, 2024

Blocked by #598

Add Qwen2 MoE support to SGLang. The implementation is adapted from the current vLLM implementation.
I've tested this patch in my local environment with 8× A800 GPUs, and everything works as expected.

The configurations I've tested so far are:

  • CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m sglang.launch_server --model-path ~/Qwen2-57B-A14B-Instruct/ --port 30000 --mem-fraction-static 0.9 --tp-size 8
  • CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m sglang.launch_server --model-path ~/Qwen2-57B-A14B-Instruct/ --port 30000 --mem-fraction-static 0.9 --tp-size 4 --dp-size 2
  • CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m sglang.launch_server --model-path ~/Qwen2-57B-A14B-Instruct/ --port 30000 --mem-fraction-static 0.9 --tp-size 2 --dp-size 4
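Once a server launched with one of the commands above is up, it can be sanity-checked with a plain HTTP request to SGLang's `/generate` endpoint. This is a rough sketch, not part of the PR: the port matches the `--port 30000` flag above, while the prompt text and sampling parameters are illustrative.

```python
import json
import urllib.request

# Build a request against a locally running SGLang server
# (assumes the server was started with --port 30000 as above).
payload = {
    "text": "Qwen2-57B-A14B is a mixture-of-experts model that",
    "sampling_params": {"max_new_tokens": 32, "temperature": 0.7},
}
req = urllib.request.Request(
    "http://127.0.0.1:30000/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment when a server is actually running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])
```

The same check works for any of the tp/dp combinations listed, since the HTTP interface is independent of the parallelism layout.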

@merrymercy merrymercy merged commit 740c46a into sgl-project:main Jul 9, 2024
@merrymercy (Contributor)

@M0gician It is merged. Thanks for the contribution!

2 participants