h200 tuning fused_moe_triton config for Mixtral 8x7B/8x22B and Qwen2 57BA14B #2689

Merged 2 commits on Dec 31, 2024

@BBuf BBuf commented Dec 31, 2024

related issue: #2471

Add H200-tuned fused_moe_triton kernel configs for Mixtral 8x7B/8x22B and Qwen2 57B-A14B.

  • For Mixtral 8x7B, H200 can serve with tp1/tp2/tp4/tp8 in BF16 and FP8.
  • For Mixtral 8x22B, H200 can serve with tp4/tp8 in BF16, and tp2/tp4/tp8 in FP8.
  • For Qwen2 57B-A14B, H200 can serve with tp1/tp2/tp4/tp8 in BF16 and FP8.

In total, there are 8+2+3+8=21 configs.
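The count can be checked by enumerating the supported combinations listed above. This is only a bookkeeping sketch: the `SUPPORTED` table and the `enumerate_configs` helper are hypothetical names for illustration, not part of sglang, and the sketch assumes each (model, dtype, TP size) combination maps to exactly one tuned config.

```python
# Hypothetical table transcribed from the PR description:
# model -> dtype -> tensor-parallel sizes that H200 can serve.
SUPPORTED = {
    "Mixtral 8x7B": {"bf16": [1, 2, 4, 8], "fp8": [1, 2, 4, 8]},
    "Mixtral 8x22B": {"bf16": [4, 8], "fp8": [2, 4, 8]},
    "Qwen2 57B-A14B": {"bf16": [1, 2, 4, 8], "fp8": [1, 2, 4, 8]},
}


def enumerate_configs(supported):
    """Return one (model, dtype, tp) tuple per supported combination."""
    return [
        (model, dtype, tp)
        for model, by_dtype in supported.items()
        for dtype, tp_sizes in by_dtype.items()
        for tp in tp_sizes
    ]


configs = enumerate_configs(SUPPORTED)

# Per-model counts, assuming one config per combination.
per_model = {
    model: sum(len(tp_sizes) for tp_sizes in by_dtype.values())
    for model, by_dtype in SUPPORTED.items()
}

print(per_model)
print(len(configs))
```

Under that assumption the per-model counts are 8, 5 (2 BF16 + 3 FP8), and 8, which matches the 8+2+3+8 breakdown.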

@zhyncs zhyncs merged commit 286cad3 into sgl-project:main Dec 31, 2024
15 checks passed
XiaotongJiang pushed a commit to XiaotongJiang/sglang that referenced this pull request Jan 3, 2025