[Feature] Integrate CUTLASS FP8 GEMM into sgl-kernel #2472

zhyncs · 2024-12-12T20:08:31Z

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
2. Please use English; otherwise, it will be closed.

No response

zhyncs · 2024-12-12T20:15:53Z

Note: If the official merge does not occur by the end of the month, we will compile against and use the f8_blockwise_scaling_pr_branch branch of sgl-project/cutlass.
ref https://github.com/sgl-project/cutlass/tree/f8_blockwise_scaling_pr_branch

HaiShaw · 2024-12-12T23:38:49Z

@zhyncs This is supposed to be NVIDIA-specific.

zhyncs · 2024-12-13T07:30:55Z

@HaiShaw Yeah, we will use it for some models on NVIDIA H100 and H200.

zhyncs added the high priority, performance, and quant (LLM Quantization) labels Dec 12, 2024

zhyncs assigned HandH1998 and zhyncs Dec 12, 2024

HaiShaw self-assigned this Dec 12, 2024

zhyncs changed the title ~~[Feature] Integrate FP8 GEMM into sgl-kernel~~ [Feature] Integrate CUTLASS FP8 GEMM into sgl-kernel Dec 13, 2024
