[FEA] Has CUTLASS considered supporting zero-points and block-wise scaling in Hopper Mixed Grouped GEMM recently? #2261

On the Hopper architecture, mixed_dtype_grouped_gemm currently supports only row-wise scaling. With AWQ quantization, however, the precision loss under row-wise scaling is still quite significant.

<img width="1093" alt="Image" src="https://github.com/user-attachments/assets/de5aec1f-a40d-42d2-81f5-13c0fcfa2bc9" />

Will CUTLASS support zero-points and block-wise scaling for AWQ (W4A16 / W4A8) in MoE models?
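To make the request concrete, here is a minimal pure-Python sketch of why scaling granularity matters for 4-bit weights. It is not CUTLASS code and not the AWQ reference implementation: the helper names (`quantize_group`, `dequantize_group`, `roundtrip_error`), the group size of 32, and the synthetic weight row are all assumptions chosen for illustration. It quantizes the same row once with a single scale and zero-point for the whole row and once per group, then compares the reconstruction error.

```python
import math


def quantize_group(values, bits=4):
    """Asymmetric quantization of one group: returns (codes, scale, zero_point)."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant group
    zero_point = round(-lo / scale)
    codes = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Reconstruct approximate weights: w ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


def roundtrip_error(row, group_size):
    """Mean squared error after quantizing `row` in chunks of `group_size`."""
    total = 0.0
    for start in range(0, len(row), group_size):
        chunk = row[start:start + group_size]
        codes, scale, zero_point = quantize_group(chunk)
        restored = dequantize_group(codes, scale, zero_point)
        total += sum((a - b) ** 2 for a, b in zip(chunk, restored))
    return total / len(row)


# Hypothetical weight row: 96 small-magnitude values followed by 32 large ones.
row = [0.05 * math.sin(i) for i in range(96)] + [math.sin(i) for i in range(96, 128)]

err_row = roundtrip_error(row, group_size=len(row))  # one scale for the whole row
err_group = roundtrip_error(row, group_size=32)      # one scale per 32 elements

print(f"row-wise MSE:   {err_row:.6f}")
print(f"group-wise MSE: {err_group:.6f}")
```

In this toy setup the large-magnitude tail dictates the row-wide scale, so the small values all collapse onto nearly the same code, while per-group scales and zero-points keep them distinguishable.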

Thanks~
