[QST] Why does awq write its own int3/int4 GEMM kernels instead of using CUTLASS #235

SimpleTheoryOfTypes · 2024-11-11T23:34:15Z

Just curious about AWQ's decision to implement its own custom int3/int4 GEMM kernels rather than using NVIDIA's CUTLASS library (for example, the int4 CUTLASS GEMMs from FasterTransformer). Was this choice driven by performance limitations in CUTLASS, or were there compatibility issues? Thanks in advance for any insights!
