
Support for Hopper H100 #7

Open
rosario-purple opened this issue Jan 22, 2024 · 3 comments

Comments

rosario-purple commented Jan 22, 2024

Hi! You've probably already considered this, but would you be able to add support for Hopper H100 GPUs? A100s don't have nearly as much memory bandwidth. I'm happy to run tests/benchmarks on one if that would help. Thanks!
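The bandwidth point can be made concrete with a quick roofline estimate. This is only a sketch: the bandwidth figures are approximate published peak specs (not measurements), and real kernels won't reach the bound.

```python
# Roofline sketch: at batch size 1, weight-only quantized GEMM is memory-bandwidth
# bound, so per-token decode latency is at least
#   (bytes of weights streamed) / (memory bandwidth).
# Bandwidth numbers below are approximate published peak specs, not measurements.

GB = 1e9

def min_decode_latency_ms(n_params: float, bits_per_weight: int, bandwidth_gbs: float) -> float:
    """Lower bound on per-token decode latency (ms) from streaming all weights once."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / (bandwidth_gbs * GB) * 1e3

# ~7B-parameter model with INT4 weights
for name, bw in [("A100 80GB SXM (~2039 GB/s)", 2039),
                 ("H100 SXM (~3350 GB/s)", 3350)]:
    t = min_decode_latency_ms(7e9, 4, bw)
    print(f"{name}: >= {t:.2f} ms/token (<= {1e3 / t:.0f} tok/s)")
```

By this estimate the H100's higher HBM bandwidth raises the batch-1 throughput ceiling by roughly the ratio of the two bandwidths (~1.6x), which is why H100 support is attractive for a bandwidth-bound kernel like Marlin.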


Ageliss commented Jan 23, 2024

I ran a benchmark on an H800, which should be only slightly slower than an H100. Hope it helps.

(image: H800 benchmark results)


Ageliss commented Jan 23, 2024

Also, another question: how does Marlin's performance compare with TRT-LLM's weight-only batched GEMV kernel?
`__device__ void weight_only_batched_gemv()`
https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/weightOnlyBatchedGemv/kernel.h#L296

Recently, a NeurIPS paper, QuIP, also shared W2–W4 GEMM kernels. It seems Marlin and QuIP both use similar mma instructions, but their approach is very different from TRT-LLM's.
QuIP decompression kernel:
https://github.com/Cornell-RelaxML/quip-sharp/blob/cd1949525722fa9b201af7a8c96841cbbd046b4c/quiptools/quiptools_e8p_gemv.cu

Any comments on the difference and performance?


Qubitium commented Mar 29, 2024

@Ageliss Can you confirm that the benchmark results you posted for LLaMA 7B and 65B were run on an H800 with the Marlin kernel? Could you also run the Marlin kernel benchmarks in bench.py and test.py on the H800? Thank you! I don't have an H100 but would like to test/validate H100/H800 support for the AutoGPTQ library.
