
Support for Hopper H100 #7

Open
rosario-purple opened this issue Jan 22, 2024 · 3 comments

Comments

rosario-purple commented Jan 22, 2024

Hi! You've probably already considered this, but would you be able to add support for Hopper H100 GPUs? A100s don't have nearly as much memory bandwidth. I'm happy to run tests/benchmarks on one if that would help. Thanks!
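The bandwidth point can be made concrete with a quick roofline estimate. This is only a sketch: the bandwidth figures are approximate published peak specs (not measurements), and real kernels won't reach the bound.

```python
# Roofline sketch: at batch size 1, weight-only quantized GEMM is memory-bandwidth
# bound, so per-token decode latency is at least
#   (bytes of weights streamed) / (memory bandwidth).
# Bandwidth numbers below are approximate published peak specs, not measurements.

GB = 1e9

def min_decode_latency_ms(n_params: float, bits_per_weight: int, bandwidth_gbs: float) -> float:
    """Lower bound on per-token decode latency (ms) from streaming all weights once."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / (bandwidth_gbs * GB) * 1e3

# ~7B-parameter model with INT4 weights
for name, bw in [("A100 80GB SXM (~2039 GB/s)", 2039),
                 ("H100 SXM (~3350 GB/s)", 3350)]:
    t = min_decode_latency_ms(7e9, 4, bw)
    print(f"{name}: >= {t:.2f} ms/token (<= {1e3 / t:.0f} tok/s)")
```

By this estimate the H100's higher HBM bandwidth raises the batch-1 throughput ceiling by roughly the ratio of the two bandwidths (~1.6x), which is why H100 support is attractive for a bandwidth-bound kernel like Marlin.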


Ageliss commented Jan 23, 2024

I ran a benchmark on an H800, which should be only slightly slower than an H100. Hope it helps.

(image: H800 benchmark results)


Ageliss commented Jan 23, 2024

Also, another question: how does Marlin's performance compare with TRT-LLM's weight-only batched GEMV kernel?
`__device__ void weight_only_batched_gemv()`
https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/weightOnlyBatchedGemv/kernel.h#L296

Recently, a NeurIPS paper, QuIP, also shared W2–W4 GEMM kernels. It seems Marlin and QuIP both use similar mma instructions, but their approach is very different from TRT-LLM's.
QuIP decompression kernel:
https://github.com/Cornell-RelaxML/quip-sharp/blob/cd1949525722fa9b201af7a8c96841cbbd046b4c/quiptools/quiptools_e8p_gemv.cu

Any comments on the difference and performance?


Qubitium commented Mar 29, 2024

@Ageliss Can you confirm that the benchmark results you posted for LLaMA 7B and 65B were run on an H800 with the Marlin kernel? Could you also run the Marlin kernel benchmarks in bench.py and test.py on the H800? Thank you! I don't have an H100 but would like to test/validate H100/H800 support for the AutoGPTQ library.
