
Turing support #4

Open
Dampfinchen opened this issue Jan 20, 2024 · 1 comment

Comments

@Dampfinchen

Dampfinchen commented Jan 20, 2024

Why is Ampere or Ada (RTX 3000 and RTX 4000 series) required to support this?

Turing (RTX 2000 series) has INT4 tensor cores.

@efrantar
Member

Hi, Marlin does not use INT4 tensor cores at all: the 4-bit weights are decompressed on the fly, and the actual computation is carried out in FP16. Turing is not supported because Marlin relies heavily on the cp.async instruction, which was introduced with compute capability 8.0. cp.async allows explicitly fetching global memory in the background while doing other work at the same time, which is crucial for reaching peak performance in an FP16xINT4 setting. While you could probably reuse a good part of Marlin's work to write a Turing kernel, some significant changes would likely be necessary.
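To make the FP16xINT4 idea concrete, here is a minimal pure-Python sketch of decompressing 4-bit weights on the fly and doing the multiplications with FP16 operands. It is not Marlin's kernel or memory layout: the packing scheme (eight nibbles per 32-bit word, lowest nibble first), the symmetric zero point of 8, and the single `scale` are simplifying assumptions, and every function name is hypothetical. FP16 rounding is emulated with the half-precision `'e'` format of Python's `struct` module.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def pack_int4(values):
    """Pack 8 unsigned 4-bit values (0..15) into one 32-bit word, lowest nibble first.

    Hypothetical layout chosen for readability, not Marlin's actual layout.
    """
    assert len(values) == 8 and all(0 <= v <= 15 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Extract the 8 nibbles from a 32-bit word (inverse of pack_int4)."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(word, scale):
    """Map each nibble q to FP16 as (q - 8) * scale (assumed symmetric zero point of 8)."""
    return [to_fp16((q - 8) * scale) for q in unpack_int4(word)]


def fp16_int4_dot(activations, packed_weights, scale):
    """Dot product of FP16 activations with INT4 weights decompressed on the fly.

    The weights are never materialized as a full FP16 tensor: each 32-bit word
    is expanded only when it is needed. Both multiplication operands are FP16;
    the accumulator is kept in higher precision here (an assumption).
    """
    acc = 0.0
    for block, word in enumerate(packed_weights):
        for j, w in enumerate(dequantize(word, scale)):
            a = to_fp16(activations[8 * block + j])
            acc += a * w
    return acc
```

For example, `fp16_int4_dot([1.0] * 8, [pack_int4([8, 9, 10, 7, 8, 8, 15, 0])], 0.5)` dequantizes the word to `[0.0, 0.5, 1.0, -0.5, 0.0, 0.0, 3.5, -4.0]` and returns `0.5`. The point of the sketch is that only 4 bits per weight need to be fetched from memory; how fast those fetches can be overlapped with the FP16 math is where an instruction like cp.async matters on the GPU.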
