can this support lower bit quant? #11

Open

vince62s opened this issue Jan 30, 2024 · 3 comments

Comments

@vince62s

3-bit?
2-bit?

@ChenMnZ

ChenMnZ commented Feb 19, 2024

I am also curious about this.

@efrantar
Member

Hi,

currently Marlin supports only a limited set of quantization options (4-bit with group size 128), selected for a good accuracy/speed trade-off; in exchange, it runs very close to peak efficiency in many cases, including at larger batch sizes.

That being said, Marlin can definitely be a good starting point for developing highly efficient kernels for other bitwidths or quantization schemes.
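
As a concrete illustration of that scheme, here is a minimal reference dequantizer in PyTorch, assuming a straightforward row-major packing of eight 4-bit values per int32 word and one fp16 scale per group of 128 input rows. This is a sketch only: Marlin's actual kernel uses a permuted, tensor-core-friendly weight layout, and the function name and shapes below are illustrative, not part of Marlin's API.

```python
import torch

def dequant_4bit(packed: torch.Tensor, scales: torch.Tensor,
                 group_size: int = 128) -> torch.Tensor:
    """Unpack 4-bit weights stored 8-per-int32 and apply per-group fp16 scales.

    packed: (in_features // 8, out_features) int32
    scales: (in_features // group_size, out_features) fp16
    returns: (in_features, out_features) fp16
    """
    # Extract the 8 nibbles of each int32 word (lowest bits first).
    shifts = torch.arange(0, 32, 4, dtype=torch.int32, device=packed.device)
    q = (packed.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF
    q = q.reshape(-1, packed.shape[1]).to(torch.float16)
    # Symmetric quantization: shift the unsigned range [0, 15] to [-8, 7].
    q = q - 8
    # Broadcast one scale over each group of `group_size` input rows.
    s = scales.repeat_interleave(group_size, dim=0)
    return q * s
```

A plain fp16 matmul against the output of this reference gives a correctness baseline to compare a fused low-bit kernel against.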

@nivibilla

How can one go about making it work for 8-bit GPTQ?
