Add gguf support for bloom #33473
Conversation
Force-pushed from 6f3e643 to c23788b
Hi @SunMarc @LysandreJik @ArthurZucker! This PR is ready for review. There is one thing that looks odd to me: after dequantization and loading the model, it generates a wrong sequence, not the one expected from a normal pretrained model. Instead of […]
Since the model was quantized, it is normal that it doesn't behave the same as the normal pretrained model; dequantization doesn't recover the precision of the original model. Could you check that it behaves similarly to the original model converted to GGUF in fp16 or full precision? That way we have a baseline for comparing the model loaded from the GGUF file.
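For reference, such a comparison could look roughly like the sketch below. The repo and file names are placeholders, not the actual checkpoints used in this PR; transformers loads GGUF checkpoints through the `gguf_file` argument of `from_pretrained`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "bigscience/bloom-560m"        # original HF checkpoint
gguf_id = "some-org/bloom-560m-gguf"     # hypothetical GGUF repo
gguf_file = "bloom-560m.fp16.gguf"       # hypothetical fp16 GGUF file

tokenizer = AutoTokenizer.from_pretrained(base_id)
model_ref = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model_gguf = AutoModelForCausalLM.from_pretrained(gguf_id, gguf_file=gguf_file)

inputs = tokenizer("Hello", return_tensors="pt")
out_ref = model_ref.generate(**inputs, max_new_tokens=16, do_sample=False)
out_gguf = model_gguf.generate(**inputs, max_new_tokens=16, do_sample=False)

# For an fp16 GGUF conversion, the two generations should match (or be very close).
print(tokenizer.decode(out_ref[0]))
print(tokenizer.decode(out_gguf[0]))
```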
Force-pushed from c23788b to d49beb4
Hi @SunMarc. Still odd behaviour; it doesn't matter which format I take. I debugged the model_state loading and parsing logic several times and compared it with a working one (like llama). Everything looks good from that perspective: weights, params, config and so on seem to be loaded correctly into the model. I'm just worried about the quantized data itself.
Force-pushed from d49beb4 to 1a61d07
Hi @SunMarc! I finally found the issue and the reason for the bloom model's strange behaviour. It is caused by https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py (L972-998): that algorithm reshapes the qkv data, which is why we ended up with completely different weights. I implemented the reverse reshaping algorithm in modeling_gguf_pytorch_utils.py to put them back, and after that everything is OK and the model outputs the expected values. Please take a look at my changes; everything should be correct now.
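In convert_hf_to_gguf.py, Bloom's fused query_key_value weight is reshaped from the HF layout (Q/K/V interleaved per head) into three contiguous Q, K and V blocks. A minimal sketch of the inverse transform, assuming a fused weight of shape (3 * n_embed, n_embed); the function name is illustrative, and the actual code lives in modeling_gguf_pytorch_utils.py:

```python
import numpy as np

def reverse_reshape_qkv_weights(weights: np.ndarray, n_head: int, n_embed: int) -> np.ndarray:
    """Undo the qkv reordering done by llama.cpp's convert script for Bloom.

    GGUF stores the fused matrix as [Q; K; V] (three contiguous blocks),
    while HF Bloom expects the heads interleaved as
    [n_head, 3, head_dim, n_embed], flattened back to 2D.
    """
    head_dim = n_embed // n_head
    q, k, v = np.array_split(weights, 3, axis=0)  # each (n_embed, n_embed)
    q = q.reshape(n_head, head_dim, n_embed)
    k = k.reshape(n_head, head_dim, n_embed)
    v = v.reshape(n_head, head_dim, n_embed)
    qkv = np.stack([q, k, v], axis=1)             # (n_head, 3, head_dim, n_embed)
    return qkv.reshape(3 * n_embed, n_embed)      # back to the HF layout
```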
Nice job finding the issue! Could you add a final test checking that the fp16 model from transformers and the fp16 model from GGUF share the same weights? That would be a nice way to verify the conversion was done correctly! Can you also check that the quantized model doesn't return gibberish?
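A weight-equality test along these lines could look roughly like the following sketch; the repo and file names are placeholders, and the real test added in this PR may differ in structure.

```python
import torch
from transformers import AutoModelForCausalLM

def test_bloom_fp16_weights_match():
    model_ref = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-560m", torch_dtype=torch.float16
    )
    model_gguf = AutoModelForCausalLM.from_pretrained(
        "some-org/bloom-560m-gguf",        # hypothetical GGUF repo
        gguf_file="bloom-560m.fp16.gguf",  # hypothetical fp16 file
        torch_dtype=torch.float16,
    )
    # Every tensor loaded from the GGUF file should match the reference model.
    ref_state = model_ref.state_dict()
    for name, tensor in model_gguf.state_dict().items():
        assert name in ref_state, f"missing tensor: {name}"
        assert torch.allclose(ref_state[name], tensor), f"mismatch in {name}"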
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Yes, sure, I added one more test to compare weights. But what about the gibberish check? Should I implement an additional test for a particular generation?
Just add a simple generation check for q4 quants, for example, just like what was done for other models!
But this kind of test is already added; it is called test_bloom_f16. Or is it not enough?
But we need one for a quantized model, since most users use those models. With fp16, we can't be sure that the dequantize step was done correctly.
Force-pushed from 173b862 to e69b87b
Hello! Ok, got it, I just added a q8_0 test as well. Would that be sufficient?
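Such a quantized-generation smoke test, mirroring the GGUF tests for other architectures, could look roughly like the sketch below; the repo name, file name, prompt, and assertion are placeholders (the real tests pin an exact expected output string).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def test_bloom_q8_0():
    gguf_id = "some-org/bloom-560m-gguf"  # hypothetical GGUF repo
    gguf_file = "bloom-560m.q8_0.gguf"    # hypothetical q8_0 file

    tokenizer = AutoTokenizer.from_pretrained(gguf_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(gguf_id, gguf_file=gguf_file)

    inputs = tokenizer("Hello", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    text = tokenizer.decode(out[0], skip_special_tokens=True)

    # Placeholder check: the real test compares against a pinned expected string
    # to catch gibberish from a broken dequantization path.
    assert len(text) > len("Hello")
```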
Force-pushed from e69b87b to 894d1a1
Awesome, thank you @VladOS95-cyber!
* add bloom arch support for gguf
* apply format
* small refactoring, bug fix in GGUF_TENSOR_MAPPING naming
* optimize bloom GGUF_TENSOR_MAPPING
* implement reverse reshaping for bloom gguf
* add qkv weights test
* add q_8 test for bloom
What does this PR do?
Add Bloom GGUF loading support
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Link: Community contribution: Adding GGUF support for more architectures #33260
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Regarding this task: @SunMarc @LysandreJik @ArthurZucker.