Bitsandbytes models stopped working in the new 0.7.1 version #12849
Closed
davefojtik announced in Q&A
Replies: 1 comment
Solved. The problem was an outdated FlashInfer installed in my container image.
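A quick way to confirm which FlashInfer build an image actually ships is to query the package metadata at container start. This is a minimal sketch, assuming the wheel is registered under one of the distribution names FlashInfer has used across release channels; adjust the names to your install source.

```python
# Minimal sketch: report which FlashInfer build (if any) is installed.
# The distribution names are assumptions -- FlashInfer wheels have shipped
# under more than one name depending on the release channel.
from importlib.metadata import PackageNotFoundError, version

for dist in ("flashinfer", "flashinfer-python"):
    try:
        print(f"{dist}: {version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```

Running something like this in the image build or entrypoint makes a stale dependency visible before the engine ever starts.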
In 0.7.0 everything works fine, but when I update to 0.7.1 with the same models, code, etc., it's broken. The logs spam a lot of `MLA is not supported with bitsandbytes quantization. Disabling MLA.`, even though the `VLLM_MLA_DISABLE` env variable is set to true. Then, after the `Capturing cudagraphs` part, vLLM returns the following error:

Every update I go through the engine arguments and environment variables in the documentation to see if there's something new or changed, but this time I didn't see anything that could cause this. Did I miss some major change?
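For reference, here is a minimal sketch of how the flag can be set so vLLM actually sees it, assuming the variable is read once at engine start-up and parsed as an integer-style flag ("1" to disable). The model name and quantization arguments are placeholders for illustration, not the configuration from this report.

```python
# Minimal sketch, assuming VLLM_MLA_DISABLE is parsed as an integer-style
# flag ("1" = disable). It must be in the process environment before vLLM
# reads its VLLM_* settings, so set it prior to importing vllm.
import os

os.environ["VLLM_MLA_DISABLE"] = "1"

from vllm import LLM  # imported only after the variable is set

# Hypothetical model name and flags for illustration only.
llm = LLM(
    model="some-org/some-bnb-4bit-model",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```

If the variable is exported only in a wrapper shell or set after the engine process has started, the in-engine check may never see it, which would match the behavior described above.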
Here's the full log (we're using custom vLLM on RunPod serverless):