
D+S packing in vLLM seems buggy #62

Open
MingLin-home opened this issue Feb 27, 2024 · 0 comments
Hello!

I followed the D+S packing instructions and stored the packed .pt file in "~/models/${model_name}-squeezellm/packed_weight", where model_name="Llama-2-7b-chat-hf". When I load this model in vLLM:

python examples/llm_engine_example.py --dtype float16 --model ~/models/${model_name}-squeezellm/packed_weight --quantization squeezellm

vLLM complains that it cannot find the parameters "sparse_threshold.model.layers.*". Any idea why? I repeated the quantization from scratch several times, but every run ended with the same error.
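For debugging, a quick way to check whether the sparse_threshold tensors even made it into the packed checkpoint is to load it directly and list its keys. A minimal sketch (the checkpoint file names under packed_weight are assumptions here; match against whatever files the packing script actually produced):

```python
import glob
import os

import torch

# Directory from the command above; adjust model_name if needed.
ckpt_dir = os.path.expanduser("~/models/Llama-2-7b-chat-hf-squeezellm/packed_weight")

# The output file extensions are an assumption; the packing step may
# name its files differently.
for path in sorted(glob.glob(os.path.join(ckpt_dir, "*.pt")) +
                   glob.glob(os.path.join(ckpt_dir, "*.bin"))):
    state_dict = torch.load(path, map_location="cpu")
    sparse_keys = [k for k in state_dict if "sparse_threshold" in k]
    print(f"{path}: {len(sparse_keys)} sparse_threshold keys")
    for k in sparse_keys[:5]:
        print("   ", k)
```

If the keys are present in the checkpoint but vLLM still fails to match them, the problem is likely a naming mismatch during loading rather than in the packing step itself.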

As a quick workaround, I modified vLLM's model-loading step in llama.py to skip a checkpoint parameter when it cannot be matched, instead of raising the error above. However, the model then cannot generate meaningful output, so I believe those parameters are indeed not being loaded correctly.
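For reference, the skip I added is roughly the following (a hypothetical sketch of a vLLM-style load_weights loop, not vLLM's exact code; all names are illustrative):

```python
import torch
from torch import nn


def load_weights(model: nn.Module, checkpoint_state_dict: dict) -> None:
    """Hypothetical sketch of a vLLM-style weight-loading loop in llama.py."""
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in checkpoint_state_dict.items():
        if name not in params_dict:
            # The workaround: skip instead of raising. This silences the
            # "cannot find parameters" error, but leaves the corresponding
            # model parameters uninitialized, which would explain the
            # meaningless generations afterwards.
            print(f"warning: skipping unmatched parameter {name}")
            continue
        params_dict[name].data.copy_(loaded_weight)
```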
