
Cannot quantize to int8 - torch TypeError #11

Open

AlpinDale opened this issue Feb 3, 2024 · 2 comments

@AlpinDale

I'm trying to quantize Llama2 7b using the instructions in the README, but I get this error:

start trans into int8, this might take a while
Instantiating Int8LlamaAttention without passing `layer_idx` is not recommended and will lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` when creating this class.
Traceback (most recent call last):
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 117, in <module>
    main()
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 112, in main
    int8_model = quant_model_class.from_float(model, decoder_layer_scales, quant_config)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 245, in from_float
    int8_module.model = Int8LlamaModel.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 216, in from_float
    int8_module.layers[i] = Int8LlamaDecoderLayer.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 174, in from_float
    int8_module.input_layernorm = Int8LlamaRMSNorm.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 27, in from_float
    int8_module.weight = module.weight / output_scale
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1708, in __setattr__
    raise TypeError(f"cannot assign '{torch.typename(value)}' as parameter '{name}' "
TypeError: cannot assign 'torch.cuda.HalfTensor' as parameter 'weight' (torch.nn.Parameter or None expected)

The scales are generated correctly.
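
For reference, a minimal standalone snippet (not from this repo) reproduces the same failure: once an attribute is registered as an nn.Parameter, nn.Module.__setattr__ only accepts an nn.Parameter (or None) for it, and dividing a Parameter by a scale produces a plain tensor:

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 4)
scaled = m.weight / 2.0   # elementwise division returns a plain torch.Tensor
print(type(scaled))       # <class 'torch.Tensor'> -- no longer an nn.Parameter

try:
    m.weight = scaled     # same failure mode as llama.py line 27
except TypeError as e:
    print(e)              # cannot assign 'torch.FloatTensor' as parameter 'weight' ...

m.weight = nn.Parameter(scaled)  # wrapping in nn.Parameter makes the assignment legal
```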

@AniZpZ
Owner

AniZpZ commented Feb 4, 2024

It might be caused by version differences in dependencies.
You can try the following code at line 27; it should solve the problem.

int8_module.weight = torch.nn.Parameter(module.weight / output_scale)

We will work on this problem and fix it.
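
In context, the patched Int8LlamaRMSNorm.from_float would look roughly like this. This is a sketch reconstructed from the traceback; the constructor and its arguments are assumptions, not copied from autosmoothquant/models/llama.py:

```python
import torch
from torch import nn

class Int8LlamaRMSNorm(nn.Module):
    # Hypothetical constructor, mirroring the HF LlamaRMSNorm layout.
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    @staticmethod
    def from_float(module: nn.Module, output_scale: float) -> "Int8LlamaRMSNorm":
        int8_module = Int8LlamaRMSNorm(module.weight.numel())
        # Wrap the scaled weight in nn.Parameter so nn.Module.__setattr__
        # accepts it; assigning a bare Tensor raises the TypeError above.
        int8_module.weight = nn.Parameter(module.weight / output_scale)
        return int8_module
```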

@Hongbosherlock

Hongbosherlock commented Feb 4, 2024

> It might be caused by version differences in dependencies. You can try the following code at line 27; it should solve the problem.
>
> int8_module.weight = torch.nn.Parameter(module.weight / output_scale)
>
> We will work on this problem and fix it.

What versions of PyTorch, CUDA, and Transformers are required?

AniZpZ pushed a commit that referenced this issue on Mar 21, 2024: fix eval model bugs