From what I can see in the Llama2 code on Hugging Face, the `attention_mask` and `position_ids` variables are never set by the model. As a result, `cache['attention_mask']` and `cache['position_ids']` are `None`, and the script fails at `lib/prune.py` line 144:
if f"model.layers.{i}" in model.hf_device_map: ## handle the case for llama-30B and llama-65B, when the device map has multiple GPUs;
dev = model.hf_device_map[f"model.layers.{i}"]
inps, outs, attention_mask, position_ids = inps.to(dev), outs.to(dev), attention_mask.to(dev), position_ids.to(dev)
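For context, the `None` values come from the calibration setup: the script records the first decoder layer's inputs with a wrapper module roughly like the sketch below (hypothetical names; the exact class in `lib/prune.py` may differ). If the model's forward never passes `attention_mask` or `position_ids` as keyword arguments, the cache entries are never overwritten.

```python
# A sketch of how calibration inputs are typically captured in this style of
# pruning code (hypothetical names; the exact class in lib/prune.py may differ).
import torch.nn as nn

cache = {'attention_mask': None, 'position_ids': None}
captured_inputs = []

class Catcher(nn.Module):
    """Wraps the first decoder layer, records its inputs, then aborts the forward pass."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, hidden_states, **kwargs):
        captured_inputs.append(hidden_states)
        # If the model's forward never passes these kwargs -- as with the
        # Llama2 code described above -- both entries simply stay None.
        cache['attention_mask'] = kwargs.get('attention_mask')
        cache['position_ids'] = kwargs.get('position_ids')
        raise ValueError  # stop the forward pass once the inputs are captured
```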
Please note that I do not have access to GPUs with more than 40GB of VRAM, and the 7B model does not fit in 40GB for me, so I have to use a device map even for the 7B model, which is what triggers the error described above.
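A `None`-safe variant of the device move would avoid the crash. This is just a sketch using the variables from the snippet above, not a tested patch:

```python
if f"model.layers.{i}" in model.hf_device_map:
    dev = model.hf_device_map[f"model.layers.{i}"]
    inps, outs = inps.to(dev), outs.to(dev)
    # attention_mask and position_ids may be None when the model never
    # sets them, so only move them if they exist.
    if attention_mask is not None:
        attention_mask = attention_mask.to(dev)
    if position_ids is not None:
        position_ids = position_ids.to(dev)
```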