Running llama2-7B on an Ascend NPU in a container #8

Author: @Yikun

0. Prerequisites

Set up the PyTorch environment as described in #7, then verify the NPU is visible and a basic tensor op works:

(.llm-venv) # npu-smi info

(.llm-venv) # python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 1.2800,  1.3105,  0.4513, -1.1650],
        [ 3.5199, -0.2590,  2.6664, -1.9602],
        [ 2.3262, -2.4671,  2.3252, -2.1502]], device='npu:0')

1. Install Transformers

python3 -m pip install --upgrade pip
pip install transformers accelerate xformers
# Need "sentencepiece" and "protobuf==3.20.0" when convert_llama_weights_to_hf
pip install sentencepiece protobuf==3.20.0
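
A quick sanity check that the install resolved (the exact versions are whatever pip picked; nothing here pins them beyond protobuf):

python3 -c "import transformers, accelerate, sentencepiece; print(transformers.__version__, accelerate.__version__)"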

2. Prepare the llama model

Prepare the model:

# tree llama/llama-2-7b/
llama/llama-2-7b/
├── checklist.chk
├── consolidated.00.pth
└── params.json

cd llama/llama-2-7b
mkdir 7B
mv *.* 7B
cp ../tokenizer.model .

# tree -h llama/llama-2-7b/
llama/llama-2-7b/
|-- [4.0K]  7B
|   |-- [ 100]  checklist.chk
|   |-- [ 13G]  consolidated.00.pth
|   `-- [ 102]  params.json
`-- [488K]  tokenizer.model

Convert the model:

# find / -name convert_llama_weights_to_hf.py
/root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py
python /root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir llama/llama-2-7b --model_size 7B --output_dir transformer/llama-2-7b

The converted model has the following layout:

# tree -h transformer/llama-2-7b/
transformer/llama-2-7b/
|-- [ 578]  config.json
|-- [ 132]  generation_config.json
|-- [9.3G]  pytorch_model-00001-of-00002.bin
|-- [3.3G]  pytorch_model-00002-of-00002.bin
|-- [ 26K]  pytorch_model.bin.index.json
|-- [ 411]  special_tokens_map.json
|-- [1.8M]  tokenizer.json
|-- [488K]  tokenizer.model
`-- [ 745]  tokenizer_config.json
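
Before moving to the NPU, the converted checkpoint can be sanity-checked on the CPU side. This is just a convenience sketch, not part of the original steps; the path matches the conversion output above, and the expected values (llama, 32 layers, hidden size 4096) are the published llama-2-7b dimensions:

from transformers import AutoConfig, AutoTokenizer

model_path = "transformer/llama-2-7b"

# Loading the config does not pull the ~13G of weights into memory.
config = AutoConfig.from_pretrained(model_path)
print(config.model_type, config.num_hidden_layers, config.hidden_size)  # llama 32 4096

# Tokenizer round-trip as a cheap end-to-end check of the converted files.
tokenizer = AutoTokenizer.from_pretrained(model_path)
ids = tokenizer("Deep learning is", return_tensors="pt").input_ids
print(ids.shape, tokenizer.decode(ids[0]))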

3. Run the model

from transformers import AutoTokenizer, LlamaForCausalLM
import torch
import torch_npu

# Avoid a ReduceProd operator core dump, see: https://github.com/cosdt/llm/issues/4
option = {"NPU_FUZZY_COMPILE_BLACKLIST": "ReduceProd"}
torch.npu.set_option(option)

npu_id = 0
torch.npu.set_device(npu_id)

device = "npu:{}".format(npu_id)
model_path = "/opt/yikun/transformer/llama-2-7b"
model = LlamaForCausalLM.from_pretrained(model_path).to(device)

tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
generate_ids = model.generate(inputs.input_ids, max_length=50)

tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
'Deep learning is a branch of machine learning that is based on artificial neural networks. Deep learning is a subset of machine learning that is based on artificial neural networks. Neural networks are a type of machine learning algorithm that is inspired by the structure and'
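
The default from_pretrained load materializes the weights in fp32 (roughly 26G for 7B parameters), which can be tight on device memory. A half-precision load is worth trying; this is an untested variant of the load above, reusing the same names, and assuming torch_npu handles float16 (torch_dtype and low_cpu_mem_usage are standard transformers arguments):

model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # load weights as fp16 instead of upcasting to fp32
    low_cpu_mem_usage=True,      # avoid materializing a second full copy on the host
).to(device)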

Pitfalls encountered:

  1. torch.npu.set_device: after setting a wrong NPU ID, the process keeps failing even if you set the correct ID again (see the guard sketch after this list): https://github.com/cosdt/llm/issues/3
  2. torch ReduceProd operator problem: https://github.com/cosdt/llm/issues/4
  3. transformers must be imported before torch and torch_npu: https://github.com/cosdt/llm/issues/5
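
For pitfall 1, a cheap defensive habit is to validate the device ID before the first set_device call. This is only a hypothetical guard, not something torch_npu requires:

import torch
import torch_npu
# In a script that also uses transformers, import it before torch/torch_npu (pitfall 3).

npu_id = 0
# Fail fast on a bad ID instead of leaving the process in the broken state from issue #3.
assert 0 <= npu_id < torch.npu.device_count(), f"invalid NPU id: {npu_id}"
torch.npu.set_device(npu_id)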
