Open
Labels: Ascend, publish
Description
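Author: @Yikun
0. Prerequisites
Set up the PyTorch environment as described in #7, then check that the NPU is visible and that a simple tensor operation runs on it: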
(.llm-venv) # npu-smi info
(.llm-venv) # python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 1.2800,  1.3105,  0.4513, -1.1650],
        [ 3.5199, -0.2590,  2.6664, -1.9602],
        [ 2.3262, -2.4671,  2.3252, -2.1502]], device='npu:0')
1. Install Transformers
python3 -m pip install --upgrade pip
pip install transformers accelerate xformers
# convert_llama_weights_to_hf.py requires "sentencepiece" and "protobuf==3.20.0"
pip install sentencepiece protobuf==3.20.0
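To confirm that the installs above landed in the active virtual environment, the installed versions can be listed without importing any of the packages. This is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages installed in this section; nothing is imported, only metadata is read.
wanted = ["transformers", "accelerate", "xformers", "sentencepiece", "protobuf"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'not installed'}")
```

Reading metadata instead of importing keeps this check independent of the import-order issue noted under the pitfalls.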
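2. Prepare the LLaMA model
Prepare the model files. The conversion script expects the weights in a 7B subdirectory, with tokenizer.model next to it: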
# tree llama/llama-2-7b/
llama/llama-2-7b/
├── checklist.chk
├── consolidated.00.pth
└── params.json
cd llama/llama-2-7b
mkdir 7B
mv *.* 7B
cp ../tokenizer.model .
# tree -h llama/llama-2-7b/
llama/llama-2-7b/
|-- [4.0K]  7B
|   |-- [ 100]  checklist.chk
|   |-- [ 13G]  consolidated.00.pth
|   `-- [ 102]  params.json
`-- [488K]  tokenizer.model
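The shell steps above can also be scripted. The following is a rough sketch, not part of the original write-up: `prepare_llama_dir` is a hypothetical helper, and unlike `mv *.* 7B` it moves every regular file in the directory rather than only names containing a dot.

```python
import shutil
from pathlib import Path


def prepare_llama_dir(model_dir: Path, tokenizer_src: Path, size_dir: str = "7B") -> Path:
    """Move checkpoint files into <model_dir>/<size_dir> and copy the tokenizer beside it."""
    target = model_dir / size_dir
    target.mkdir(exist_ok=True)
    # sorted() materializes the listing before anything is moved.
    for entry in sorted(model_dir.iterdir()):
        # Skip directories (including the target) and an already-copied tokenizer.
        if entry.is_file() and entry.name != tokenizer_src.name:
            shutil.move(str(entry), str(target / entry.name))
    shutil.copy2(tokenizer_src, model_dir / tokenizer_src.name)
    return target


# Example call matching the layout above (paths are the ones used in this issue):
#   prepare_llama_dir(Path("llama/llama-2-7b"), Path("llama/tokenizer.model"))
```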
Convert the model. First locate the conversion script shipped with transformers, then run it:
# find / -name convert_llama_weights_to_hf.py
/root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py
python  /root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir llama/llama-2-7b --model_size 7B --output_dir transformer/llama-2-7b
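Searching the whole filesystem with `find /` works but is slow. As an alternative sketch, the script can be located through the package's install location. The helper `locate_in_package` is hypothetical; it relies on `importlib.util.find_spec`, which for a top-level package name resolves the location without importing the package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_in_package(package: str, relative_path: str) -> Optional[Path]:
    """Return the path of a file inside an installed top-level package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for root in spec.submodule_search_locations:
        candidate = Path(root) / relative_path
        if candidate.is_file():
            return candidate
    return None


# Prints None if transformers is not installed in the active environment.
print(locate_in_package("transformers", "models/llama/convert_llama_weights_to_hf.py"))
```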
The converted model has the following layout:
# tree -h transformer/llama-2-7b/
transformer/llama-2-7b/
|-- [ 578]  config.json
|-- [ 132]  generation_config.json
|-- [9.3G]  pytorch_model-00001-of-00002.bin
|-- [3.3G]  pytorch_model-00002-of-00002.bin
|-- [ 26K]  pytorch_model.bin.index.json
|-- [ 411]  special_tokens_map.json
|-- [1.8M]  tokenizer.json
|-- [488K]  tokenizer.model
`-- [ 745]  tokenizer_config.json
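Before loading, it can be worth checking that every shard named in pytorch_model.bin.index.json is actually present, since an interrupted conversion may leave one out. A minimal sketch, assuming the index keeps its usual `weight_map` mapping of parameter names to shard file names; `missing_shards` is an illustrative name.

```python
import json
from pathlib import Path
from typing import List


def missing_shards(model_dir: Path, index_name: str = "pytorch_model.bin.index.json") -> List[str]:
    """List shard files named in the index's weight_map that are absent from model_dir."""
    index = json.loads((model_dir / index_name).read_text(encoding="utf-8"))
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (model_dir / name).is_file()]


# Example call for the directory above; an empty list means all shards are present:
#   missing_shards(Path("transformer/llama-2-7b"))
```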
3. Run the model
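# Import transformers before torch and torch_npu, see: https://github.com/cosdt/llm/issues/5
from transformers import AutoTokenizer, LlamaForCausalLM
import torch
import torch_npu
# Avoid the ReduceProd operator core dump, see: https://github.com/cosdt/llm/issues/4
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "ReduceProd"
torch.npu.set_option(option)
# Set the NPU ID once and make sure it is correct, see: https://github.com/cosdt/llm/issues/3
npu_id = 0
torch.npu.set_device(npu_id)
device = "npu:{}".format(npu_id)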
model_path = "/opt/yikun/transformer/llama-2-7b"
model = LlamaForCausalLM.from_pretrained(model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)
prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
generate_ids = model.generate(inputs.input_ids, max_length=50)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
'Deep learning is a branch of machine learning that is based on artificial neural networks. Deep learning is a subset of machine learning that is based on artificial neural networks. Neural networks are a type of machine learning algorithm that is inspired by the structure and'
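Pitfalls encountered:
- torch.npu.set_device: once a wrong NPU ID has been set, errors keep being reported, even after switching back to the correct ID: https://github.com/cosdt/llm/issues/3
- torch ReduceProd operator issue: https://github.com/cosdt/llm/issues/4
- transformers must be imported before torch and torch_npu: https://github.com/cosdt/llm/issues/5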
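Since a wrong NPU ID is hard to recover from, one possible precaution is to validate the ID before it ever reaches torch.npu.set_device. The sketch below is an assumption-laden illustration, not a confirmed fix for issue #3: `checked_device_id` is a hypothetical helper, and the commented usage assumes torch_npu exposes `torch.npu.device_count()`.

```python
def checked_device_id(npu_id: int, device_count: int) -> int:
    """Return npu_id unchanged if it is a valid index, otherwise raise ValueError."""
    if isinstance(npu_id, bool) or not isinstance(npu_id, int):
        raise ValueError(f"NPU ID must be an integer, got {npu_id!r}")
    if not 0 <= npu_id < device_count:
        raise ValueError(f"NPU ID {npu_id} is out of range for {device_count} device(s)")
    return npu_id


# Intended use on an Ascend host (assumes torch.npu.device_count() is available):
#   npu_id = checked_device_id(0, torch.npu.device_count())
#   torch.npu.set_device(npu_id)
```

Failing early with a ValueError keeps the bad ID out of the runtime entirely, which sidesteps the need to restart the process after a mistaken call.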