Apart from `import torch`, the code should stay consistent with the transformers Llama implementation, but with the distributed (tensor-parallel) and KV-cache/attention-backend configuration stripped out; see the sketch below. Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py
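A minimal sketch of what such a simplified module could look like, assuming only `torch` as a dependency. The class name `SimpleLlamaAttention` and the default hyper-parameters are illustrative assumptions, not the transformers API; the RMSNorm and rotary-embedding math mirrors modeling_llama.py, while cache objects, backend dispatch, and parallelism plans are left out in favor of plain `scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaRMSNorm(nn.Module):
    """RMSNorm as in modeling_llama.py, computed in float32 for stability."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotate half of the hidden dims, as in the transformers rotary helpers."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin):
    """Apply rotary position embeddings; cos/sin have shape (bsz, seq, head_dim)."""
    cos, sin = cos.unsqueeze(1), sin.unsqueeze(1)  # broadcast over the head dim
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)


class SimpleLlamaAttention(nn.Module):
    """Grouped-query attention in the spirit of LlamaAttention, without Cache
    objects, attention-backend selection, or tensor-parallel configuration.
    Hyper-parameter defaults here are placeholders, not real config values."""

    def __init__(self, hidden_size: int = 4096, num_heads: int = 32, num_kv_heads: int = 8):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, hidden_size, bias=False)

    def forward(self, hidden_states, cos, sin):
        bsz, q_len, _ = hidden_states.shape
        q = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(hidden_states).view(bsz, q_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(hidden_states).view(bsz, q_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        q, k = apply_rotary_pos_emb(q, k, cos, sin)
        # Expand KV heads for GQA, then use plain SDPA instead of backend dispatch.
        k = k.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        v = v.repeat_interleave(self.num_heads // self.num_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, q_len, -1)
        return self.o_proj(out)


if __name__ == "__main__":
    # Illustrative usage: cos/sin would normally come from a rotary-embedding module.
    hidden_size, num_heads, seq_len = 256, 8, 16
    head_dim = hidden_size // num_heads
    inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
    freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    cos, sin = emb.cos()[None], emb.sin()[None]  # (1, seq, head_dim)

    attn = SimpleLlamaAttention(hidden_size, num_heads, num_kv_heads=4)
    x = torch.randn(2, seq_len, hidden_size)
    print(attn(x, cos, sin).shape)  # torch.Size([2, 16, 256])
```

The idea is that parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`, RMSNorm `weight`) line up with modeling_llama.py so checkpoints can map over directly, while everything related to caching backends and distributed execution is simply omitted.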