TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output' #179
Comments
Hey, I have the same question. Did you find out how to deal with it?
Update your code to the latest main branch. As you can see, the …
Hi, I'm using SwissArmyTransformer 0.4.12 and I git cloned the visualglm code within the last two days, but I still hit TypeError: type object got multiple values for keyword argument 'parallel_output'.
Thanks. But when I fine-tune with QLoRA I still run into the same problem; something still seems to be wrong.
Here is my class FineTuneVisualGLMModel(VisualGLMModel):
I've changed it; try again.
What I mean is I changed it too:
Pull the latest code first and try again, because your changes were incomplete.
OK, I'll pull.
I updated the visualglm-6b code and re-cloned the latest SwissArmyTransformer, but I still hit a problem. What does this error mean? I put the sat weights in the satckpt folder; this is the finetune_visualglm.py code. But that shouldn't be the root cause of the dtype problem, should it?
I've updated sat; can you try again?
Tried again, and the earlier problems are all gone. But now I hit another one: No backend type associated with device type cpu. Is this because my CPU memory is insufficient? I have 64 GB of RAM, which should be enough for loading; maybe the GPU is the issue? The README says QLoRA fits in about 10 GB of GPU memory, and my single GPU has 11 GB.
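As a side note, a minimal sketch (standard torch only, nothing from this repo) for checking whether the single card really has the roughly 10 GB the README mentions for QLoRA:

import torch

# Report total and currently free memory on GPU 0.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info(0)
    name = torch.cuda.get_device_properties(0).name
    print(f"{name}: {total / 1024**3:.1f} GB total, {free / 1024**3:.1f} GB free")
else:
    print("CUDA is not available")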
I suspect your machine has no GPU. Look at the code here: if CUDA is available, the model ends up on CUDA rather than on the CPU. You can set a breakpoint inside that if branch to confirm whether .cuda() is actually executed.
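A small sketch of that check (plain torch; the helper name report_device is made up here), printing where the parameters actually ended up:

import torch

def report_device(model):
    # Collect the set of devices that the model's parameters live on.
    devices = {p.device for p in model.parameters()}
    print("cuda available:", torch.cuda.is_available())
    print("parameter devices:", devices)

# Call this right after the `if torch.cuda.is_available(): ... .cuda()` branch:
# report_device(model)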
There is a GPU; the machine has 8 cards and I only want to use one. After the weights are loaded on the first line, I added print("111111111111111111111111111111") inside if torch.cuda.is_available(), and it did print. I also ran other programs on the GPU yesterday and they all worked fine. I'm fine-tuning on the official few-shot example data, and this is what happens:

[2024-08-23 14:46:54,366] [INFO] [RANK 0] > successfully loaded satckpt/1/mp_rank_00_model_states.pt
RuntimeError: No backend type associated with device type cpu ------> same problem ------------------------------------------
[2024-08-23 14:47:47,702] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 144840
[2024-08-23 14:47:47,975] [ERROR] [launch.py:325:sigkill_handler] ['/home/yaoyunze/anaconda3/envs/visualglm/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '1', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '1', '--gradient-accumulation-steps', '4', '--skip-init', '--fp16', '--use_qlora'] exits with return code = 1

That is where the code reports the problem.
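For reference, that RuntimeError usually comes from torch.distributed when a collective runs on a CPU tensor while the process group only has the NCCL backend, which handles CUDA tensors only. A minimal, assumption-laden repro (single process, made-up rendezvous settings, not this repo's launch path):

import os
import torch
import torch.distributed as dist

# Hypothetical single-process setup just to illustrate the failure mode.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)

t = torch.ones(1)          # CPU tensor
# dist.all_reduce(t)       # raises: No backend type associated with device type cpu
dist.all_reduce(t.cuda())  # works, because NCCL operates on CUDA tensors
dist.destroy_process_group()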
Great, training is running now and only uses 9 GB of GPU memory. Thanks a lot.
Well, after QLoRA fine-tuning, inference fails with a dimension mismatch. I've looked through the earlier issues, but none of them mention this situation...

[2024-08-23 17:10:35,890] [INFO] [RANK 0] replacing layer 0 attention with lora
Try changing it to this:

model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    ), build_only=True)
from sat.training.model_io import load_checkpoint
load_checkpoint(model, model_args, args.from_pretrained)
def main():

Like this?
Yes.
Not quite; it's still the same problem.
I saw someone asked about this before, so let me try that library. My bitsandbytes version is currently 0.43.3.
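A quick, hedged way to confirm what is installed (bitsandbytes also ships a fuller diagnostic via python -m bitsandbytes, as the bug report later in this thread shows):

import bitsandbytes as bnb
import torch

# Print the installed bitsandbytes version and whether torch can see a GPU.
print("bitsandbytes:", bnb.__version__)
print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())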
Like this:

model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
model = model.cuda()
from sat.training.model_io import load_checkpoint
load_checkpoint(model, model_args, args.from_pretrained)
Will you still be around before 8 pm? I'm trying this now, but loading is slow, so it will probably take a while, and I'm worried I won't be able to fix it.
This is awkward to use mainly because bitsandbytes performs the quantization inside the .to('cuda') call. During training, the weights being loaded are not yet quantized, so the order is: build the model -> load the weights -> .to('cuda'). But the checkpoint saved after training holds quantized weights, so for inference the order has to be: build the model -> .to('cuda') -> load the weights. A short sketch of both orderings follows below.
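A minimal sketch of those two orderings, reusing the AutoModel and load_checkpoint calls already shown in this thread; the import paths follow sat's usual examples, and the checkpoint path and Namespace values are placeholders rather than the repo's exact settings:

import argparse
from sat import AutoModel
from sat.training.model_io import load_checkpoint

ckpt = "checkpoints/finetune-visualglm-6b"  # hypothetical checkpoint directory
ns = argparse.Namespace(fp16=True, skip_init=True,
                        use_gpu_initialization=False, device='cpu')

# Training-style order: the checkpoint holds full-precision weights, so load
# them first and let bitsandbytes quantize inside .to('cuda') / .cuda().
model, model_args = AutoModel.from_pretrained(ckpt, args=ns, build_only=True)
load_checkpoint(model, model_args, ckpt)
model = model.cuda()

# Inference-style order for a QLoRA checkpoint: the saved weights are already
# quantized, so quantize the freshly built model first, then load the weights.
model, model_args = AutoModel.from_pretrained(ckpt, args=ns, build_only=True)
model = model.cuda()
load_checkpoint(model, model_args, ckpt)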
OK, I'll try again.
It failed again... Is it a bitsandbytes version problem?
Fixed. Pull the latest sat again, then use this in cli_demo:

# load model
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
model = model.cuda()
from sat.training.model_io import load_checkpoint
load_checkpoint(model, model_args, args.from_pretrained)
model = model.eval()

But you will need to retrain, because the model-saving logic also changed.
OK, I'll retrain.
I'm not sure whether it's a training problem, but the predictions look off. This is my cli_demo.py:

# build on CPU, move to GPU (bitsandbytes quantizes here), then load the QLoRA checkpoint
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
model = model.cuda()
from sat.training.model_io import load_checkpoint
load_checkpoint(model, model_args, args.from_pretrained)
model = model.eval()
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
tokenizer = AutoTokenizer.from_pretrained("/home/data/yaoyunze/visualglm2/VisualGLM-6B/chatckpt", trust_remote_code=True)  # local path
if not args.english:
    print('欢迎使用 VisualGLM-6B 模型,输入图像URL或本地路径读图,继续输入内容对话,clear 重新开始,stop 终止程序')
else:
    print('Welcome to VisualGLM-6B model. Enter an image URL or local file path to load an image. Continue inputting text to engage in a conversation. Type "clear" to start over, or "stop" to end the program.')
with torch.no_grad():
    while True:
        history = None
        cache_image = None
        if not args.english:
            image_path = input("请输入图像路径或URL(回车进入纯文本对话): ")
        else:
            image_path = input("Please enter the image path or URL (press Enter for plain text conversation): ")
        if image_path == 'stop':
            break
        if len(image_path) > 0:
            query = args.prompt_en if args.english else args.prompt_zh
        else:
            if not args.english:
                query = input("用户:")
            else:
                query = input("User: ")
        while True:
            if query == "clear":
                break
            if query == "stop":
                sys.exit(0)
            try:
                response, history, cache_image = chat(
                    image_path,
                    model,
                    tokenizer,
                    query,
                    history=history,
                    image=cache_image,
                    max_length=args.max_length,
                    top_p=args.top_p,
                    temperature=args.temperature,
                    top_k=args.top_k,
                    english=args.english,
                    invalid_slices=[slice(63823, 130000)] if args.english else []
                )
            except Exception as e:
                print(e)
                break
            sep = 'A:' if args.english else '答:'
            print("VisualGLM-6B:"+response.split(sep)[-1].strip())
            image_path = None
            if not args.english:
                query = input("用户:")
            else:
                query = input("User: ")

Result:

Please enter the image path or URL (press Enter for a plain-text conversation): fewshot-data/2p.png
VisualGLM-6B: A man and a woman walking together, leaning on each other.
User: clear
Please enter the image path or URL (press Enter for a plain-text conversation): fewshot-data/2p.png
VisualGLM-6B: A man and a woman walking down a rainy street. (The label is: the background of this image is a light drizzle.)
User: clear
Please enter the image path or URL (press Enter for a plain-text conversation): fewshot-data/ghost.jpg
VisualGLM-6B: The background of this image is a table with a chessboard on it. (The label is: the background of this image is a room.)
User:
The prompt is: --prompt_zh 这张图片的背景里有什么内容? (i.e., "What is in the background of this image?")
Sorry to bother you, but I can reproduce this problem in my environment as well. I'm on the latest code for both sat and visualglm. The error is below; could you take a look?

===================================BUG REPORT===================================
python -m bitsandbytes and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so
Solved. Downgrading the sat version fixed it for me: pip install SwissArmyTransformer==0.3.6
Error when loading the visualglm model:
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
Traceback (most recent call last):
  File "/root/TransGPT/multi_modal/hf_infer.py", line 3, in <module>
    model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/modeling_chatglm.py", line 1345, in __init__
    self.image_encoder = BLIP2(config.eva_config, config.qformer_config)
  File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/visual.py", line 59, in __init__
    self.vit = EVAViT(EVAViT.get_args(**eva_args))
  File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/visual.py", line 20, in __init__
    super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 111, in __init__
    super().__init__(args, transformer=transformer, **kwargs)
  File "/root/.conda/envs/demo/lib/python3.10/site-packages/sat/model/base_model.py", line 93, in __init__
    self.transformer = BaseTransformer(
TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output'
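For context, the error class itself is generic Python: it appears whenever a keyword is passed twice, once explicitly and once again through **kwargs. An illustrative minimal example (the class and argument names mirror the traceback but this is not sat's actual code):

class Base:
    def __init__(self, parallel_output=True, **kwargs):
        self.parallel_output = parallel_output

class Child(Base):
    def __init__(self, parallel_output=True, **kwargs):
        # Bug pattern: parallel_output is forwarded explicitly AND is still
        # present in kwargs, so Python sees the keyword twice.
        kwargs['parallel_output'] = parallel_output
        super().__init__(parallel_output=parallel_output, **kwargs)

Child(parallel_output=False)
# TypeError: Base.__init__() got multiple values for keyword argument 'parallel_output'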