TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output' #179

Open
deep-practice opened this issue Jul 26, 2024 · 35 comments


@deep-practice

I get an error when loading the VisualGLM model:
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
Traceback (most recent call last):
File "/root/TransGPT/multi_modal/hf_infer.py", line 3, in <module>
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
File "/root/.conda/envs/demo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/root/.conda/envs/demo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/modeling_chatglm.py", line 1345, in __init__
self.image_encoder = BLIP2(config.eva_config, config.qformer_config)
File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/visual.py", line 59, in __init__
self.vit = EVAViT(EVAViT.get_args(**eva_args))
File "/root/.cache/huggingface/modules/transformers_modules/THUDM/visualglm-6b/f4f759acde0926fefcd35e2c626e08adb452eff8/visual.py", line 20, in __init__
super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
File "/root/.conda/envs/demo/lib/python3.10/site-packages/sat/model/official/vit_model.py", line 111, in __init__
super().__init__(args, transformer=transformer, **kwargs)
File "/root/.conda/envs/demo/lib/python3.10/site-packages/sat/model/base_model.py", line 93, in __init__
self.transformer = BaseTransformer(
TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output'

@BeiZhangChen

Hey, I have the same problem. Did you find out how to deal with it?

@1049451037
Member

Update code to the latest main branch. As you can see, the parallel_output argument has been deleted in VisualGLM:

https://github.com/THUDM/VisualGLM-6B/blob/7a277433740276d7abc2a71646050c03062ea9e4/model/visualglm.py#L30-L31
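For anyone wondering why Python raises this: a keyword argument may only be supplied once, but here parallel_output arrives both explicitly and again inside **kwargs. A minimal reduction (illustrative names, not sat's actual signatures):

    def base_transformer(num_layers, parallel_output=True, **kwargs):
        return num_layers, parallel_output

    def base_model(num_layers, **kwargs):
        # 'parallel_output' is still inside **kwargs (no parameter above consumed it),
        # and it is also passed explicitly here, so it arrives twice:
        return base_transformer(num_layers, parallel_output=True, **kwargs)

    try:
        base_model(28, parallel_output=False)
    except TypeError as e:
        print(e)  # base_transformer() got multiple values for keyword argument 'parallel_output'

Dropping the explicit parallel_output from the subclass, as the linked commit does, lets the value travel only once inside **kwargs.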

@corkiyao

Update code to the latest main branch. As you can see, the parallel_output argument has been deleted in VisualGLM:

https://github.com/THUDM/VisualGLM-6B/blob/7a277433740276d7abc2a71646050c03062ea9e4/model/visualglm.py#L30-L31

Hi, I'm using SwissArmyTransformer 0.4.12 and VisualGLM code cloned within the last couple of days, but I still hit TypeError: type object got multiple values for keyword argument 'parallel_output'.
The error output is below (see the second-to-last line):
[2024-08-23 10:19:12,971] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[2024-08-23 10:19:14,750] [INFO] [launch.py:139:main] 0 NCCL_IB_DISABLE=0
[2024-08-23 10:19:14,750] [INFO] [launch.py:139:main] 0 NCCL_DEBUG=info
[2024-08-23 10:19:14,750] [INFO] [launch.py:139:main] 0 NCCL_NET_GDR_LEVEL=2
[2024-08-23 10:19:14,750] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-08-23 10:19:14,750] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-08-23 10:19:14,750] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-08-23 10:19:14,750] [INFO] [launch.py:164:main] dist_world_size=2
[2024-08-23 10:19:14,750] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-08-23 10:19:20,026] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-08-23 10:19:20,028] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-23 10:19:20,030] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2024-08-23 10:19:20,032] [INFO] [checkpointing.py:1049:_configure_using_config_file] {'partition_activations': False, 'contiguous_memory_optimization': False, 'cpu_checkpointing': False, 'number_checkpoints': None, 'synchronize_checkpoint_boundary': False, 'profile': False}
[2024-08-23 10:19:20,032] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[2024-08-23 10:19:20,033] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
[2024-08-23 10:19:20,034] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
Traceback (most recent call last):
File "finetune_visualglm.py", line 179, in <module>
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args, overwrite_args={'model_parallel_size': 1})
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 222, in from_pretrained
model, model_args = cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=True, overwrite_args=overwrite_args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 210, in from_pretrained_base
model = get_model(args, cls, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 421, in get_model
model = model_cls(args, params_dtype=params_dtype, **kwargs)
File "finetune_visualglm.py", line 13, in __init__
super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kw_args)
File "/home/data/yaoyunze/visualglm2/VisualGLM-6B/model/visualglm.py", line 32, in __init__
super().__init__(args, transformer=transformer, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/official/chatglm_model.py", line 167, in __init__
super(ChatGLMModel, self).__init__(args, transformer=transformer, activation_func=gelu, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 93, in __init__
self.transformer = BaseTransformer(
TypeError: type object got multiple values for keyword argument 'parallel_output'
[2024-08-23 10:19:21,788] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1845345
[2024-08-23 10:19:21,825] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1845346

@1049451037
Member

Thanks for pointing it out; fixed:

https://github.com/THUDM/VisualGLM-6B/blob/f07e547e39a75bb51b63d2a8b955c3b8ae5a5e0d/finetune_visualglm.py#L13

@corkiyao

Thanks for pointing it out; fixed:

https://github.com/THUDM/VisualGLM-6B/blob/f07e547e39a75bb51b63d2a8b955c3b8ae5a5e0d/finetune_visualglm.py#L13

Thanks. But when I try to fine-tune with QLoRA I still hit the same problem:
[2024-08-23 11:50:40,331] [INFO] using world size: 1 and model-parallel size: 1
[2024-08-23 11:50:40,332] [INFO] > padded vocab (size: 100) with 28 dummy tokens (new size: 128)
[2024-08-23 11:50:40,333] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-08-23 11:50:40,334] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-23 11:50:40,335] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter cpu_offload is deprecated use offload_optimizer instead
[2024-08-23 11:50:40,336] [INFO] [checkpointing.py:1049:_configure_using_config_file] {'partition_activations': False, 'contiguous_memory_optimization': False, 'cpu_checkpointing': False, 'number_checkpoints': None, 'synchronize_checkpoint_boundary': False, 'profile': False}
[2024-08-23 11:50:40,336] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[2024-08-23 11:50:40,336] [INFO] [RANK 0] building FineTuneVisualGLMModel model ...
[2024-08-23 11:50:40,337] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[2024-08-23 11:50:45,032] [INFO] [checkpointing.py:229:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
Traceback (most recent call last):
File "finetune_visualglm.py", line 179, in <module>
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args, overwrite_args={'model_parallel_size': 1})
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 221, in from_pretrained
model, model_args = cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=True, overwrite_args=overwrite_args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 209, in from_pretrained_base
model = get_model(args, cls, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 420, in get_model
model = model_cls(args, params_dtype=params_dtype, **kwargs)
File "finetune_visualglm.py", line 13, in __init__
super().__init__(args, transformer=transformer, **kw_args)
File "/home/data/yaoyunze/visualglm2/VisualGLM-6B/model/visualglm.py", line 34, in __init__
self.add_mixin("eva", ImageMixin(args))
File "/home/data/yaoyunze/visualglm2/VisualGLM-6B/model/visualglm.py", line 18, in __init__
self.model = BLIP2(args.eva_args, args.qformer_args)
File "/home/data/yaoyunze/visualglm2/VisualGLM-6B/model/blip2.py", line 56, in __init__
self.vit = EVAViT(EVAViT.get_args(**eva_args))
File "/home/data/yaoyunze/visualglm2/VisualGLM-6B/model/blip2.py", line 21, in __init__
super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/official/vit_model.py", line 111, in __init__
super().__init__(args, transformer=transformer, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 93, in __init__
self.transformer = BaseTransformer(
TypeError: type object got multiple values for keyword argument 'parallel_output'  <-- the error
[2024-08-23 11:50:46,232] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 17713

@corkiyao

Here is mine:

class FineTuneVisualGLMModel(VisualGLMModel):
    def __init__(self, args, transformer=None, **kw_args):
        super().__init__(args, transformer=transformer, **kw_args)
        if args.use_ptuning:
            self.add_mixin("ptuning", PTuningV2Mixin(args.num_layers, args.hidden_size // args.num_attention_heads, args.num_attention_heads, args.pre_seq_len))
        if args.use_lora:
            self.add_mixin("lora", LoraMixin(args.num_layers, args.lora_rank, layer_range=args.layer_range), reinit=True)
            # self.get_mixin("eva").model.glm_proj = replace_linear_with_lora(self.get_mixin("eva").model.glm_proj, LoraLinear, args.lora_rank)
        elif args.use_qlora:
            self.add_mixin("lora", LoraMixin(args.num_layers, args.lora_rank, layer_range=args.layer_range, qlora=True), reinit=True)
        self.args = args

    @classmethod
    def add_model_specific_args(cls, parser):
        group = parser.add_argument_group('VisualGLM-finetune', 'VisualGLM finetune Configurations')
        group.add_argument('--pre_seq_len', type=int, default=8)
        group.add_argument('--lora_rank', type=int, default=10)
        group.add_argument('--use_ptuning', action="store_true")
        group.add_argument('--use_lora', action="store_true")
        group.add_argument('--use_qlora', action="store_true")
        group.add_argument('--layer_range', nargs='+', type=int, default=None)
        return super().add_model_specific_args(parser)

@1049451037
Member

I've changed it; try again.

@corkiyao

What I mean is that I made the change too.
Here is my modified code:
class FineTuneVisualGLMModel(VisualGLMModel):
    def __init__(self, args, transformer=None, **kw_args):  # no parallel_output anymore
        super().__init__(args, transformer=transformer, **kw_args)
        if args.use_ptuning:
            self.add_mixin("ptuning", PTuningV2Mixin(args.num_layers, args.hidden_size // args.num_attention_heads, args.num_attention_heads, args.pre_seq_len))
        if args.use_lora:
            self.add_mixin("lora", LoraMixin(args.num_layers, args.lora_rank, layer_range=args.layer_range), reinit=True)
            # self.get_mixin("eva").model.glm_proj = replace_linear_with_lora(self.get_mixin("eva").model.glm_proj, LoraLinear, args.lora_rank)
        elif args.use_qlora:
            self.add_mixin("lora", LoraMixin(args.num_layers, args.lora_rank, layer_range=args.layer_range, qlora=True), reinit=True)
        self.args = args

@1049451037
Member

Pull the latest code first and try again; your change is incomplete.

@corkiyao

Pull the latest code first and try again; your change is incomplete.

OK, I'll pull it.

@corkiyao

Pull the latest code first and try again; your change is incomplete.

I updated the VisualGLM-6B code and re-cloned the latest SwissArmyTransformer, but I'm still running into a problem. What does this mean?
[2024-08-23 12:56:55,745] [INFO] [RANK 0] replacing layer 0 attention with lora
Traceback (most recent call last):
File "finetune_visualglm.py", line 178, in <module>
model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 217, in from_pretrained
return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 209, in from_pretrained_base
model = get_model(args, cls, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 420, in get_model
model = model_cls(args, params_dtype=params_dtype, **kwargs)
File "finetune_visualglm.py", line 20, in __init__
self.add_mixin("lora", LoraMixin(args.num_layers, args.lora_rank, layer_range=args.layer_range, qlora=True), reinit=True)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 123, in add_mixin
new_mixin.reinit(self) # also pass current mixins
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 206, in reinit
parent_model.transformer.layers[i].attention.dense = replace_linear_with_lora(parent_model.transformer.layers[i].attention.dense, 1, self.r, self.lora_alpha, self.lora_dropout, qlora=self.qlora, in_size=parent_model.transformer.hidden_size, out_size=None)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 154, in replace_linear_with_lora
new_layer = LoraLinear(original_cls, partition, in_dim, out_dim, r, *args, **kw_args, original_obj=lin)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 108, in __init__
self.matrix_A = HackParameterList([nn.Parameter(torch.empty((r, original_obj.weight.shape[1]), dtype=dtype)) for _ in range(partition)])
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 108, in <listcomp>
self.matrix_A = HackParameterList([nn.Parameter(torch.empty((r, original_obj.weight.shape[1]), dtype=dtype)) for _ in range(partition)])
NameError: free variable 'dtype' referenced before assignment in enclosing scope  <-- the problem is here
[2024-08-23 12:56:57,576] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 66497
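(Aside: this NameError is a general Python scoping pitfall rather than anything model-specific. A list comprehension has its own scope, so a name it reads from the enclosing function must be bound on every code path before the comprehension runs. A minimal reduction with made-up names, not sat's actual code:)

    def build_params(partition, quantized):
        if quantized:
            dtype = None  # in the buggy version, the name is only bound on one branch
        # when quantized is False, the comprehension closes over a name that was never assigned:
        return [dtype for _ in range(partition)]

    build_params(2, quantized=False)
    # NameError: free variable 'dtype' referenced before assignment in enclosing scope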

I put the sat weights in the satckpt folder. Here is the finetune_visualglm.py code, but this shouldn't be the root cause of the dtype problem, should it?
if __name__ == '__main__':
    py_parser = argparse.ArgumentParser(add_help=False)
    py_parser.add_argument('--max_source_length', type=int)
    py_parser.add_argument('--max_target_length', type=int)
    py_parser.add_argument('--ignore_pad_token_for_loss', type=bool, default=True)
    # py_parser.add_argument('--old_checkpoint', action="store_true")
    py_parser.add_argument('--source_prefix', type=str, default="")
    py_parser = FineTuneVisualGLMModel.add_model_specific_args(py_parser)
    known, args_list = py_parser.parse_known_args()
    args = get_args(args_list)
    args = argparse.Namespace(**vars(args), **vars(known))
    args.device = 'cpu'

    model_type = 'satckpt'              # <-- my change is here
    model, args = FineTuneVisualGLMModel.from_pretrained(model_type, args)
    if torch.cuda.is_available():
        model = model.to('cuda')
    tokenizer = get_tokenizer(args)
    label_pad_token_id = -100 if args.ignore_pad_token_for_loss else tokenizer.pad_token_id
    def data_collator(examples):
        for example in examples:
            example['input_ids'] = torch.tensor(example['input_ids'], dtype=torch.long)
            example['labels'] = torch.tensor(example['labels'], dtype=torch.long)
        ret = {
            'input_ids': torch.stack([example['input_ids'] for example in examples]),
            'labels': torch.stack([example['labels'] for example in examples]),
            'image': torch.stack([example['image'] for example in examples]),
            'pre_image': example['pre_image']
        }
        return ret

@1049451037
Member

I've updated sat; try again?

@corkiyao

I've updated sat; try again?

Tried again, and the earlier problems are all solved. But now there's a new one: No backend type associated with device type cpu. Is my CPU RAM insufficient? I have 64 GB, which should be enough for loading. Or maybe the GPU isn't enough? The README says QLoRA fits in 10 GB of VRAM, and my single GPU has 11 GB.
amax:102429:103953 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
amax:102429:103953 [0] NCCL INFO Failed to open libibverbs.so[.1]
amax:102429:103953 [0] NCCL INFO NET/Socket : Using [0]enp129s0f0:192.168.1.25<0>
amax:102429:103953 [0] NCCL INFO Using network Socket
amax:102429:103953 [0] NCCL INFO comm 0xd74cb40 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId 4000 commId 0x729870e4af3743cd - Init START
amax:102429:103953 [0] NCCL INFO Setting affinity for GPU 0 to 0f,ff000fff
amax:102429:103953 [0] NCCL INFO Channel 00/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 01/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 02/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 03/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 04/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 05/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 06/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 07/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 08/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 09/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 10/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 11/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 12/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 13/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 14/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 15/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 16/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 17/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 18/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 19/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 20/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 21/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 22/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 23/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 24/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 25/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 26/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 27/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 28/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 29/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 30/32 : 0
amax:102429:103953 [0] NCCL INFO Channel 31/32 : 0
amax:102429:103953 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
amax:102429:103953 [0] NCCL INFO P2P Chunksize set to 131072
amax:102429:103953 [0] NCCL INFO Connected all rings
amax:102429:103953 [0] NCCL INFO Connected all trees
amax:102429:103953 [0] NCCL INFO 32 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
amax:102429:103953 [0] NCCL INFO comm 0xd74cb40 rank 0 nranks 1 cudaDev 0 nvmlDev 0 busId 4000 commId 0x729870e4af3743cd - Init COMPLETE
transformer.layers.0.attention.query_key_value.matrix_A.0
transformer.layers.0.attention.query_key_value.matrix_A.1
transformer.layers.0.attention.query_key_value.matrix_A.2
transformer.layers.0.attention.query_key_value.matrix_B.0
transformer.layers.0.attention.query_key_value.matrix_B.1
transformer.layers.0.attention.query_key_value.matrix_B.2
transformer.layers.0.attention.dense.matrix_A.0
transformer.layers.0.attention.dense.matrix_B.0
transformer.layers.14.attention.query_key_value.matrix_A.0
transformer.layers.14.attention.query_key_value.matrix_A.1
transformer.layers.14.attention.query_key_value.matrix_A.2
transformer.layers.14.attention.query_key_value.matrix_B.0
transformer.layers.14.attention.query_key_value.matrix_B.1
transformer.layers.14.attention.query_key_value.matrix_B.2
transformer.layers.14.attention.dense.matrix_A.0
transformer.layers.14.attention.dense.matrix_B.0
[2024-08-23 13:44:20,101] [INFO] [RANK 0] [<class 'sat.ops.layernorm.LayerNorm'>, <class 'torch.nn.modules.normalization.LayerNorm'>, <class 'sat.ops.layernorm.RMSNorm'>] is set to no_weight_decay
[2024-08-23 13:44:20,103] [INFO] [RANK 0] Syncing initialized parameters...
Traceback (most recent call last):
File "finetune_visualglm.py", line 194, in
training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/deepspeed_training.py", line 116, in training_main
model, optimizer = setup_model_untrainable_params_and_optimizer(args, model)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/deepspeed_training.py", line 196, in setup_model_untrainable_params_and_optimizer
dist.broadcast(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
return func(*args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1906, in broadcast
work = default_pg.broadcast([tensor], opts)
RuntimeError: No backend type associated with device type cpu  <-- the problem is here
[2024-08-23 13:44:22,594] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 102429

@1049451037
Member

My guess is that your machine has no GPU:

https://github.com/THUDM/VisualGLM-6B/blob/c468ec2e56e02564fcd46f507b32d522d72b8210/finetune_visualglm.py#L179-L180

As the code here shows, if CUDA is available the model is put on CUDA rather than the CPU. You could set a breakpoint inside that if to confirm that .cuda() actually runs.

@corkiyao

My guess is that your machine has no GPU:

https://github.com/THUDM/VisualGLM-6B/blob/c468ec2e56e02564fcd46f507b32d522d72b8210/finetune_visualglm.py#L179-L180

As the code here shows, if CUDA is available the model is put on CUDA rather than the CPU. You could set a breakpoint inside that if to confirm that .cuda() actually runs.

There are GPUs: a single machine with 8 cards, and I only want to use one. After the weights are loaded on the first line, I added print("111111111111111111111111111111") inside if torch.cuda.is_available(), and it did print. I also used the GPU to run other programs yesterday without any problem. I'm fine-tuning with the official few-shot example data, but this error still happens.

[2024-08-23 14:46:54,366] [INFO] [RANK 0] > successfully loaded satckpt/1/mp_rank_00_model_states.pt
111111111111111111111111111111
[2024-08-23 14:47:07,513] [INFO] [RANK 0] Try to load tokenizer from Huggingface transformers...
[2024-08-23 14:47:12,541] [INFO] [RANK 0] > Set tokenizer as a /home/data/yaoyunze/visualglm2/VisualGLM-6B/chatckpt tokenizer! Now you can get_tokenizer() everywhere.
amax:144840:144840 [0] NCCL INFO Bootstrap : Using enp129s0f0:192.168.1.25<0>
amax:144840:144840 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net.so) returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory
amax:144840:144840 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation
amax:144840:144840 [0] NCCL INFO cudaDriverVersion 12020
NCCL version 2.18.5+cuda11.8
amax:144840:179898 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
amax:144840:179898 [0] NCCL INFO Failed to open libibverbs.so[.1]
amax:144840:179898 [0] NCCL INFO NET/Socket : Using [0]enp129s0f0:192.168.1.25<0>
amax:144840:179898 [0] NCCL INFO Using network Socket
amax:144840:179898 [0] NCCL INFO comm 0x90c7410 rank 0 nranks 1 cudaDev 0 nvmlDev 1 busId 5000 commId 0x2b39f7ea9901b6b4 - Init START
amax:144840:179898 [0] NCCL INFO Setting affinity for GPU 1 to 0f,ff000fff
amax:144840:179898 [0] NCCL INFO Channel 00/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 01/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 02/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 03/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 04/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 05/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 06/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 07/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 08/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 09/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 10/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 11/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 12/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 13/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 14/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 15/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 16/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 17/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 18/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 19/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 20/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 21/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 22/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 23/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 24/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 25/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 26/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 27/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 28/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 29/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 30/32 : 0
amax:144840:179898 [0] NCCL INFO Channel 31/32 : 0
amax:144840:179898 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
amax:144840:179898 [0] NCCL INFO P2P Chunksize set to 131072
amax:144840:179898 [0] NCCL INFO Connected all rings
amax:144840:179898 [0] NCCL INFO Connected all trees
amax:144840:179898 [0] NCCL INFO 32 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
amax:144840:179898 [0] NCCL INFO comm 0x90c7410 rank 0 nranks 1 cudaDev 0 nvmlDev 1 busId 5000 commId 0x2b39f7ea9901b6b4 - Init COMPLETE
transformer.layers.0.attention.query_key_value.matrix_A.0
transformer.layers.0.attention.query_key_value.matrix_A.1
transformer.layers.0.attention.query_key_value.matrix_A.2
transformer.layers.0.attention.query_key_value.matrix_B.0
transformer.layers.0.attention.query_key_value.matrix_B.1
transformer.layers.0.attention.query_key_value.matrix_B.2
transformer.layers.0.attention.dense.matrix_A.0
transformer.layers.0.attention.dense.matrix_B.0
transformer.layers.14.attention.query_key_value.matrix_A.0
transformer.layers.14.attention.query_key_value.matrix_A.1
transformer.layers.14.attention.query_key_value.matrix_A.2
transformer.layers.14.attention.query_key_value.matrix_B.0
transformer.layers.14.attention.query_key_value.matrix_B.1
transformer.layers.14.attention.query_key_value.matrix_B.2
transformer.layers.14.attention.dense.matrix_A.0
transformer.layers.14.attention.dense.matrix_B.0
[2024-08-23 14:47:35,179] [INFO] [RANK 0] [<class 'sat.ops.layernorm.LayerNorm'>, <class 'torch.nn.modules.normalization.LayerNorm'>, <class 'sat.ops.layernorm.RMSNorm'>] is set to no_weight_decay
[2024-08-23 14:47:35,191] [INFO] [RANK 0] Syncing initialized parameters...
Traceback (most recent call last):
File "finetune_visualglm.py", line 196, in
training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/deepspeed_training.py", line 116, in training_main
model, optimizer = setup_model_untrainable_params_and_optimizer(args, model)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/deepspeed_training.py", line 196, in setup_model_untrainable_params_and_optimizer
dist.broadcast(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
return func(*args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1906, in broadcast
work = default_pg.broadcast([tensor], opts)

RuntimeError: No backend type associated with device type cpu  <-- the same problem

[2024-08-23 14:47:47,702] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 144840

[2024-08-23 14:47:47,975] [ERROR] [launch.py:325:sigkill_handler] ['/home/yaoyunze/anaconda3/envs/visualglm/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--layer_range', '0', '14', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '1', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '1', '--gradient-accumulation-steps', '4', '--skip-init', '--fp16', '--use_qlora'] exits with return code = 1

The code where the problem occurs:
if not check_if_zero3(args):
    print_rank0('Syncing initialized parameters...')
    for param_group in param_groups:
        for param in param_group['params']:
            if not param.model_parallel:
                # We already keep the same random seed for different ranks
                # However, it is not reliable. Non-model-parallel parameters could be different when initialization.
                dist.broadcast(param.data,  # <-- not sure why this raises an error
                    src=0, # group is default group
                )
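One way to read this error: the process group was initialized with the nccl backend only, and NCCL operates only on CUDA tensors, so a broadcast of a parameter that still lives on the CPU has no backend to dispatch to. A minimal single-process sketch of the same failure (an assumed standalone setup, not the training script itself):

    import os
    import torch
    import torch.distributed as dist

    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '29500')
    dist.init_process_group('nccl', rank=0, world_size=1)

    param = torch.zeros(4)               # still on the CPU
    # dist.broadcast(param, src=0)       # RuntimeError: No backend type associated with device type cpu
    dist.broadcast(param.cuda(), src=0)  # fine: nccl handles CUDA tensors

    dist.destroy_process_group()

That reading matches the fix that follows: make sure the model is on CUDA before the parameter sync runs.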

@1049451037
Member

Fixed; pull and try again:

https://github.com/THUDM/VisualGLM-6B/blob/e314fb9c4e778851414f39784317c72765acec47/finetune_visualglm.py#L181

@corkiyao

Fixed; pull and try again:

https://github.com/THUDM/VisualGLM-6B/blob/e314fb9c4e778851414f39784317c72765acec47/finetune_visualglm.py#L181

Great, training runs now and only uses 9 GB of VRAM. Thanks a lot!

@corkiyao

Fixed; pull and try again:

https://github.com/THUDM/VisualGLM-6B/blob/e314fb9c4e778851414f39784317c72765acec47/finetune_visualglm.py#L181

Well, after QLoRA fine-tuning, inference now fails with a dimension mismatch. I've looked through the earlier issues, but none of them mention this situation...

[2024-08-23 17:10:35,890] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-08-23 17:10:36,839] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-08-23 17:10:37,829] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2024-08-23 17:11:50,826] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2024-08-23 17:12:44,922] [INFO] [RANK 0] global rank 0 is loading checkpoint /home/data/yaoyunze/visualglm4/VisualGLM-6B-main/checkpoints/finetune-visualglm-6b-08-23-16-41/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "cli_demo.py", line 103, in
main()
File "cli_demo.py", line 30, in main
model, model_args = AutoModel.from_pretrained(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 342, in from_pretrained
return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/base_model.py", line 336, in from_pretrained_base
load_checkpoint(model, args, load_path=model_path, prefix=prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/model_io.py", line 304, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2138, in load_state_dict
load(self, state_dict)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 3 more times]
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2120, in load
module._load_from_state_dict(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 49, in load_from_state_dict
self.weight.data.copy
(state_dict[prefix+'weight'])
RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0

@1049451037
Member

Try changing it to this:

    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    ), build_only=True)
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)

@corkiyao

Try changing it to this:

    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    ), build_only=True)
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_length", type=int, default=2048, help='max length of the total sequence')
    parser.add_argument("--top_p", type=float, default=0.4, help='top p for nucleus sampling')
    parser.add_argument("--top_k", type=int, default=100, help='top k for top k sampling')
    parser.add_argument("--temperature", type=float, default=.8, help='temperature for sampling')
    parser.add_argument("--english", action='store_true', help='only output English')
    parser.add_argument("--quant", choices=[8, 4], type=int, default=4, help='quantization bits')
    parser.add_argument("--from_pretrained", type=str, default="visualglm-6b", help='pretrained ckpt')
    parser.add_argument("--prompt_zh", type=str, default="描述这张图片。", help='Chinese prompt for the first round')
    parser.add_argument("--prompt_en", type=str, default="Describe the image.", help='English prompt for the first round')
    args = parser.parse_args()

    # load model
    # model, model_args = AutoModel.from_pretrained(
    #     args.from_pretrained,
    #     args=argparse.Namespace(
    #     fp16=True,
    #     skip_init=True,
    #     use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
    #     device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    # ))

    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
    ), build_only=True)
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)

    model = model.eval()

    if args.quant:
        quantize(model, args.quant)
        if torch.cuda.is_available():
            model = model.cuda()
            args.device = 'cuda'

    model.add_mixin('auto-regressive', CachedAutoregressiveMixin())

    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

Like this?

@1049451037
Member

Yes.

@corkiyao

That doesn't quite work; same problem:
[2024-08-23 17:52:48,818] [INFO] [RANK 0] global rank 0 is loading checkpoint /home/data/yaoyunze/visualglm4/VisualGLM-6B-main/checkpoints/finetune-visualglm-6b-08-23-16-41/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "cli_demo.py", line 116, in
main()
File "cli_demo.py", line 48, in main
load_checkpoint(model, model_args, args.from_pretrained)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/model_io.py", line 304, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2138, in load_state_dict
load(self, state_dict)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 3 more times]
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2120, in load
module._load_from_state_dict(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 49, in load_from_state_dict
self.weight.data.copy
(state_dict[prefix+'weight'])
RuntimeError: The size of tensor a (12288) must match the size of tensor b (25165824) at non-singleton dimension 0

@corkiyao

Yes.

I saw someone ask about this in an earlier issue, so I'll give that library a try. My bitsandbytes is currently version 0.43.3.

@1049451037
Member

Like this:

    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)

@corkiyao

Like this:

    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)

Will you still be around before 8 pm? I'm trying this now, but loading is slow, so it will probably take a while. I'm afraid I won't be able to solve it.

@1049451037
Member

1049451037 commented Aug 23, 2024

This is tricky to use mainly because bitsandbytes does the quantization inside the .to('cuda') call. Training loads unquantized weights, so the order is: build the model -> load the weights -> .to('cuda'). But training saves the weights after quantization, so inference needs: build the model -> .to('cuda') -> load the weights.
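Sketched as toy code (stand-in functions, not sat's real API; the real calls are AutoModel.from_pretrained(..., build_only=True), load_checkpoint and .cuda(), as in the snippets in this thread):

    def build_model():
        return {'quantized': False}        # construct on CPU

    def to_cuda(model):
        model['quantized'] = True          # bitsandbytes quantizes inside .to('cuda')
        return model

    def load_weights(model, ckpt_is_quantized):
        # the checkpoint must match the model's current state, otherwise the copy
        # fails with a size mismatch like the one reported above
        assert model['quantized'] == ckpt_is_quantized, "tensor size mismatch"

    # Training: the checkpoint holds unquantized fp16 weights.
    m = build_model()
    load_weights(m, ckpt_is_quantized=False)
    m = to_cuda(m)

    # Inference from a QLoRA checkpoint: the weights were saved after quantization.
    m = build_model()
    m = to_cuda(m)
    load_weights(m, ckpt_is_quantized=True)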

@corkiyao

This is tricky to use mainly because bitsandbytes does the quantization inside the .to('cuda') call. Training loads unquantized weights, so the order is: build the model -> load the weights -> .to('cuda'). But training saves the weights after quantization, so inference needs: build the model -> .to('cuda') -> load the weights.

OK, I'll try again.

@corkiyao

This is tricky to use mainly because bitsandbytes does the quantization inside the .to('cuda') call. Training loads unquantized weights, so the order is: build the model -> load the weights -> .to('cuda'). But training saves the weights after quantization, so inference needs: build the model -> .to('cuda') -> load the weights.

Another error... Could it be a bitsandbytes version problem?
/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/init.py:412: UserWarning: Initializing zero-element tensors is a no-op
warnings.warn("Initializing zero-element tensors is a no-op")
[2024-08-23 18:06:12,797] [INFO] [RANK 0] replacing layer 0 attention with lora
[2024-08-23 18:06:13,492] [INFO] [RANK 0] replacing layer 14 attention with lora
[2024-08-23 18:06:14,199] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2024-08-23 18:07:15,413] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2024-08-23 18:07:29,400] [INFO] [RANK 0] global rank 0 is loading checkpoint /home/data/yaoyunze/visualglm4/VisualGLM-6B-main/checkpoints/finetune-visualglm-6b-08-23-16-41/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
File "cli_demo.py", line 130, in
if name == "main":
File "cli_demo.py", line 60, in main
from sat.training.model_io import load_checkpoint
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/training/model_io.py", line 304, in load_checkpoint
missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2138, in load_state_dict
load(self, state_dict)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2126, in load
load(child, child_state_dict, child_prefix)
[Previous line repeated 3 more times]
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2120, in load
module._load_from_state_dict(
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 51, in _load_from_state_dict
copy_nested_list(state_dict[prefix+'quant_state'], self.weight.quant_state)
File "/home/yaoyunze/anaconda3/envs/visualglm/lib/python3.8/site-packages/sat/model/finetune/lora2.py", line 39, in copy_nested_list
for i in range(len(dst)):
TypeError: object of type 'QuantState' has no len()

@1049451037
Member

1049451037 commented Aug 23, 2024

修好了,重新pull一下最新的sat。然后cli_demo用这个:

    # load model
    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)
    model = model.eval()

But you need to retrain, because the model-saving logic changed as well.

@corkiyao

修好了,重新pull一下最新的sat。然后cli_demo用这个:

    # load model
    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)
    model = model.eval()

But you need to retrain, because the model-saving logic changed as well.

OK, I'll retrain.

@corkiyao

corkiyao commented Aug 24, 2024

修好了,重新pull一下最新的sat。然后cli_demo用这个:

    # load model
    model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)
    model = model.eval()

But you need to retrain, because the model-saving logic changed as well.

I'm not sure whether it's a training problem, but the predictions look off. Here is my cli_demo.py:

model, model_args = AutoModel.from_pretrained(
        args.from_pretrained,
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=False,
        device='cpu',
    ), build_only=True)
    model = model.cuda()
    from sat.training.model_io import load_checkpoint
    load_checkpoint(model, model_args, args.from_pretrained)
    model = model.eval()

    model.add_mixin('auto-regressive', CachedAutoregressiveMixin())

    tokenizer = AutoTokenizer.from_pretrained("/home/data/yaoyunze/visualglm2/VisualGLM-6B/chatckpt", trust_remote_code=True) #本地路径
    if not args.english:
        print('欢迎使用 VisualGLM-6B 模型,输入图像URL或本地路径读图,继续输入内容对话,clear 重新开始,stop 终止程序')
    else:
        print('Welcome to VisualGLM-6B model. Enter an image URL or local file path to load an image. Continue inputting text to engage in a conversation. Type "clear" to start over, or "stop" to end the program.')
    with torch.no_grad():
        while True:
            history = None
            cache_image = None
            if not args.english:
                image_path = input("请输入图像路径或URL(回车进入纯文本对话): ")
            else:
                image_path = input("Please enter the image path or URL (press Enter for plain text conversation): ")

            if image_path == 'stop':
                break
            if len(image_path) > 0:
                query = args.prompt_en if args.english else args.prompt_zh
            else:
                if not args.english:
                    query = input("用户:")
                else:
                    query = input("User: ")
            while True:
                if query == "clear":
                    break
                if query == "stop":
                    sys.exit(0)
                try:
                    response, history, cache_image = chat(
                        image_path, 
                        model, 
                        tokenizer,
                        query, 
                        history=history, 
                        image=cache_image, 
                        max_length=args.max_length, 
                        top_p=args.top_p, 
                        temperature=args.temperature,
                        top_k=args.top_k,
                        english=args.english,
                        invalid_slices=[slice(63823, 130000)] if args.english else []
                        )
                except Exception as e:
                    print(e)
                    break
                sep = 'A:' if args.english else '答:'
                print("VisualGLM-6B:"+response.split(sep)[-1].strip())
                image_path = None
                if not args.english:
                    query = input("用户:")
                else:
                    query = input("User: ")

Result:

Please enter the image path or URL (press Enter for plain text conversation): fewshot-data/2p.png
VisualGLM-6B: A man and a woman walking together, leaning on each other.
User: clear
Please enter the image path or URL (press Enter for plain text conversation): fewshot-data/2p.png
VisualGLM-6B: A man and a woman walking on a rainy street. (label: the background of this image is a light drizzle.)
User: clear
Please enter the image path or URL (press Enter for plain text conversation): fewshot-data/ghost.jpg
VisualGLM-6B: The background of this image is a table with a chessboard on it. (label: the background of this image is a room) User:

The prompt was: --prompt_zh 这张图片的背景里有什么内容? ("What is in the background of this image?")

@yuchao4x

Sorry to bother you, but this problem also reproduces in my environment. I'm using the latest code for both sat and VisualGLM, and the error is below; could you take a look?
Whether I run XrayGLM or VisualGLM-6B, I get the same error. Pasting it here:
root@dell-PowerEdge-R750:/data/yuchao/VisualGLM-6B-main# python test.py
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:311: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cpu.so...
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:311: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
[2024-12-11 17:16:42,693] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-12-11 17:16:44,633] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-12-11 17:16:44,634] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-12-11 17:16:44,634] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
[rank0]: Traceback (most recent call last):
[rank0]: File "/data/yuchao/VisualGLM-6B-main/test.py", line 12, in <module>
[rank0]: model = AutoModel.from_pretrained(modelpath, trust_remote_code=True).half().cuda()
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2966, in from_pretrained
[rank0]: model = cls(config, *model_args, **model_kwargs)
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/modeling_chatglm.py", line 1345, in __init__
[rank0]: self.image_encoder = BLIP2(config.eva_config, config.qformer_config)
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/visual.py", line 59, in __init__
[rank0]: self.vit = EVAViT(EVAViT.get_args(**eva_args))
[rank0]: File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/visual.py", line 20, in __init__
[rank0]: super().__init__(args, transformer=transformer, parallel_output=parallel_output, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/SwissArmyTransformer-0.4.12-py3.10.egg/sat/model/official/vit_model.py", line 111, in __init__
[rank0]: super().__init__(args, transformer=transformer, **kwargs)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/SwissArmyTransformer-0.4.12-py3.10.egg/sat/model/base_model.py", line 93, in __init__
[rank0]: self.transformer = BaseTransformer(
[rank0]: TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output'
[rank0]:[W1211 17:16:45.054316028 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())

@yuchao4x


Solved: downgrading the sat version fixed it. pip install SwissArmyTransformer==0.3.6
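To confirm which sat build is actually active in an environment after the downgrade, a quick standard-library check (assuming the distribution is installed under the name SwissArmyTransformer):

    from importlib.metadata import version  # Python 3.8+
    print(version('SwissArmyTransformer'))  # should print 0.3.6 after the downgrade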
