I only have one GPU, so I used the command python finetune.py --config-name=finetune.yaml for the first fine-tuning step. However, I ran into the error RuntimeError: 'weight' must be 2-D. The detailed error message is below; I have not been able to resolve the issue on my own and would appreciate some help. (Also, this is my first time asking a question about research code, so if anything is missing from my report, please let me know.)
[2024-09-12 23:49:01,206] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
Loading checkpoint shards: 100%|████| 2/2 [00:04<00:00, 2.05s/it]
generation_config.json: 100%|████| 200/200 [00:00<00:00, 1.84MB/s]
max_steps is given, it will override any value given in num_train_epochs
0%| | 0/1250 [00:00<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
File "/root/tofu-main/finetune.py", line 125, in main
trainer.train()
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
return inner_training_loop(
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 3318, in training_step
loss = self.compute_loss(model, inputs)
File "/root/tofu-main/dataloader.py", line 26, in compute_loss
outputs = model(input_ids,labels=labels, attention_mask=attention_mask)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
outputs = self.model(
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 950, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
return F.embedding(
File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
My guess is that DeepSpeed is doing lazy initialization here, i.e. it only allocates an empty placeholder tensor for each weight until the engine gathers the real parameters. What I would do is remove DeepSpeed at https://github.com/locuslab/tofu/blob/main/finetune.py#L91 and just run with plain python. Let me know if this helps!
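To see why the lazy-initialized placeholder produces exactly this error, here is a minimal PyTorch sketch (not the TOFU code itself, just an illustration): F.embedding requires a 2-D weight of shape (vocab_size, hidden_dim), and an empty 1-D tensor, which is what a ZeRO-3-partitioned parameter looks like outside the DeepSpeed engine, triggers the same RuntimeError as in the traceback above.

```python
import torch
import torch.nn.functional as F

# A properly materialized embedding weight is 2-D: (vocab_size, hidden_dim).
weight = torch.randn(10, 4)
input_ids = torch.tensor([[1, 2, 3]])
print(F.embedding(input_ids, weight).shape)  # torch.Size([1, 3, 4])

# Under DeepSpeed ZeRO-3, zero.Init() leaves each parameter as an empty
# placeholder until the engine gathers it. Calling the model outside the
# engine (e.g. with plain `python` instead of the deepspeed launcher)
# then fails on the very first embedding lookup.
empty_weight = torch.empty(0)  # stand-in for a partitioned parameter
try:
    F.embedding(input_ids, empty_weight)
except RuntimeError as e:
    print(e)  # 'weight' must be 2-D
```

So either launch the script through DeepSpeed so the parameters get gathered, or (as suggested above) drop the DeepSpeed config so the weights are materialized normally.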