RuntimeError: 'weight' must be 2-D During Fine-Tuning with Single GPU #42

ouerwt opened this issue Sep 12, 2024 · 1 comment

ouerwt commented Sep 12, 2024

I only have one GPU, so I ran `python finetune.py --config-name=finetune.yaml` for the first fine-tuning step, but it fails with `RuntimeError: 'weight' must be 2-D`. The full error message is below; I have not been able to resolve it on my own and would appreciate some help. (Also, this is my first time asking a question about research code, so if any information is missing, please let me know.)

[2024-09-12 23:49:01,206] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 291, num_elems = 6.74B
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.05s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 1.84MB/s]
max_steps is given, it will override any value given in num_train_epochs
  0%|                                                                                        | 0/1250 [00:00<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "/root/tofu-main/finetune.py", line 125, in main
    trainer.train()
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/tofu-main/dataloader.py", line 26, in compute_loss
    outputs = model(input_ids,labels=labels, attention_mask=attention_mask)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 950, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/root/miniconda3/envs/tofu/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
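For context on what the error itself means: `F.embedding` requires a 2-D `(num_embeddings, embedding_dim)` weight matrix. A minimal standalone reproduction (the tensor values here are illustrative only):

```python
import torch
import torch.nn.functional as F

ids = torch.tensor([0, 1, 2])
weight = torch.empty(0)   # not 2-D, similar to a placeholder/partitioned parameter
F.embedding(ids, weight)  # raises RuntimeError: 'weight' must be 2-D
```

So in my run, the embedding weight apparently never got materialized into its full 2-D shape.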
zhilif (Collaborator) commented Dec 19, 2024

My guess is that DeepSpeed is doing lazy initialization (i.e., it only allocates an empty placeholder tensor for the weights until the engine gathers them). What I would do is remove deepspeed here https://github.com/locuslab/tofu/blob/main/finetune.py#L91 and just run with plain Python. Let me know if this helps!
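For concreteness, here is a minimal sketch of that change, assuming finetune.py passes a DeepSpeed config into `TrainingArguments` at the linked line (the variable names and config path below are illustrative, not the repo's actual code):

```python
import transformers

training_args = transformers.TrainingArguments(
    output_dir="./ft_output",             # illustrative values
    per_device_train_batch_size=1,
    max_steps=1250,
    # deepspeed="config/ds_config.json",  # drop this argument for a single-GPU run
)
```

Without the `deepspeed` argument, `Trainer` materializes the model weights eagerly on your single GPU, so `embed_tokens.weight` is a real 2-D matrix by the time `F.embedding` is called.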
