Not able to run training/fsdp-qlora-distributed-llama3.ipynb #55
Comments
Did you use the same versions? Same code? Or did you change something?
Okay, I first had to install the latest flash-attn library to get rid of this error (when I just ran the notebook as-is):

```
# Install PyTorch for FSDP and FA/SDPA
%pip install --quiet "torch==2.2.2" tensorboard

# Install Hugging Face libraries
%pip install --upgrade "transformers==4.40.0" "datasets==2.18.0" "accelerate==0.29.3" "evaluate==0.4.1" "bitsandbytes==0.43.1" "huggingface_hub==0.22.2" "trl==0.8.6" "peft==0.10.0"

# I added
%pip install flash-attn --no-build-isolation
%pip install "torch==2.3.1"
```

No change in code. Is there a specific flash-attn version I should be using?
@philschmid No worries, I was able to make it work. I changed tf32 from true to false and did a quick test with max_steps=10. This is weird; usually the combination of bf16: true and tf32: true works, but here it didn't. Wonder why?
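(Editor's note: for reference, the change amounts to flipping the tf32 flag wherever the training arguments are defined. A minimal sketch using transformers' `TrainingArguments`; only `bf16`/`tf32` matter here, the other values are placeholders and not from the thread.)

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-3-8b-fsdp-qlora",  # hypothetical output path
    bf16=True,     # keep bf16 mixed precision
    tf32=False,    # the workaround from this comment: disable TF32 matmuls
    max_steps=10,  # quick smoke test, as described above
)
```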
@philschmid I encountered the same issue. However, when I changed […]
T4 GPUs do not support BF16 or TF32, so that's expected.
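(Editor's note, not part of the original reply: a small check that makes the hardware constraint concrete. BF16/TF32 need compute capability 8.0+ (Ampere or newer), while a T4 is Turing and reports 7.5.)

```python
import torch

# BF16 and TF32 require Ampere or newer (compute capability >= 8.0).
# A T4 is Turing (7.5), so both flags have to stay off there.
major, minor = torch.cuda.get_device_capability()
supports_bf16_tf32 = major >= 8
print(f"compute capability {major}.{minor} -> bf16/tf32 supported: {supports_bf16_tf32}")
```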
@philschmid Regarding training on 4 × 15 GB GPUs, what do you think? I am using a smaller model (8B).
I have the same error on 4 H100 GPUs. Setting tf32 to false does not solve anything, and neither does tf16: true as in #55 (comment).
Hi @philschmid! Thank you for the blog. It's very helpful.
I am trying to reproduce the results as-is: I followed the blog and installed the libraries with the same versions. I am running into the following issue:

ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float32

Someone mentioned that setting FSDP_CPU_RAM_EFFICIENT_LOADING=1 here should solve it, but that is already set in the torchrun command as per the blog.
I'm pretty much clueless. Any suggestions would be really helpful.
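(Editor's note, a possible debugging direction not confirmed in the thread: the error means FSDP found parameters of mixed dtypes within one flattened group, so one common workaround is to force any stray float32 parameters, often norm or adapter weights, to the same dtype before FSDP wraps the model. A minimal sketch, assuming `model` is the already-prepared PEFT/quantized model just before the trainer is created.)

```python
import torch

# Hypothetical pre-flight fix before FSDP flattening:
# report parameters that are not bfloat16, then cast them so every
# flattened parameter group ends up with a uniform dtype.
for name, param in model.named_parameters():
    if param.dtype == torch.float32:
        print(f"casting {name} from float32 to bfloat16")
        param.data = param.data.to(torch.bfloat16)
```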