Hi, I'm interested in running the example found in biencoder/nli_msmarco/scripts/train_bloom7b1.slurm. Is it possible to execute these using accelerate and deepspeed?
I'm planning to experiment with larger models, hence the need for deepspeed zero3.
I've finished an initial version, but the results are not as promising as expected. The original combination of accelerate and gradcache reached a GPU power draw of nearly 250 W. After switching to accelerate + deepspeed zero3 and disabling gradcache, the power draw drops to around 110 W, which suggests the GPUs are idle much of the time. I/O operations appear to take up most of the step time, which might be due to CPU offload, though I haven't confirmed this.
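One way to test the CPU-offload hypothesis is to run the same job with a ZeRO-3 config that has offloading explicitly turned off and compare the power draw. The sketch below is only an illustration, not the config used in this thread: the helper name `build_zero3_config` is hypothetical, enabling bf16 is an assumption, and the `"auto"` values assume the config file is passed to accelerate's DeepSpeed integration, which fills them in from the training setup.

```python
import json


def build_zero3_config(offload_to_cpu: bool = False) -> dict:
    """Build a minimal DeepSpeed ZeRO stage 3 config dict (hypothetical helper)."""
    # "none" keeps optimizer state and parameters on the GPU;
    # "cpu" moves them to host memory, which adds host-device transfers.
    device = "cpu" if offload_to_cpu else "none"
    return {
        "bf16": {"enabled": True},  # assumption: bf16-capable GPUs
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": device},
            "offload_param": {"device": device},
            "overlap_comm": True,
        },
        # "auto" is resolved by the accelerate / transformers integration.
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


config = build_zero3_config(offload_to_cpu=False)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Saving the printed JSON to a file and pointing the accelerate DeepSpeed settings at it would give a no-offload baseline. If the power draw stays near 110 W even then, the bottleneck is more likely ZeRO-3 parameter gathering or the smaller effective batch size without gradcache than CPU offload.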