accelerate + deepspeed? #30

alex-ht · 2023-05-16T08:49:02Z

Hi, I'm interested in running the example found in biencoder/nli_msmarco/scripts/train_bloom7b1.slurm. Is it possible to run it using accelerate and deepspeed?
I'm planning to experiment with larger models, hence the need for deepspeed zero3.


Muennighoff · 2023-05-16T08:56:24Z

Hey, sounds exciting! I have only tried accelerate, not accelerate + deepspeed, but I think it should work with deepspeed, too. Let me know how it goes!
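For readers who want to try this, here is a minimal sketch of a DeepSpeed ZeRO stage 3 config that could be handed to accelerate as a DeepSpeed config file. It is not taken from this repository: the helper name `make_zero3_config`, the bf16 setting, and the use of `"auto"` for batch-related values (which accelerate's DeepSpeed integration fills in from the training script) are assumptions for illustration.

```python
import json


def make_zero3_config(cpu_offload: bool = False) -> dict:
    """Build a DeepSpeed ZeRO-3 config dict (hypothetical helper, not from the repo)."""
    # "none" keeps optimizer state and parameters on the GPU;
    # "cpu" offloads them to host memory, trading speed for GPU memory.
    device = "cpu" if cpu_offload else "none"
    return {
        # Assumption: bf16 training; adjust to match your own setup.
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": device},
            "offload_param": {"device": device},
            # Gather the sharded weights into one 16-bit copy when saving.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        # "auto" lets the accelerate integration fill these in.
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
    }


if __name__ == "__main__":
    # Print the config so it can be redirected into a JSON file.
    print(json.dumps(make_zero3_config(cpu_offload=False), indent=2))
```

Saving the printed output to a JSON file and pointing accelerate at it when running `accelerate config` is one way to use it. Starting with offload disabled gives a baseline before trading speed for memory.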

alex-ht · 2023-05-19T01:11:27Z

I've finished an initial version, but the results are less promising than expected. The original combination of accelerate and gradcache reached nearly 250 W of GPU power draw. After switching to accelerate + deepspeed zero3 and disabling gradcache, it drops to around 110 W. I/O operations appear to be taking up most of the time, which might be due to the CPU offload(?).

original:

deepspeed zero3:

here is the source code: https://github.com/alex-ht/sgpt/tree/deepspeed
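One way to check the I/O suspicion above is to time how long the loop waits for each batch separately from how long the training step takes. The sketch below is a generic helper and not part of the linked code; `time_data_vs_step`, `batches`, and `step_fn` are hypothetical names. On a GPU, `step_fn` would need to end with `torch.cuda.synchronize()` so that asynchronous kernels are included in the measurement. Note that CPU offload transfers would show up in the step time, not in the data-wait time.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def time_data_vs_step(
    batches: Iterable[Any],
    step_fn: Callable[[Any], Any],
    max_steps: int = 50,
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent in step_fn)."""
    data_seconds = 0.0
    step_seconds = 0.0
    iterator = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time spent here is data loading / I/O
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Assumption: step_fn runs forward, backward, and the optimizer step,
        # and synchronizes the device before returning.
        step_fn(batch)
        t2 = time.perf_counter()
        data_seconds += t1 - t0
        step_seconds += t2 - t1
    return data_seconds, step_seconds
```

If the data-wait time is small and the step time dominates, the slowdown is more likely to come from the step itself (for example ZeRO-3 communication or offload) than from the data loader.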
