multi-training #11

Open
Hu-Yuch opened this issue Jul 27, 2023 · 7 comments

Comments

Hu-Yuch commented Jul 27, 2023

Why is multi-GPU training with your code only as fast as single-GPU training (2x 3090)?

Lakonik (Owner) commented Jul 28, 2023

Hi, I'm not sure what your question is. Isn't it natural that multi-GPU DDP training takes about the same time per iteration as single-GPU training?

Hu-Yuch (Author) commented Jul 28, 2023

Yes. The ETA for multi-GPU training is the same as for single-GPU training: about 17 days.

Lakonik (Owner) commented Jul 28, 2023

In the config we set the total number of training iterations, so changing the number of GPUs only affects the total number of epochs, not the training time.

And by the way, the initial ETA is an unreliable overestimate.
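
To illustrate the iteration-based schedule, here is a minimal sketch (the batch size and dataset size below are made-up numbers for illustration, not values from this repo's config):

```python
# Minimal sketch: total_iters is fixed in the config, so wall-clock time scales
# with total_iters * time_per_iteration regardless of how many GPUs run DDP;
# adding GPUs only increases the samples seen (i.e. the number of epochs).
total_iters = 1_000_000      # set in the config (example value)
batch_size_per_gpu = 16      # assumed value for illustration
dataset_size = 100_000       # assumed value for illustration

for num_gpus in (1, 2, 4):
    samples_seen = total_iters * batch_size_per_gpu * num_gpus
    epochs = samples_seen / dataset_size
    print(f"{num_gpus} GPU(s): {epochs:.0f} epochs over the same {total_iters} iterations")
```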

Hu-Yuch (Author) commented Jul 28, 2023

I see. Does that mean that if I halve total_iters for multi-GPU training, I can get the same result as single-GPU training? I remember multi-GPU training only took 6 days in your paper.

Lakonik (Owner) commented Jul 28, 2023

There's no need to change the schedule; the 17-day ETA is simply wrong.
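
For what it's worth, here is a toy sketch of how a progress-bar ETA is typically computed (this is not this repo's actual logging code), which suggests why the first readings overshoot:

```python
import time

total_iters = 1_000_000  # example value

def eta_seconds(start_time: float, iters_done: int) -> float:
    """Remaining time estimated from the average iteration time so far."""
    avg_iter_time = (time.time() - start_time) / max(iters_done, 1)
    return avg_iter_time * (total_iters - iters_done)

# The first iterations are dominated by one-off costs (data-loader start-up,
# CUDA context and kernel warm-up), so averaging them inflates avg_iter_time
# and the early ETA; the estimate drops toward the true value as training
# reaches steady state.
```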

Hu-Yuch (Author) commented Jul 28, 2023

Okay, I see.

Hu-Yuch (Author) commented Aug 7, 2023

If I use 4 GPUs, can I halve total_iters (to 500k) to get a similar result to 1000k iterations with 2 GPUs?
