multi-training #11

Open
Hu-Yuch opened this issue Jul 27, 2023 · 7 comments

Comments

Hu-Yuch commented Jul 27, 2023

Why is multi-GPU training with your code only as fast as single-GPU training (2x 3090)?

Lakonik (Owner) commented Jul 28, 2023

Hi, I'm not sure what your question is. Isn't it natural that multi-GPU DDP training takes about the same time per iteration as single-GPU training?

Hu-Yuch (Author) commented Jul 28, 2023

Yes. The ETA for multi-GPU training is the same as for single-GPU training: about 17 days.

Lakonik (Owner) commented Jul 28, 2023

In the config we set the total number of training iterations, so changing the number of GPUs only affects the total number of epochs, not the training time.

And by the way, the initial ETA is an unreliable overestimate.
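
To illustrate the iteration-based schedule, here is a minimal sketch (the batch size and dataset size below are made-up numbers for illustration, not values from this repo's config):

```python
# Minimal sketch: total_iters is fixed in the config, so wall-clock time scales
# with total_iters * time_per_iteration regardless of how many GPUs run DDP;
# adding GPUs only increases the samples seen (i.e. the number of epochs).
total_iters = 1_000_000      # set in the config (example value)
batch_size_per_gpu = 16      # assumed value for illustration
dataset_size = 100_000       # assumed value for illustration

for num_gpus in (1, 2, 4):
    samples_seen = total_iters * batch_size_per_gpu * num_gpus
    epochs = samples_seen / dataset_size
    print(f"{num_gpus} GPU(s): {epochs:.0f} epochs over the same {total_iters} iterations")
```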

Hu-Yuch (Author) commented Jul 28, 2023

I see. Does that mean that if I halve total_iters for multi-GPU training, I can get the same result as single-GPU training? I remember multi-GPU training only took 6 days in your paper.

Lakonik (Owner) commented Jul 28, 2023

There's no need to change the schedule; the 17-day ETA is simply wrong.
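
For what it's worth, here is a toy sketch of how a progress-bar ETA is typically computed (this is not this repo's actual logging code), which suggests why the first readings overshoot:

```python
import time

total_iters = 1_000_000  # example value

def eta_seconds(start_time: float, iters_done: int) -> float:
    """Remaining time estimated from the average iteration time so far."""
    avg_iter_time = (time.time() - start_time) / max(iters_done, 1)
    return avg_iter_time * (total_iters - iters_done)

# The first iterations are dominated by one-off costs (data-loader start-up,
# CUDA context and kernel warm-up), so averaging them inflates avg_iter_time
# and the early ETA; the estimate drops toward the true value as training
# reaches steady state.
```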

Hu-Yuch (Author) commented Jul 28, 2023

Okay, I see.

Hu-Yuch (Author) commented Aug 7, 2023

If I use 4 GPUs, can I halve total_iters (to 500k) to get a similar result to 1000k iterations with 2 GPUs?
