
[feature-request] Please improve multi-GPU multi-threading by separate training processes #84

Open
Technologov opened this issue Feb 7, 2019 · 4 comments

Comments

@Technologov

Instead of using leela-client with lc0 GPU multiplexing, I recommend rewriting the client so that it detects the number of GPUs and runs a completely separate lc0 process on each GPU.

On a server with 3 RTX GPUs, this could generate training games for Leela faster than the current lc0 multiplexing does.

What do you think?
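As a rough sketch of the proposal (not how the client actually works): each process could be pinned to a single device via the standard `CUDA_VISIBLE_DEVICES` environment variable, with the GPU count supplied by the user. The `lc0` binary name and `selfplay` subcommand here are assumptions for illustration.

```python
import os

def per_gpu_commands(num_gpus, binary="lc0"):
    """Build one (env, argv) pair per GPU, so that each lc0
    process sees exactly one device via CUDA_VISIBLE_DEVICES."""
    jobs = []
    for gpu in range(num_gpus):
        env = dict(os.environ)
        # Restrict this process to a single GPU (standard CUDA behavior).
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        # `selfplay` stands in for whatever game-generation mode the client invokes.
        jobs.append((env, [binary, "selfplay"]))
    return jobs

# For a 3-GPU server this yields three independent process specs,
# each of which would be launched as its own OS process.
for env, argv in per_gpu_commands(3):
    print(env["CUDA_VISIBLE_DEVICES"], argv)
```

A launcher would then start each `argv` with its `env` (e.g. via `subprocess.Popen`), giving fully separate training processes per GPU.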

-Technologov, 7.2.2019.

@madformuse

I think we should run the client with BOINC :)

@mooskagh
Member

Indeed, being able to use multiple GPUs was just a side effect of the multiplexing backend; its main purpose is to combine requests from several games into one batch.

For machines with multiple GPUs, it's indeed probably better to run several copies of the client, one per GPU.

The client cannot detect the number of GPUs itself, and a "one client, multiple lc0" setup would be quite complicated, so it doesn't sound feasible.
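The batching purpose of the multiplexing backend can be sketched as a toy example. This is not lc0's actual implementation; the class and method names are invented for illustration, and `evaluate_batch` stands in for a single GPU inference call.

```python
from queue import Queue
import threading

class Multiplexer:
    """Toy multiplexer: positions submitted from several games are
    collected into one queue and evaluated together as a single batch,
    which keeps the GPU busy with larger inference calls."""

    def __init__(self, evaluate_batch, max_batch=8):
        self.evaluate_batch = evaluate_batch  # e.g. one neural-net forward pass
        self.max_batch = max_batch
        self.queue = Queue()

    def submit(self, position):
        """Called from a game thread; returns a result slot and an event
        that is set once the batched evaluation has filled the slot."""
        result, done = {}, threading.Event()
        self.queue.put((position, result, done))
        return result, done

    def run_once(self):
        """Drain up to max_batch pending requests and evaluate them
        in one call; returns the number of requests served."""
        pending = []
        while not self.queue.empty() and len(pending) < self.max_batch:
            pending.append(self.queue.get())
        if not pending:
            return 0
        values = self.evaluate_batch([pos for pos, _, _ in pending])
        for (_, result, done), value in zip(pending, values):
            result["value"] = value
            done.set()
        return len(pending)
```

With a single GPU the combined batch amortizes per-call overhead; with multiple GPUs, as noted above, one would instead want independent processes rather than one multiplexer spanning devices.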

@kyleboddy

When I try to run multiple instances of lc0-training-client on an RTX 3090 and a GTX 1070 Ti (I know, a large disparity) in separate PowerShell windows, I get CUDA out-of-memory errors and other issues. Is multi-GPU training dead?

@mooskagh
Member

Many people run multi-GPU training game generation with no issues, so it's certainly not dead.

It's unexpected that CUDA runs out of memory because of this, though. I believe what actually happens is that the GTX 1070 Ti alone is running out of memory, and that's not connected to having another training process running in parallel.

I believe it's possible to pass parameters so that lc0 requires less VRAM. I suggest joining our Discord chat at lc0.org/chat and asking in the #help channel; you should get an answer faster there than here.
