
[feature-request] Please improve multi-GPU multi-threading by separate training processes #84

Open
Technologov opened this issue Feb 7, 2019 · 4 comments

Comments

@Technologov

Instead of using leela-client with lc0 GPU multiplexing, I recommend rewriting the client so that it detects the number of GPUs and runs a completely separate lc0 process on each GPU.

On a server with 3 RTX GPUs, this could generate training games for Leela faster than the current lc0 multiplexing does.

What do you think?
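As a rough sketch of the proposal (not how the client actually works): each process could be pinned to a single device via the standard `CUDA_VISIBLE_DEVICES` environment variable, with the GPU count supplied by the user. The `lc0` binary name and `selfplay` subcommand here are assumptions for illustration.

```python
import os

def per_gpu_commands(num_gpus, binary="lc0"):
    """Build one (env, argv) pair per GPU, so that each lc0
    process sees exactly one device via CUDA_VISIBLE_DEVICES."""
    jobs = []
    for gpu in range(num_gpus):
        env = dict(os.environ)
        # Restrict this process to a single GPU (standard CUDA behavior).
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        # `selfplay` stands in for whatever game-generation mode the client invokes.
        jobs.append((env, [binary, "selfplay"]))
    return jobs

# For a 3-GPU server this yields three independent process specs,
# each of which would be launched as its own OS process.
for env, argv in per_gpu_commands(3):
    print(env["CUDA_VISIBLE_DEVICES"], argv)
```

A launcher would then start each `argv` with its `env` (e.g. via `subprocess.Popen`), giving fully separate training processes per GPU.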

-Technologov, 7.2.2019.

@madformuse

I think we should run the client with BOINC :)

@mooskagh
Member

Indeed, being able to use multiple GPUs was just a side effect of the multiplexing backend; its main purpose is to combine requests from several games into one batch.

For machines with multiple GPUs, it's indeed probably better to run several copies of the client, one per GPU.

The client cannot detect the number of GPUs itself, and a "one client, multiple lc0" setup would be quite complicated, so it doesn't sound feasible.
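The batching purpose of the multiplexing backend can be sketched as a toy example. This is not lc0's actual implementation; the class and method names are invented for illustration, and `evaluate_batch` stands in for a single GPU inference call.

```python
from queue import Queue
import threading

class Multiplexer:
    """Toy multiplexer: positions submitted from several games are
    collected into one queue and evaluated together as a single batch,
    which keeps the GPU busy with larger inference calls."""

    def __init__(self, evaluate_batch, max_batch=8):
        self.evaluate_batch = evaluate_batch  # e.g. one neural-net forward pass
        self.max_batch = max_batch
        self.queue = Queue()

    def submit(self, position):
        """Called from a game thread; returns a result slot and an event
        that is set once the batched evaluation has filled the slot."""
        result, done = {}, threading.Event()
        self.queue.put((position, result, done))
        return result, done

    def run_once(self):
        """Drain up to max_batch pending requests and evaluate them
        in one call; returns the number of requests served."""
        pending = []
        while not self.queue.empty() and len(pending) < self.max_batch:
            pending.append(self.queue.get())
        if not pending:
            return 0
        values = self.evaluate_batch([pos for pos, _, _ in pending])
        for (_, result, done), value in zip(pending, values):
            result["value"] = value
            done.set()
        return len(pending)
```

With a single GPU the combined batch amortizes per-call overhead; with multiple GPUs, as noted above, one would instead want independent processes rather than one multiplexer spanning devices.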

@kyleboddy

When I try to run multiple instances of lc0-training-client on an RTX 3090 and a GTX 1070 Ti (I know, a large disparity) in separate PowerShell windows, I get CUDA out-of-memory errors and other issues. Is multi-GPU training dead?

@mooskagh
Member

Many people run multi-GPU training game generation with no issues, so it's certainly not dead.

It's unexpected that CUDA runs out of memory because of this, though. I believe what actually happens is that the GTX 1070 Ti alone is running out of memory, and that's not connected to having another training process running in parallel.

I believe it's possible to pass parameters so that lc0 requires less VRAM. I suggest joining our Discord chat at lc0.org/chat and asking in the #help channel; you should get an answer faster there than here.
