Thank you for your work and for the GPU sharing solution you provide here; it is currently, for me, the most elegant and practical way to solve the GPU pooling challenge.
I've begun using it on a server with 4 GPUs, in a setup where, for now, only GPU 0 is shared via nvshare. For this, I compiled and installed nvshare following the recommendations in the README.
Some of our users (Team A) are automatically assigned to GPU 0, with `CUDA_VISIBLE_DEVICES=0` and `LD_PRELOAD=:/usr/local/lib/libnvshare.so`.
Everything works fine: they don't have to change anything in their code, and they accommodate the occasionally longer computation delays without difficulty.
The other (less privileged) users can manually set `CUDA_VISIBLE_DEVICES` to 1, 2, or 3, with `LD_PRELOAD` unset, but they need to coordinate with each other to avoid collisions.
Now I would like to create a Team B whose users would be assigned to GPU 1, via nvshare. These users would stick to GPU 1, and the nvshare scheduling algorithm would not interfere with the load on the shared GPU 0.
What should be done to achieve this? Ideally, the TQ (time quantum) should also be configurable per GPU.