Thank you for your work and for the GPU sharing solution you provide here; it is currently, for me, the most elegant and practical way to solve the GPU pooling challenge.
I've begun using it on a server with 4 GPUs, in a setup where, for now, only GPU 0 is shared via nvshare. For this, I compiled and installed nvshare following the recommendations in the README.
Some of our users (Team A) are automatically assigned to GPU 0, with `CUDA_VISIBLE_DEVICES=0` and `LD_PRELOAD=:/usr/local/lib/libnvshare.so`.
Everything works fine: they don't have to change anything in their code, and they accommodate the occasionally longer computation delays without difficulty.
The other (less privileged) users can manually set `CUDA_VISIBLE_DEVICES` to 1, 2, or 3, with `LD_PRELOAD` unset, but they need to coordinate with each other to avoid collisions.
Now I would like to create a Team B whose users would be assigned to GPU 1, via nvshare. These users would stick to GPU 1, and the nvshare scheduling algorithm would not interfere with the load on the shared GPU 0.
What should be done to achieve this? Ideally, the TQ (time quantum) should also be configurable per GPU.