[feature request] accelerate launcher: add numa affinities control #2241
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

keepalive

keepalive

Will be looking into adding this week finally :)

that's exciting, Zach - thank you!
@stas00 actually wait, can't we just do this?

```shell
accelerate launch --no_python --multi_gpu --num_processes 8 --monitor_interval=1 ./trampoline.sh python3 -c "print('hello')"
```

(I just pulled all this from …) Or am I missing something? …

Or, with a config file:

```shell
accelerate launch --no_python --monitor_interval=1 ./trampoline.sh python3 -c "print('hello')"
```
Note: I'll also be making a PR so you're able to do both. Edit: #2525 will make it possible to do this OOTB with no arg fixes, as the root cause was `-` vs `_`.
If it works, then fantastic! Let's document this then please, and ideally move …
Zach, I think the trouble with the workaround solution is that the user won't have … I think a much better solution would be for the framework to have an independent solution that is provided by its core.
Sure @stas00, I can agree on that and we can extend … So we have the bash script:

```shell
#!/usr/bin/bash

# Query the bus ID for device LOCAL_RANK
BUS_ID=$(nvidia-smi --query-gpu=pci.bus_id -i $LOCAL_RANK --format=csv,noheader)
BUS_ID=${BUS_ID,,}

# Find the NUMA node for device LOCAL_RANK
NODE=$(cat /sys/bus/pci/devices/${BUS_ID:4}/numa_node)

echo "Starting local rank $RANK on numa node $NODE"
numactl --cpunodebind=$NODE --membind=$NODE "$@"
```

And the command:

```shell
torchrun --nproc_per_node=8 --monitor-interval=1 --no-python ./trampoline.sh python3 -c "print('hello')"
```

So torchrun will execute … Or, assuming the environment is set up properly, we are running:

```shell
torchrun \
  --nproc_per_node=8 \
  --monitor-interval=1 \
  --no-python \
  numactl \
  --cpunodebind=$NODE \
  --membind=$NODE \
  python3 -c "print('hello')"
```

Is this a valid understanding of what we have going on?
Tbh though, the …
That looks correct. The … and you need to check if …
It is not; looks like we'll need to do it the hard way without pynvml (and just run a series of bash things), given that.
But surely what bash is doing can be done in Python, and Python has the NUMA API functionality (which is the pynvml script w/o the pynvml code in it). So I think it'd be much cleaner and user-friendlier not to do it at the CLI level.
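For what it's worth, the CPU half of what `numactl --cpunodebind` does can indeed be done in pure Python via sysfs plus `os.sched_setaffinity` (a standard Linux-only call). This is a minimal sketch, not accelerate's actual implementation, and the function names are mine; memory binding (`--membind`) would additionally need libnuma/`set_mempolicy`, which the stdlib does not expose.

```python
import os

def parse_cpulist(cpulist: str) -> set:
    """Parse a sysfs cpulist string like '0-15,32-47' into a set of CPU ids."""
    cpus = set()
    for chunk in cpulist.strip().split(","):
        if "-" in chunk:
            lo, hi = chunk.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus

def bind_to_numa_node(node: int) -> None:
    """Pin the current process (pid 0 = self) to the CPUs of one NUMA node."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        os.sched_setaffinity(0, parse_cpulist(f.read()))
```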
Re-checked, it's NVIDIA only: …
No worries; while un-fun, I'm getting it working with some subprocess ;)
@stas00 if you want to try some bleeding edge stuff, I just pushed some commits. Haven't fully tested it on a multi-gpu system yet, but at least the dry run of the commands looks like everything should have been set up properly:

```shell
pip install git+https://github.com/huggingface/accelerate@affinity
```

To use:

```shell
accelerate launch --multi_gpu --num_processes 8 --enable_numa_affinity myscript.py --arg1=1 ...
```

I'll be able to fully test in the AM :) (and enable it via config file, etc, etc)
Hmm, I wasn't paying attention to the bash callouts, but looking closely now it's still nvidia-dependent because of … So the subprocess callout is probably simpler anyway than making …
Let's start small with the nvidia version, then we can add the AMD and Gaudi2 ones as follow-ups. (Since we can only test the nvidia-smi version rn.)
Feature request
As explained in pytorch/pytorch#115305, when using 2-CPU nodes it's important to get the NUMA affinities right to avoid cross-NUMA-node talk.
As torchrun currently doesn't support it, a workaround was posted in pytorch/pytorch#115305 (comment), and it includes a torchrun flag `--no-python` which the accelerate launcher doesn't have. So any suggestions on how I could use this script with `accelerate`?

For simplicity, here is the solution for torchrun:
update: I shared @yifuwang's workaround at https://twitter.com/StasBekman/status/1734724979496018215
If a pynvml dependency is OK, someone posted a Python solution:
https://github.com/NVIDIA/DeepLearningExamples/blob/9dd9fcb98f56187e49c5ee280cf8dbd530dde57b/TensorFlow2/LanguageModeling/BERT/gpu_affinity.py
so that would probably be easier to integrate into the launcher.
Thanks.
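For orientation, the core of that pynvml-based approach boils down to asking NVML for the GPU's ideal CPU affinity bitmask and applying it with `os.sched_setaffinity`. This is an untested sketch with hypothetical function names, not the linked `gpu_affinity.py` itself; it assumes `pip install pynvml` and an NVIDIA GPU at runtime.

```python
import math
import os

def mask_words_to_cpus(words):
    """Expand an array of 64-bit affinity bitmask words into CPU indices.

    NVML returns the affinity as an array of unsigned 64-bit words, where
    bit b of word w corresponds to CPU (w * 64 + b).
    """
    cpus = []
    for w, word in enumerate(words):
        for bit in range(64):
            if (word >> bit) & 1:
                cpus.append(w * 64 + bit)
    return cpus

def set_affinity_from_nvml(local_rank: int) -> None:
    """Pin this process to the CPUs NVML recommends for the given GPU.

    Untested sketch: requires pynvml and an NVIDIA GPU, so the import
    is deferred and only the bitmask decoding above is exercised here.
    """
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(local_rank)
    n_words = math.ceil(os.cpu_count() / 64)
    words = pynvml.nvmlDeviceGetCpuAffinity(handle, n_words)
    os.sched_setaffinity(0, mask_words_to_cpus(words))
```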