Running jobs: task affinity #48

Open
SteVwonder opened this issue Jul 22, 2020 · 3 comments


@SteVwonder
Member

Flux will automatically set the CPU affinity and set CUDA_VISIBLE_DEVICES based on the cores and GPUs allocated to a job. If you are launching multiple tasks in a job, then you may be interested in the shell options “cpu-affinity” and “gpu-affinity”.

If you launch 2 tasks with flux mini run -n2 -N1 or flux mini run -n2 -N1 -o cpu-affinity=on -o gpu-affinity=on, both tasks/processes will see the same 2 cores and GPUs. If you launch 2 tasks with flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task, then each task will only see its own unique core and GPU. If you launch 2 tasks with flux mini run -n2 -o cpu-affinity=off -o gpu-affinity=off, then each task/process will see everything on the entire node.
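To compare the three policies side by side, a small helper along these lines can print the corresponding command lines. This is only a sketch: the function name affinity_cmd and the ./my_app placeholder are made up for illustration, -N1 is included to keep both tasks on one node, and nothing is actually submitted to Flux.

```shell
#!/bin/sh
# Print (do not run) a "flux mini run" command line for one affinity policy.
# affinity_cmd is a hypothetical helper; the policy values on, per-task, and
# off are the ones described in this comment.
affinity_cmd() {
    policy=$1
    ntasks=$2
    shift 2
    case "$policy" in
        on|per-task|off) ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    # The same policy is applied to both the CPU and the GPU option.
    echo "flux mini run -n$ntasks -N1 -o cpu-affinity=$policy -o gpu-affinity=$policy $*"
}

# ./my_app is a placeholder for the real executable.
for p in on per-task off; do
    affinity_cmd "$p" 2 ./my_app
done
```

Piping one of the printed lines to sh (or pasting it into a terminal inside a Flux instance) would run it.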

Note: You can easily test and inspect the effects of the various affinity policies by running lstopo --restrict binding as the job task (e.g., flux mini run -n2 -N1 -o cpu-affinity=per-task lstopo --restrict binding).
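If lstopo is not available, a rough alternative is to have each task print its own binding. The script below is a minimal sketch, not part of Flux: the file name show_binding.sh and the function names are hypothetical, reading Cpus_allowed_list from /proc/self/status assumes Linux, and FLUX_TASK_RANK is assumed to be set by the Flux job shell (a "?" is printed if it is not).

```shell
#!/bin/sh
# show_binding.sh (hypothetical name): print what this task is allowed to see.

# Print the CPU list this process may run on. Linux-only: awk reads its own
# /proc entry, and it inherits the affinity mask from this shell.
allowed_cpus() {
    if [ -r /proc/self/status ]; then
        awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status
    else
        echo "unknown"
    fi
}

# Print the GPUs exposed to this process, or "unset" if the variable is absent.
visible_gpus() {
    echo "${CUDA_VISIBLE_DEVICES-unset}"
}

# One line per task, so the output of several tasks is easy to compare.
report_binding() {
    echo "rank=${FLUX_TASK_RANK-?} cpus=$(allowed_cpus) gpus=$(visible_gpus)"
}

report_binding
```

Running it as the job task, e.g. flux mini run -n2 -N1 -o cpu-affinity=per-task -o gpu-affinity=per-task ./show_binding.sh, should then print one line per task, which makes it easy to see whether the tasks share cores and GPUs or each have their own.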

@rcarson3

@dongahn and others, would it be possible to get this added to the CORAL 1 documentation? Specifically, that flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task is the equivalent of jsrun -n 2 -r 2 -c 1 -g 1. I once again ran into the slowdown that we originally saw back in November for my use case, this time while using the newer version of Flux on Summit. The reason is that my old Flux invocation, which looked something like flux mini run -n2 -o gpu-affinity=per-task, no longer had each MPI rank see only 1 unique core and GPU. I was able to fix this by using flux mini run -n2 -o cpu-affinity=per-task -o gpu-affinity=per-task instead.
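One way to catch this kind of regression early is to have each task check its own GPU view before starting the real work. The following is a sketch under assumptions, not something Flux provides: count_devices and check_one_gpu are hypothetical helper names, and the check only looks at CUDA_VISIBLE_DEVICES, which Flux sets as described earlier in this issue.

```shell
#!/bin/sh
# Count comma-separated entries in a device list such as "0,1,2".
# An empty list counts as zero devices.
count_devices() {
    [ -n "$1" ] || { echo 0; return; }
    echo "$1" | awk -F, '{ print NF }'
}

# Succeed only if this task sees exactly one GPU; otherwise print a
# diagnostic to stderr and return non-zero.
check_one_gpu() {
    n=$(count_devices "${CUDA_VISIBLE_DEVICES-}")
    if [ "$n" -ne 1 ]; then
        echo "expected 1 visible GPU, found $n (CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES-unset})" >&2
        return 1
    fi
}
```

A wrapper script could call check_one_gpu || exit 1 before launching the application, so that a job started without -o gpu-affinity=per-task fails immediately instead of running slowly.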

@dongahn
Member

dongahn commented Oct 29, 2021

@rcarson3:

Yes! This is a common gotcha that Flux users on CORAL systems have reported. We will document -o cpu-affinity=per-task -o gpu-affinity=per-task in our CORAL1 section. Sorry that we didn't have this earlier.

@dongahn
Member

dongahn commented Oct 30, 2021

PR #111 was just posted.
