Multi-node parallelism with number_of_workers #193

Open
jennydaman opened this issue Jan 25, 2022 · 0 comments
jennydaman (Collaborator) commented Jan 25, 2022

number_of_workers can be a way to support embarrassingly parallel jobs on multi-node compute environments.

How can a process identify which replica it is? It needs to know so that the workload can be divided, e.g. in plugin code:

if WORKER_NUMBER == 1:
    process('1.png')
elif WORKER_NUMBER == 2:
    process('2.png')
# ... and so on for the remaining workers
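
A more general way to split the work, so the plugin does not need one branch per worker, might look like the sketch below. It assumes the plugin is somehow told a 1-based worker number and the total worker count; `select_inputs`, `worker_number`, and the way these values reach the plugin are all hypothetical, since pman does not provide them today.

```python
from typing import List, Sequence


def select_inputs(
    inputs: Sequence[str], worker_number: int, number_of_workers: int
) -> List[str]:
    """Return the inputs that the given 1-based worker should process.

    Hypothetical helper: worker_number and number_of_workers are assumed
    to be supplied to the plugin by the compute environment.
    """
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be at least 1")
    if not 1 <= worker_number <= number_of_workers:
        raise ValueError("worker_number must be in 1..number_of_workers")
    # Sort first so every replica sees the same ordering, then take every
    # number_of_workers-th item starting at this worker's offset.
    return sorted(inputs)[worker_number - 1 :: number_of_workers]


if __name__ == "__main__":
    files = ["1.png", "2.png", "3.png", "4.png", "5.png"]
    for n in (1, 2):
        print(n, select_inputs(files, n, 2))
```

Striding over a sorted list keeps the assignment deterministic without any coordination between replicas: each input goes to exactly one worker as long as all replicas see the same set of files.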

The equivalent concept in SLURM is a job array.

https://slurm.schedmd.com/job_array.html

e.g.

sbatch --array=1-4 job.sh

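Four instances of job.sh will be executed, possibly on different compute nodes, and each instance will have the environment variable SLURM_ARRAY_TASK_ID set to 1, 2, 3, or 4. (SLURM_ARRAY_JOB_ID is also set, but it holds the job ID of the array as a whole and is the same for every instance.)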
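
For illustration, here is roughly how a Python process could pick up its index when launched as part of a SLURM job array. SLURM sets SLURM_ARRAY_TASK_ID (this task's index) and SLURM_ARRAY_TASK_COUNT (the number of tasks in the array); the function name `array_task_from_env` is made up for this sketch, and nothing here is existing pman behaviour.

```python
import os
from typing import Mapping, Optional, Tuple


def array_task_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Tuple[int, int]]:
    """Return (task_id, task_count) inside a SLURM job array, else None.

    Hypothetical helper: reads the variables SLURM sets for array tasks.
    """
    env = os.environ if env is None else env
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        # Not running as part of a job array.
        return None
    # Fall back to a single task if the count variable is absent.
    task_count = env.get("SLURM_ARRAY_TASK_COUNT", "1")
    return int(task_id), int(task_count)


if __name__ == "__main__":
    print(array_task_from_env())
```

Taking the environment as a parameter makes the lookup easy to exercise outside of a cluster, e.g. `array_task_from_env({"SLURM_ARRAY_TASK_ID": "3", "SLURM_ARRAY_TASK_COUNT": "4"})`.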

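pman should do something similar: when number_of_workers is greater than 1, tell each replica which worker it is, e.g. through an environment variable.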
