
Using SLURM

Available partitions

The scheduler currently has two partitions, which are meant for different purposes.

| Partition   | Description                               | Time limit  | Default time limit |
|-------------+-------------------------------------------+-------------+--------------------|
| shared      | For everyday jobs, testing and prototypes | 2-08:00:00  | n/a                |
| longrunning | For long running jobs                     | 14-01:00:00 | 04:00              |

Time limits

The partitions have different time restrictions on jobs: shared has a maximum time limit of 2 days and 8 hours, while longrunning has a time limit of a little more than 14 days. You can specify how long your job should be able to run with the time option (-t or --time). On longrunning the default time limit is 4 hours.
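For example, to submit a batch script to the longrunning partition with a three-day time limit (the script name here is just a placeholder):

sbatch -p longrunning --time=3-00:00:00 my_long_job.sbatch

The same -p and --time options work with srun, and can also be set inside the script as #SBATCH directives.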

“Fat” jobs

If a job you are submitting would use a large number of CPUs to process some data, consider whether the work is data parallel. If it is, split it into several smaller array tasks instead of one big job. For example:

Generate some faux data:

for i in {1..100}; do echo $i >> data.txt; done

Then, create a batch script named split_data_100.sbatch for splitting:

#!/bin/bash
#SBATCH --job-name=split_data_100
#SBATCH -p shared
#SBATCH --mem-per-cpu=4G
#SBATCH --time=00:30:00
#SBATCH --output=split_data_100_%J.log  # %J is job ID
sleep 30
split --numeric-suffixes=1 -a 3 -n 100 --additional-suffix=.txt data.txt data.  # creates data.001.txt ... data.100.txt

And submit the batch script:

sbatch split_data_100.sbatch
Submitted batch job 854907

For real data the split might take some time, and you don't want to sit and wait for it to finish, so you can submit the next job with a dependency on the first one. Call this next one example_array_job.sbatch:

#!/bin/bash
#SBATCH --job-name=example_array_job
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:10:00  # 10 min time limit; a short time limit decreases wait time in the queue
#SBATCH --output=example_array_job_log.%A_%a.log  # %A is the job ID, %a the array index
#SBATCH --array=1-100%5  # 100 array tasks, at most 5 running concurrently (limits I/O load)
J=$(printf "%03d" $SLURM_ARRAY_TASK_ID)  # zero-pad the task ID to match the split suffixes (001-100)
# python my_process_data_script.py data.${J}.txt > processed_data.${J}.txt

sleep 20
cat data.${J}.txt >processed_data.${J}.txt

And submit it with a dependency on the first one:

split_job_id=$(squeue --noheader --format=%i --name=split_data_100)
sbatch --dependency=afterok:${split_job_id} example_array_job.sbatch
Submitted batch job 854908
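Instead of looking the job ID up with squeue, you can capture it directly when submitting, since sbatch --parsable prints only the job ID (plus the cluster name, if you are on a multi-cluster setup):

split_job_id=$(sbatch --parsable split_data_100.sbatch)
sbatch --dependency=afterok:${split_job_id} example_array_job.sbatch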

Check to see what the queue looks like:

squeue
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 854908_[1-100%5]    shared example_  rkjaran PD       0:00      1 (Dependency)
           854907    shared split_da  rkjaran  R       0:12      1 terra

Then check again once the split job has finished:

squeue
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 854908_[6-100%5]    shared example_  rkjaran PD       0:00      1 (JobArrayTaskLimit)
         854908_1    shared example_  rkjaran  R       0:17      1 terra
         854908_2    shared example_  rkjaran  R       0:17      1 terra
         854908_3    shared example_  rkjaran  R       0:17      1 terra
         854908_4    shared example_  rkjaran  R       0:17      1 terra
         854908_5    shared example_  rkjaran  R       0:17      1 terra
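For a summary of how the array tasks are doing, including tasks that have already finished and dropped out of squeue, sacct can list their state; the job ID below is just the one from this example:

sacct -j 854908 --format=JobID,State,Elapsed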

Using GPUs

In SLURM-land GPUs are a GRES (Generic RESource). A job will not be allocated a GRES unless requested with the --gres option to sbatch or srun. A GRES resource specifier has the format name[:type[:count]]. For example, to request a single GPU of any type for an interactive job:

srun -p shared --gres=gpu:1 --pty /bin/bash
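Once the interactive shell starts you can check what was allocated. This assumes the node has the NVIDIA tools installed; depending on how the cluster constrains devices, nvidia-smi may list only the GPUs assigned to your job:

nvidia-smi
echo $CUDA_VISIBLE_DEVICES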

For a batch job it is usually more convenient to request the GPU with an #SBATCH directive:

#!/bin/bash
#SBATCH -p shared
#SBATCH --gres=gpu:1
#SBATCH --mem=11G
#SBATCH --output=cool-model-log.log

python train-my-cool-model.py

This script, saved as my-gpu-slurm-job.sbatch, is then submitted as usual:

sbatch my-gpu-slurm-job.sbatch

Using sinfo we can discover what GPUs are available:

sinfo -O partition,nodelist,gres:30
PARTITION           NODELIST            GRES                          
shared*             gaia                (null)                        
shared*             terra               gpu:titanx:2,gpu:gtx1080ti:4  
shared*             torpaq              gpu:rtx2080ti:4               
longrunning         gaia                (null)                        
longrunning         torpaq              gpu:rtx2080ti:4               
login               gaia                (null)                        

And we can now request a specific type of GPU. For our interactive job we want two RTX 2080Ti GPUs:

srun -p shared --gres=gpu:rtx2080ti:2 --pty /bin/bash

Using Kaldi with SLURM

Kaldi comes with a SLURM wrapper, utils/slurm.pl, which can be used as the command runner (the $cmd variable) in Kaldi recipes. Put the following in conf/slurm.conf:

command sbatch --export=PATH  --ntasks-per-node=1
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0            # Do not add anything to qsub_opts
option num_threads=* --cpus-per-task $0 --ntasks-per-node=1
option num_threads=1 --cpus-per-task 1 --ntasks-per-node=1 
default gpu=0
option gpu=0
option gpu=* --gres=gpu:$0  # This has to be figured out
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.

and the following in cmd.sh (or something similar):

export train_cmd="utils/slurm.pl --mem 6G --time 05:00:00"
export decode_cmd="utils/slurm.pl --mem 4G"
export mkgraph_cmd="utils/slurm.pl --mem 4G"
export big_memory_cmd="utils/slurm.pl --mem 8G"
export cuda_cmd="utils/slurm.pl --gpu 1"
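With this in place, the standard Kaldi recipe scripts will submit their jobs through SLURM when you pass them these command variables; a typical invocation (the data and experiment paths here are illustrative) looks like:

. ./cmd.sh
steps/train_mono.sh --cmd "$train_cmd" --nj 10 data/train data/lang exp/mono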