
Wheeler


This page contains general info for wheeler.caltech.edu. The full SLURM documentation can be found here: https://slurm.schedmd.com/documentation.html

SSH RSA Keys

Newer versions of OpenSSH have reduced support for RSA keys, which makes Wheeler unhappy. If you are having trouble SSHing in, please make sure you have

PubkeyAcceptedKeyTypes ssh-rsa

set in your ~/.ssh/config for Wheeler, or that you are using a newer ed25519 key.
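For example, a minimal ~/.ssh/config entry might look like the following (the host alias and username are placeholders):

Host wheeler
  HostName wheeler.caltech.edu
  User <USERNAME>
  # Re-enable RSA public keys for this host, as described above
  PubkeyAcceptedKeyTypes ssh-rsa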

Compute Nodes

Compute nodes on Wheeler have 24 physical cores and 64GB of RAM.

⚠️ Some wheeler nodes do not work properly. It is not clear why.

Certain nodes have been found to never execute MPI code. These nodes are (as of January 9th, 2023)

  • wheeler061 (But worked fine for FLASH on June 23, 2023)
  • wheeler063
  • wheeler099
  • wheeler105 (But worked fine for SpEC on June 23, 2023)
  • wheeler110 (But worked fine for SpEC on June 23, 2023)
  • wheeler126 (But worked fine for SpEC on June 23, 2023)

Also, I (Kyle Nelli) just ran an 8-node job on Wheeler that was considerably slower than an identical job run on a different 8 nodes. The offending 8 nodes were wheeler[017-021,101-103]. Not sure which of these are the bad ones though. (January 19th, 2023)

You can avoid bad nodes with, e.g., #SBATCH --exclude=wheeler061,wheeler063 in your batch script. The syntax #SBATCH --exclude=wheeler[061-063] also works and excludes nodes 061, 062, and 063.
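For example, the header of a batch script that avoids some of the nodes listed above might include (the node numbers here are just illustrative):

#!/bin/bash -
#SBATCH --nodes 2
#SBATCH -t 01:00:00
# Skip nodes known to misbehave; bracket ranges and comma lists can be mixed
#SBATCH --exclude=wheeler[061-063],wheeler099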

Mitigating some bad nodes

  • I (Mark) put wheeler099 into a DOWN state on Jun 23, 2023. (Let's see if it stays that way.) (Of course this doesn't fix the underlying problem.)

Queues

Jobs are sorted into one of two queues: the default productionQ with a time limit of 24 hrs, and debug with a time limit of 2 hrs. There is no limit on the number of cores beyond the system-wide limits.

Job Manager

Wheeler uses SLURM instead of PBS, so the job & queue commands are different. For a full list of commands, see the SLURM documentation. Below is a toolbox of commands you will likely use frequently:

sbatch <job script>          # submit a batch job
scancel <job id>             # cancel a job
squeue --start               # show estimated start times
squeue -u <username>         # query current jobs
scontrol show job <job id>   # query details on a job
sinfo -a                     # query queue and node statuses

All of these SLURM commands have help flags and man pages: e.g. to print a summary of options for sinfo use sinfo -h and to examine the manual use man sinfo.

Interactive Jobs

In addition, to submit an interactive job to the debug queue, use one of the following commands (see warning ⚠️ below)

srun -p debug -n <number of cores> -t <time in minutes> --pty /bin/bash

or

srun -p debug -N <number of nodes> -c <number of cores per node> -t <time in minutes> --pty /bin/bash

⚠️ For running SpEC, which assumes one mpi rank per core, use the -n option. For running SpECTRE, which assumes one mpi rank per node, use the -c option. Using the wrong option can result in MPI hangs. ⚠️

You can change the number of processes used on each node by defining --ntasks-per-node. Note that <number of cores> does NOT need to be a multiple of 24, i.e. you can request a fraction of a node. Also note that the time limit cannot exceed 120 minutes for the debug queue.

In order to run an OpenMPI-dependent executable, you might need to

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

and you need to prepend srun to the executable you run. You shouldn't need this for Intel MPI.
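For example, inside the interactive job (the executable name and rank count here are placeholders):

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun -n 4 ./MyOpenMPIExecutable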

Finally, while qsub -I -q debug does work, when the job starts you will be placed on the head node and then must SSH into the compute node allocated. We do not describe how to do this here because the srun approach is easier and less error-prone.

Jupyter Notebooks

Jupyter notebooks can be launched on Wheeler and then viewed locally on one's desktop via the following.

On Wheeler, start an interactive job on a compute node (e.g., srun -p debug -n 1 -t 120 --pty /bin/bash), and then run:

jupyter notebook --no-browser --port=<PORT> --ip=0.0.0.0

where <PORT> is the port you would like to be on, e.g., 8888.

For terminal and webpage

Locally, run:

ssh -NfL <PORT>:<NODE>:<PORT> <USERNAME>@wheeler.caltech.edu

where <NODE> is the compute node your interactive job is running on, e.g., wheeler014. Finally, type localhost:<PORT> into your favorite browser. If it asks for a password, use the token printed when running jupyter notebook on Wheeler.
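As a concrete example, using the placeholder values mentioned above (port 8888 and node wheeler014):

# On the Wheeler compute node:
jupyter notebook --no-browser --port=8888 --ip=0.0.0.0

# On your local machine:
ssh -NfL 8888:wheeler014:8888 <USERNAME>@wheeler.caltech.edu
# then open localhost:8888 in your browser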

For VSCode

Add the following to your ~/.ssh/config

Host wheelerjupyter
  HostName wheeler.caltech.edu
  ForwardAgent yes
  User <USERNAME>
  LocalForward localhost:<PORT> <NODE>:<PORT>

Then connect to Wheeler through VSCode using the new wheelerjupyter host. Alternatively, after you have changed your config file, locally run ssh -NfL <PORT>:<NODE>:<PORT> <USERNAME>@wheeler.caltech.edu and type localhost:<PORT> into your browser.

Parallel Bilby

For running parallel bilby on Wheeler, you need to have the following modules loaded:

gcc/9.3.0 impi/2017.1 python/3.8.7 

and you should create a python venv (and activate it) before installing parallel bilby and its dependencies:

python -m venv /path/to/python_venv
source /path/to/python_venv/bin/activate

If you don't do this, MPI interoperability issues will cause your multi-node jobs to hang forever or crash.
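Putting this together, a setup session might look like the following sketch (the PyPI package name parallel_bilby is an assumption; adjust the venv path to your setup):

module load gcc/9.3.0 impi/2017.1 python/3.8.7
python -m venv /path/to/python_venv
source /path/to/python_venv/bin/activate
pip install parallel_bilby   # assumed package name; install any other dependencies you need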

gdb

Within a single-node interactive job, a multithreaded executable can be run under gdb using

srun <srun args> --pty gdb --args <executable> <executable args>

where the <srun args> are the same as what would be used to run without gdb.

Ganglia

Ganglia is an online tool to help monitor the status of compute nodes. It is currently not working on Wheeler, but may be brought back online if there is enough interest. (The old web link is no longer functional.)

Email Notification

It is often convenient to be notified via email when your job finishes or is aborted. To do this, include the following in your submission script:

#SBATCH --mail-user=<EMAIL_ADDRESS>
#SBATCH --mail-type=ALL

where <EMAIL_ADDRESS> is your email address. This will notify you when the job starts, if it is aborted, and when it finishes.

Multiple Job Steps in a Single Job

Usually every job on wheeler reserves an integer number of nodes, where each node has 24 cores. So what do you do if you want to run a job that uses fewer than 24 cores? Please do not just run that job on a 24-core node without thinking; by default if you run (for example) a 4-core job on a 24-core node, then 20 cores will be doing nothing, and nobody else can use them (including other jobs owned by you). So here's what you do:

Method 1: request part of a node

Slurm on wheeler currently assumes that you use 2.3GB of memory per core. If you need more or less memory than that, #SBATCH --mem-per-cpu=2G is the option you need. If you specify --ntasks less than 24 (the number of cores on a wheeler node), then more than one job can run on a single node, as long as those jobs don't request more than 56GB in total (the amount of memory on a wheeler node).

Here's an example script that uses 12 cores (--ntasks-per-node 12) and uses --mem-per-cpu to allow more than one slurm job to run on the same node. If you run this same script twice (with two different executables), both jobs should end up on the same node.

#!/bin/bash -
#SBATCH -o SpEC.stdout
#SBATCH -e SpEC.stdout
#SBATCH --ntasks-per-node 12
#SBATCH -A sxs
#SBATCH --no-requeue
#SBATCH -J ID_delta_1_82_
#SBATCH --nodes 1
#SBATCH -t 01:00:00
#SBATCH --mem-per-cpu=2G

mpirun -np 12 MyExecutable >Output.out 2>&1

Method 2: Run multiple jobs in the same slurm script

Here is an example submit script that launches two 12-core job steps together:

#!/bin/bash -
#SBATCH -o SpEC.stdout
#SBATCH -e SpEC.stdout
#SBATCH --ntasks-per-node 24
#SBATCH -A sxs
#SBATCH --no-requeue
#SBATCH -J ID_delta_1_82_
#SBATCH --nodes 1
#SBATCH -t 01:00:00
#SBATCH --mem-per-cpu=2G

module purge
umask 0022
set -x
# load modules, etc.

# Note that the '&' is used to background each job step that
# launches MPI jobs. In this setup the jobs are not explicitly
# pinned to specific cores. You can set specific cores to run
# on by launching with
#  'srun --mpi=pmi2 -n 12 --cpu_bind=map_cpu:0,1,2,3,etc MyExecutable'
cd ./A0.075
mpirun -np 12 MyExecutable >Output.out 2>&1 &

cd ../A0.0755
mpirun -np 12 MyExecutable >Output.out 2>&1 &

# Wait for the backgrounded jobs to complete
wait

The wait command at the end of the submit script is important; without it the job would end before the backgrounded tasks completed.

In order to achieve good performance when running multiple executables on a single node, they must all have their own dedicated cores. For MPI executables mpirun can pin executables to cores. For single- or multithreaded applications taskset can be used. For SpECTRE or other Charm++-based executables (e.g. SpECTRE CCE) you can use

{spectre_build_dir}/bin/CharacteristicExtract ++ppn 1 +setcpuaffinity \
    +pemap some_number1 \
    +commap some_number2 2>&1 &

In this example CCE is run on 2 cores: 1 communication core and 1 worker core. The +pemap option specifies which core(s) the worker(s) should use, while +commap specifies which core(s) the communication thread(s) should be placed on. See the Charm++ manual for more details.
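For the non-Charm++ cases mentioned above, a rough sketch of core pinning might look like this (executable names and core numbers are placeholders):

# Pin a single- or multi-threaded tool to cores 12-13 with taskset
taskset -c 12,13 ./MySerialTool > serial.out 2>&1 &

# Pin a 12-rank MPI job to cores 0-11 (Open MPI syntax; Intel MPI has its own I_MPI_PIN_* controls)
mpirun -np 12 --bind-to core --cpu-set 0-11 MyExecutable > Output.out 2>&1 &

wait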

ParaView for Visualizations

ParaView is installed for off-screen rendering using the OSMesa 17.3.3 backend. Load the ParaView module using module load paraview/5.10.1 (you'll also need ParaView 5.10.1 on your local machine), and if you want to run on multiple cores or nodes, load the Intel MPI 2017.1 module using module load impi/2017.1. ParaView has a Python interface, pvpython, and a parallel Python interface, pvbatch. Documentation of the Python interface is unfortunately quite sparse, but ParaView has a tracing option that records the Python commands corresponding to what you do in the GUI. To start tracing use Tools->Start Trace in the ParaView GUI. This way you can set up a script locally with a small data set and, once you are happy, run it scaled up on Wheeler. Here is an example ParaView Python script used for SpECTRE:

from paraview.simple import *

paraview.simple._DisableFirstRenderCameraReset()

grmhdxmf = XDMFReader(
    FileNames=['/home/nils/nils/spectre/FishboneDiskCube/GrMhd.xmf'])

grmhdxmf.PointArrayStatus = ['ErrorRestMassDensity', 'RestMassDensity']

# get active source.
grmhdxmf = GetActiveSource()

# Properties modified on grmhdxmf
grmhdxmf.GridStatus = ['Evolution']

# create a new 'Slice'
slice1 = Slice(Input=grmhdxmf)
slice1.SliceType = 'Plane'
slice1.SliceOffsetValues = [0.0]

# init the 'Plane' selected for 'SliceType'
slice1.SliceType.Origin = [0.0, 11.0, 0.0]

# Properties modified on slice1.SliceType
slice1.SliceType.Normal = [0.0, 0.0, 1.0]

# Properties modified on slice1.SliceType
slice1.SliceType.Normal = [0.0, 0.0, 1.0]

# get active view
renderView1 = GetActiveViewOrCreate('RenderView')
renderView1.ViewSize = [1920, 1080]

# get color transfer function/color map for 'RestMassDensity'
restMassDensityLUT = GetColorTransferFunction('ErrorRestMassDensity')

# Rescale transfer function
restMassDensityLUT.RescaleTransferFunction(1e-12, 78.0)

# restMassDensityLUT.UseLogScale = 1

print("rendering...")
# show data in view
slice1Display = Show(slice1, renderView1)
print("Setting data and camera")
# trace defaults for the display properties.
slice1Display.Representation = 'Surface'
slice1Display.ColorArrayName = ['POINTS', 'ErrorRestMassDensity']
slice1Display.LookupTable = restMassDensityLUT
slice1Display.OSPRayScaleArray = 'ErrorRestMassDensity'
slice1Display.OSPRayScaleFunction = 'PiecewiseFunction'
slice1Display.SelectOrientationVectors = 'None'
slice1Display.ScaleFactor = 4.0
slice1Display.SelectScaleArray = 'ErrorRestMassDensity'
slice1Display.GlyphType = 'Arrow'
slice1Display.GlyphTableIndexArray = 'ErrorRestMassDensity'
slice1Display.DataAxesGrid = 'GridAxesRepresentation'
slice1Display.PolarAxes = 'PolarAxesRepresentation'
slice1Display.GaussianRadius = 2.0
slice1Display.SetScaleArray = ['POINTS', 'ErrorRestMassDensity']
slice1Display.ScaleTransferFunction = 'PiecewiseFunction'
slice1Display.OpacityArray = ['POINTS', 'ErrorRestMassDensity']
slice1Display.OpacityTransferFunction = 'PiecewiseFunction'

# show color bar/color legend
slice1Display.SetScalarBarVisibility(renderView1, True)

# hide data in view
Hide(grmhdxmf, renderView1)

# current camera placement for renderView1
renderView1.CameraPosition = [0.0, 11.0, 50]
renderView1.CameraFocalPoint = [0.0, 11.0, 0.0]
renderView1.CameraParallelScale = 23.345235059857504

# update the view to ensure updated data information
renderView1.Update()

WriteImage("./error.jpg", renderView1)
print("Image file written")

Since rendering can take quite a while, print statements are used at specific points so the user receives some feedback. To launch the above ParaView Python script in parallel on 10 cores, run:

module purge # Get rid of whatever modules you have loaded
module load paraview/5.10.1 impi/2017.1 # Load ParaView and Intel MPI
mpirun -n 10 pvbatch ./VisParaView.py

where VisParaView.py is the script name on disk. Please don't run in parallel on the login node!

For those interested in how ParaView was built, see the module files for OSMesa and ParaView:

/usr/local/Modules/modulefiles/visualization/osmesa/17.3.3
/usr/local/Modules/modulefiles/visualization/paraview/5.6.0

ParaView for Remote Visualizations

ParaView server has support for rendering data on Wheeler and sending the results to a local machine for viewing. Start pvserver in serial on the login node (pvserver does not currently work on the compute nodes), e.g. by simply running pvserver. Now start a new SSH connection to Wheeler using:

ssh -L11111:wheeler:11111 wheeler

where 11111 is the port that the ParaView server and client will use.

On your local machine open ParaView and select File->Connect.... Select Add Server, give the new server a name, for ServerType choose Client/Server, for Host use localhost, and for the port use 11111 (the port needs to match the first port specified in the ssh connection). Click Configure and set the Startup Type to Manual. Now click Save. In the future, to connect to Wheeler select the server you just created and click Connect. You can now open files on Wheeler through the ParaView GUI as if you were working locally on Wheeler. Keep in mind that there will be some delay due to internet connectivity and due to the amount of data you might be visualizing.

Disk Quotas

To find out your disk usage on Panasas filesystems (/panfs/ds09/sxs) run

/usr/local/adm/bin/fs_usage /panfs/ds09/sxs | grep `whoami`

You can figure out your quota using

/usr/local/bin/pan_quota

Note that you must be on /panfs somewhere for the command to succeed.

Globus for File Transfers

Globus (https://www.globus.org/) allows users to transfer files between various HPC systems and other local endpoints. In Globus terminology an endpoint is effectively one system or location where you can transfer data to and/or from. All XSEDE machines already have endpoints set up so please visit the XSEDE documentation for how to use Globus in that environment.

To transfer data to/from Wheeler, you must set up a "Globus Connect Personal" (GCP) endpoint on Wheeler. Follow the steps below:

  1. Sign in to the Globus web app, which you can (for example) do using your XSEDE credentials as detailed on the XSEDE portal https://portal.xsede.org/data-management. Then generate a setup key for the new GCP endpoint via the web interface. As of this writing (May 7, 2019) it is the first three steps in the Installation instructions https://docs.globus.org/how-to/globus-connect-personal-linux/.
  2. On wheeler, the GCP tools are provided in a module. The endpoint can be set up and started from the command line. To set up your endpoint, run
module load globus-personal/2.3.6
globusconnectpersonal -setup <YOUR_ENDPOINT_KEY>

where you must replace <YOUR_ENDPOINT_KEY> with the key generated in item 1 above. To start the Globus server run

globusconnectpersonal -start -restrict-paths "/panfs/ds09/sxs/<USERNAME>/,/home/<USERNAME>/" &

where you must replace <USERNAME> with the output of whoami.

At this point you should see Wheeler under the "Administered by You" tab in the Endpoints page of the Globus web app. If you click on Wheeler you should be able to browse your Wheeler files in the web view. Note that unless you disown the globusconnectpersonal process or run it in GNU screen, you will need to remain logged into Wheeler during the transfer. You will get transfer speeds of 5-10 MB/s, though at the start of the transfer it will be closer to 1 MB/s.

After you are done transferring data you can stop the server with:

globusconnectpersonal -stop

Admins

To contact fellow admins email [email protected].

Add Temporary Queues/Partitions

Slurm calls queues "partitions". Partitions are assigned nodes, allowed groups/users, time limits, etc. To set up a temporary partition for only some users it is easiest to leave the existing partitions as they are and use a reservation to block off the desired nodes. What the overall procedure looks like is:

  1. Set up a new partition with the desired time limits, etc. (all users is fine here). Choose whichever nodes you want to have in the partition. You can check the queue for nodes that will be available soon and grab those.
  2. Set up a reservation on the same nodes, specifying which users you want to be able to run on the reservation. Make sure that you set the flag IGNORE_JOBS, which tells Slurm not to kill any jobs currently using those nodes, while still preventing anyone outside the reservation from allocating them.
  3. Have users submit jobs to the new partition specifying both the partition and reservation. The flags for this are -p PARTITION_NAME --reservation RESERVATION_NAME. The reservation name will be the name of the first user of the reservation followed by _ followed by a number. If no users are specified when the reservation is created it will be named root_NUMBER.

Note: you can update the partition and reservation later in any way you want, you do not need to recreate it.

Now for the more detailed instructions; a worked example of the full sequence follows the list.

  1. To create a new partition run sudo scontrol create partition PartitionName=PARTITION_NAME Default=no Nodes=LIST_OF_NODES MaxTime=MAX_TIME. The LIST_OF_NODES can be a comma-separated list including a range, e.g. wheeler001,wheeler012,wheeler[013-025]. The MAX_TIME can be in the format days-hours:minutes:seconds or UNLIMITED. For more details see the scontrol documentation section on SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, PARTITIONS.
  2. Next we want to reserve the nodes so that only the users we want can run on them. This is also done using the scontrol command. To create the reservation use sudo scontrol create reservation Reservation=RESERVATION_NAME Users=LIST_OF_USERS StartTime=HH:MM:SS Duration=HH:MM:SS Flags=SPEC_NODES,OVERLAP,IGNORE_JOBS Nodes=NODE_LIST. Instead of supplying a node list, the Slurm documentation says you can also add the flag PART_NODES and specify PartitionName=PARTITION_NAME so that the reservation's nodes track the nodes of that partition.
  3. Finally, any user who wants to run on the new partition and reservation will need to set #SBATCH -p PARTITION_NAME and #SBATCH --reservation RESERVATION_NAME in the Slurm submit script.
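As a hedged, concrete sketch of the full sequence (the partition name, reservation name, user names, nodes, and times below are all placeholders):

# 1. Create the temporary partition on four hypothetical nodes
sudo scontrol create partition PartitionName=tempQ Default=no Nodes=wheeler[030-033] MaxTime=2-00:00:00

# 2. Reserve the same nodes for specific users, without killing running jobs
sudo scontrol create reservation Reservation=tempres Users=alice,bob StartTime=now Duration=14-00:00:00 Flags=SPEC_NODES,OVERLAP,IGNORE_JOBS Nodes=wheeler[030-033]

# 3. Each user adds these lines to their submit script
#SBATCH -p tempQ
#SBATCH --reservation tempres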

Unusable Nodes in DRAIN State

When a node is put up for maintenance it gets put into a DRAIN state so that no new jobs are run on the node. Sometimes the state isn't cleared properly. To check the state of a node run scontrol show node NODE_NAME where NODE_NAME would be, for example, wheeler008. To check the reason why a node is in a DRAIN state run sinfo -R. To bring a node out of a DRAIN state run scontrol update NodeName=NODE_NAME State=UNDRAIN where the NODE_NAME would be, for example, wheeler008.
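For example, for a hypothetical node wheeler008, the commands from this paragraph are:

scontrol show node wheeler008                      # check the node's state
sinfo -R                                           # list why nodes are drained/down
scontrol update NodeName=wheeler008 State=UNDRAIN  # clear the DRAIN state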

Job stuck in CG State/Node stuck in comp state

Sometimes jobs get stuck as they are completing and will hang in the CG state. The only solution I've found is SSHing to the compute node(s) and running systemctl restart slurmd. Note that this will take a few minutes to complete and your terminal will be "stuck" waiting.

Reviving DOWN nodes

Compute nodes can go down for a variety of reasons. If there is a hardware issue, then someone on the Caltech HPC team must revive the node. However, often a node goes down because of some issue with the slurm daemon, slurmd, on the node. There is slurm documentation on this at https://slurm.schedmd.com/troubleshoot.html#nodes, but there are a few subtle differences on Wheeler. To restart the slurm daemon on the node you must run systemctl restart slurmd instead of the /etc/init.d/... command in the slurm manual. It might take a while to restart the daemon (a minute or two), or you may need to kill the process manually and then run systemctl start slurmd. You can then log out of the compute node and set the node back to IDLE by running sudo /panfs/ds09/support/slurm/install/current/bin/scontrol update NodeName=wheeler019 State=IDLE Reason="Start node". The node will be in the IDLE* state, where the * means the node is "unreachable". This is because it might take a few (10-20) minutes for slurm to realize the node has returned. The node may even go back to DOWN* for a bit before returning to service.
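A hedged summary of the recovery sequence described above, for a hypothetical node wheeler019:

ssh wheeler019
systemctl restart slurmd   # may take a minute or two; if it hangs, kill slurmd and run 'systemctl start slurmd'
exit
sudo /panfs/ds09/support/slurm/install/current/bin/scontrol update NodeName=wheeler019 State=IDLE Reason="Start node"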

Config and log locations

The slurm conf file is located at /etc/slurm/slurm.conf. The slurmctld log is located at /var/log/slurm/slurmctld.log. The daemon (slurmd) log file is located at /var/log/slurm/slurm.log. However, the log files seem to not always be written (Nils D. doesn't understand this).

Modules not found on compute node

The compute nodes (and login node) all should have /usr/local symlinked to /home/_SYS_/usr_local. To do this:

cd /usr; mv local local_ORIG; ln -s /home/_SYS_/usr_local local

Not having this correct will result in being unable to load modules.

panfs not mounted

The /etc/fstab should be something like:

#
# /etc/fstab
# Created by anaconda on Fri May 20 20:01:43 2022
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg00-lv_root /                       xfs     defaults        0 0
UUID=70322a03-e113-42c3-82ff-4dad195856b3 /boot                   ext4    defaults        1 2
/dev/mapper/vg00-lv_scratch /scratch                xfs     defaults        0 0
/dev/mapper/vg00-lv_tmp /tmp                    xfs     defaults        0 0
/dev/mapper/vg00-lv_var /var                    xfs     defaults        0 0
/dev/mapper/vg00-lv_varlog /var/log                xfs     defaults        0 0
/dev/mapper/vg00-lv_varspool /var/spool              xfs     defaults        0 0
/dev/mapper/vg00-lv_vartmp /var/tmp                xfs     defaults        0 0
UUID=21ae7bb6-56c8-45f1-b795-86aa69428fdd swap                    swap    defaults        0 0
#
tmpfs                                           /dev/shm                tmpfs   defaults                0 0
#
172.16.20.1:/home                               /home                   nfs     defaults                0 0
#
panfs://panasas-wheeler/Support         /panfs/ds09/support     panfs   rw,auto,_netdev,callback-network-allow=192.168.132.0/24,rmlist=(192.168.202.87;192.168.202.71;192.168.202.65)   0 0
panfs://panasas-wheeler/SXS             /panfs/ds09/sxs         panfs   rw,auto,_netdev,callback-network-allow=192.168.132.0/24,rmlist=(192.168.202.87;192.168.202.71;192.168.202.65)   0 0
panfs://panasas-wheeler/Hopkins         /panfs/ds09/hopkins     panfs   rw,auto,_netdev,callback-network-allow=192.168.132.0/24,rmlist=(192.168.202.87;192.168.202.71;192.168.202.65)   0 0
panfs://panasas-wheeler/Fuller          /panfs/ds09/fuller      panfs   rw,auto,_netdev,callback-network-allow=192.168.132.0/24,rmlist=(192.168.202.87;192.168.202.71;192.168.202.65)   0 0

You need the right paths and IP addresses for panfs. Make sure the directory /panfs/ds09 exists, and that the mount points under it are directories, not symlinks.

for x in support sxs hopkins fuller;
do
    mkdir /panfs/ds09/$x
    mount /panfs/ds09/$x
done

To be sure things get mounted automatically on reboot, reboot the node and check.

Building Singularity

Singularity needs to be built as root in order for it to work properly, but it also requires the Go compiler. To install Go, follow the installation instructions in the Singularity docs. On Wheeler, Go was installed inside /usr/local/go using

export VERSION=1.13.7 OS=linux ARCH=amd64
wget https://dl.google.com/go/go$VERSION.$OS-$ARCH.tar.gz
tar xzf go$VERSION.$OS-$ARCH.tar.gz

Note that Go does not need to be installed as root. The Go directory was then renamed to $VERSION so that multiple compiler versions can be supported via modules. The module was set up in /usr/local/Modules/modulefiles/compilers/go/$VERSION. Only the bin directory needs to be appended to the path in the module file.

Singularity was built into /usr/local/singularity using

sudo su
export VERSION=3.5.2
wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-${VERSION}.tar.gz
tar -xzf singularity-${VERSION}.tar.gz
cd ./singularity
./mconfig --prefix=/usr/local/singularity/${VERSION}
cd ./builddir
make
make install

Note that the sudo su at the beginning is necessary to set up Singularity correctly because it needs to be built as root. The Singularity module file is in /usr/local/Modules/modulefiles/tools/singularity/$VERSION. Only the bin directory needs to be appended to the path in the module file.

Mathematica

Mathematica can be run on wheeler by running module load mathematica/11.0 and then executing the command math. But you may need to put !mathematica.caltech.edu in your .Mathematica/Licensing/mathpass file to avoid activation key issues.
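For example (this assumes the mathpass file lives under your home directory, which is the usual location):

module load mathematica/11.0
# Only needed if you run into activation key issues:
echo '!mathematica.caltech.edu' >> ~/.Mathematica/Licensing/mathpass
math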

Address Sanitizer

The address sanitizer tries to allocate a huge amount of memory at the beginning of the process. It won't actually use all of it, only some of it. My guess is this is to ensure that it won't run out of memory during memory diagnostics. If vm.overcommit_memory is set to 2 then ASAN won't work. You can check this by running:

cat /proc/sys/vm/overcommit_memory

If this is set to 2, you won't be able to run ASAN. If you have root privileges (or know someone who does and have a good reason to get this temporarily changed), then the person with root privileges can SSH into the nodes you want to use and, as root, run:

echo 0 > /proc/sys/vm/overcommit_memory

After the user is done with ASAN someone with root privileges should SSH back into the nodes and do

echo 2 > /proc/sys/vm/overcommit_memory

Jobs can request specific nodes using e.g. --nodelist=wheeler001,wheeler002.

Some info at: https://github.com/google/sanitizers/wiki/AddressSanitizer (search for overcommit)

Measuring MPI Performance

Depending on the exact MPI installation, different hardware backends can be used, with extremely different performance characteristics. When testing an MPI installation, be it an existing module on a system or an installation you are doing yourself, it is important to understand the performance characteristics of the MPI library. Ohio State University (the MVAPICH developers) has created a set of benchmarks to test the performance of an MPI library; the OSU microbenchmarks are linked in the data header below. Below are plots and the raw data of latency and bandwidth measurements on Wheeler. The OpenMPI installation uses UCX, but UCX has a bug that prevents Charm++ from running properly. However, UCX is generally one of, if not the, fastest layers to use.

[Figure: Wheeler MPI latency vs. message size (OpenMPI and IntelMPI)]

[Figure: Wheeler MPI bandwidth vs. message size (OpenMPI and IntelMPI)]

# 0: Packet size
# 1: OpenMPI bandwidth (MB/s)
# 2: IntelMPI Bandwidth (MB/s)
# 3: OpenMPI latency (us)
# 4: IntelMPI latency (us)
#
# Theoretical max on Wheeler with
# Mellanox SX6025 FDR IB Switch (oPSE) is 7168 MB/s
#
# Switch details at:
# https://network.nvidia.com/related-docs/prod_ib_switch_systems/PB_SX6025.pdf
#
# Data obtained using the OSU microbenchmarks:
# ulhpc-tutorials.readthedocs.io/en/latest/parallel/mpi/OSU_MicroBenchmarks/
#
# OpenMPI launch command:
#  mpirun -mca btl ^openib -mca pml ucx -x UCX_NET_DEVICES=mlx4_0:1 ...
#
# IntelMPI launch command:
#  mpiexec ...
1                       2.41     2.08    1.87    2.26
2                       5.09     4.09    1.80    2.25
4                      10.30     8.20    1.77    2.25
8                      20.51    16.99    1.77    2.25
16                     41.18    32.33    1.77    2.89
32                     81.40    63.46    1.78    2.89
64                    159.45   129.67    1.80    2.89
128                   315.03   255.52    1.87    2.93
256                   601.62   494.53    1.97    3.01
512                  1135.19   984.94    2.08    3.13
1024                 1944.95  1808.88    2.34    3.44
2048                 3102.21  2976.16    2.87    3.94
4096                 5274.20  4399.81    3.22    4.40
8192                 6083.76  5462.46    3.90    5.33
16384                6204.29  5607.61    5.31    6.90
32768                6309.14  5855.47    7.94    9.78
65536                6361.18  5932.48   13.21   15.32
131072               6368.93  5959.46   23.47   26.48
262144               5994.83  5209.34   43.75   46.89
524288               5927.50  5531.01   84.67   88.26
1048576              5891.17  5670.04  166.84  170.65
2097152              5878.14  5733.51  331.00  335.70
4194304              5853.38  5554.40  659.37  665.51

Changing user /panfs quotas

In order to be able to change user quotas, you must have permission to ssh admin@panasas-wheeler. You will not actually ssh there explicitly; you will run commands on Wheeler and the ssh will happen in the background (kind of like git pull and git push from/to a remote). To change quotas:

  • Make sure /usr/local/adm/bin is in your $PATH on wheeler.
  • On wheeler, pull all the current quotas to a (previously nonexistent) file. Here we will name the file quotas_OLD:
wheeler> get_panasas_quotas quotas_OLD

The above command will print something like Successfully copied the limits file contents to /tmp/quotas. You can ignore that message: /tmp/quotas is a file on panasas-wheeler (not on wheeler) that is temporarily generated as part of the get_panasas_quotas command.

  • Copy the quotas file so you have a backup in case you do something wrong:
wheeler> cp quotas_OLD quotas_NEW
  • Edit quotas_NEW to change whatever quotas you want, using vi, emacs, etc. Each line in quotas_NEW looks like
user uid:17625 /SXS  1.8T 2T 0 0 <USERNAME>@hpc.caltech.edu =

where columns 4 and 5 are the soft and hard quotas, and the email field is always <USERNAME>@hpc.caltech.edu and not the user's actual email. The soft quota should be 90% of the hard quota.

  • Push quotas_NEW to the server:
wheeler> set_panasas_quotas quotas_NEW