A comprehensive guide to using the DGX A100 systems for authorized users.
The Data Science Institute (DSI) has four DGX A100 systems, now integrated into ACCRE's cluster. Access is provided to participants of DSI projects or those awarded a DSI Compute Grant for DGX.
The DSI maintains four DGX A100 systems, available in two configurations:
- 8 x 40GB A100 GPUs (2 machines, up to 320GB total GPU RAM each).
- 8 x 80GB A100 GPUs (2 machines, up to 640GB total GPU RAM each).
These machines are interconnected via InfiniBand for multi-GPU and multi-node High-Performance Computing (HPC). Access to the GPUs is allocated on a first-come, first-served basis.
- The DGX systems are shared among DSI Graduate Students, Faculty, Staff, and affiliated lab groups.
- Job and resource management is handled via SLURM, a dynamic job scheduler.
- High-demand periods or large resource requests may increase wait times.
- All work is saved to your ACCRE home directory.
- Upon logging in, you will start in your ACCRE home directory. If you require additional storage, please reach out to us or ACCRE for a custom solution.
- Custom Singularity containers are supported (see the example after this list).
- Docker is not available due to security concerns.
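Since Docker itself is not available, a common pattern is to build a Singularity image (`.sif`) from an existing Docker image and then reference that file in your `salloc` session or batch script. A minimal sketch is shown below; the NGC PyTorch image name is only an example (substitute whatever image your project needs), and some registries may require credentials:

```bash
# Pull a Docker image and convert it to a Singularity .sif file
# (image name is an example; adjust to your project's needs)
singularity pull pytorch_25.01-py3.sif docker://nvcr.io/nvidia/pytorch:25.01-py3

# Quick sanity check on a GPU node: run Python inside the container with GPU support enabled
singularity exec --nv pytorch_25.01-py3.sif python -c "import torch; print(torch.cuda.is_available())"
```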
For details on access methods, see Accessing the DGXs.
To use the DGX systems, you must request an account by completing the DSI Compute Grant for DGX. If you have received an email stating you've been provisioned access, you do not need to complete this form.
There are four primary methods to access the DGX systems:
- Jupyter Notebooks
- ACCRE GPU Desktop
- `salloc`
- SLURM batch jobs
The `slurm_resources` command will show you what resources you can use. Under Account, you should see `dsi_dgx` as an option, unless you have a different research group that has been provisioned with DGX access. If you are a DSI student, you should also see `p_dsi`. Please reach out to Umang Chaudhry if you do not see either of these accounts.
- Scroll to the section regarding Accounts and QOS for accessing the interactive GPU partition.
- You should see `dsi_dgx_iacc` under Accounts and `dgx_iacc` under QOS. Make a note of these two values, as you will need them to request resources.
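If you prefer to confirm this from the command line, something like the following should work from an ACCRE login node. The `sacctmgr` query is standard SLURM and is shown only as an alternative; exact output formatting may differ on ACCRE:

```bash
# On an ACCRE login node, list the partitions, accounts, and QOS values available to you
slurm_resources

# Standard SLURM alternative: show your account/QOS associations directly
sacctmgr show associations user=$USER format=Account%20,QOS%40
```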
Jupyter Notebooks offer a straightforward way to access GPUs, though this method is limited to notebook-based workflows. For custom applications or containers, consider using the `salloc` method.
- Visit the ACCRE Visualization Portal: http://viz.accre.vu. Log in with your VUnetID and password.
- Select Interactive Apps.
- Choose ACCRE JupyterLab.
- Provide the duration of your session in hours.
- Provide your ACCRE SLURM account (`dsi_dgx_iacc`).
- Select `interactive_gpu (GPU accelerated nodes, ready on-demand)` as the Partition.
- Provide your QOS (Quality of Service) designation (`dgx_iacc`).
- Optionally, provide the memory and number of CPU cores you require. If nothing is provided, these default to the specifications of the GPU you request.
- Specify the required GPU type: `Nvidia A100-SXM4 (DGX 80 GB)` or `Nvidia A100-SXM4 (DGX 40 GB)`.
- Provide the number of GPUs you require.
- If using a custom virtual environment or container, provide the necessary information under Advanced Options.
- Launch the session. Your session will queue and begin based on resource availability.

ACCRE GPU Desktop offers a virtual desktop environment for interactive GPU workflows.
- Visit the ACCRE Visualization Portal: http://viz.accre.vu. Log in with your VUnetID and password.
- Select Interactive Apps.
- Choose ACCRE GPU Desktop.
- Provide the duration of your session in hours.
- Provide your ACCRE SLURM account (`dsi_dgx_iacc`).
- Select `interactive_gpu (GPU accelerated nodes, ready on-demand)` as the Partition.
- Provide your QOS (Quality of Service) designation (`dgx_iacc`).
- Optionally, provide the memory and number of CPU cores you require. If nothing is provided, these default to the specifications of the GPU you request.
- Specify the required GPU type: `Nvidia A100-SXM4 (DGX 80 GB)` or `Nvidia A100-SXM4 (DGX 40 GB)`.
- Provide the number of GPUs you require.
- Optionally, provide a custom screen resolution.
- Launch the session. Your session will queue and start based on availability.

The `salloc` method provides direct shell access to the DGX systems and is ideal for running custom applications or workflows.
- Open a terminal and run: `ssh <VUnetID>@login.accre.vu`
- Enter your VUnetID password.
- Navigate your ACCRE home directory using `ls`.
- Request a direct shell into the DGX system with the following command:
  `salloc --time=1:00:00 --partition=interactive_gpu --account=dsi_dgx_iacc --qos=dgx_iacc --gres=gpu:nvidia_a100-sxm4-40gb:1`
  - For 80GB GPUs, use: `--gres=gpu:nvidia_a100-sxm4-80gb:1`
  - Adjust the time and GPU count as needed.
- Use `nvidia-smi` to verify your resources.
- Launch your workflows using Singularity containers (a sketch follows this list).
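As a rough sketch, an interactive session on an 80GB node might look like the following. The container path reuses the `pytorch_25.01-py3.sif` image referenced later in this guide, and `my_script.py` is a hypothetical placeholder for your own code:

```bash
# Request two 80GB A100s for four hours (adjust time and GPU count as needed)
salloc --time=4:00:00 --partition=interactive_gpu --account=dsi_dgx_iacc \
       --qos=dgx_iacc --gres=gpu:nvidia_a100-sxm4-80gb:2

# Once the shell opens on a DGX node, confirm the GPUs are visible
nvidia-smi

# Run your workload inside a Singularity container with GPU support (--nv)
singularity exec --nv /data/p_dsi/singularity-containers/pytorch_25.01-py3.sif \
    python /home/<VUnetID>/my_script.py
```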
Running Jupyter notebooks from within a container requires a few extra steps due to the need for port forwarding:
- Open a terminal and run: `ssh <VUnetID>@login.accre.vu`
- Enter your VUnetID password.
- Navigate your ACCRE home directory using `ls`.
- Request a direct shell into the DGX system with the following command:
  `salloc --time=1:00:00 --partition=interactive_gpu --account=dsi_dgx_iacc --qos=dgx_iacc --gres=gpu:nvidia_a100-sxm4-40gb:1`
- Make note of the machine you landed on (dgx01, dgx02, dgx03, or dgx04).
- Navigate to the location of your Singularity container.
- Run the following command:
  `singularity exec --nv --bind /home/<VUnetID>:/home/<VUnetID> pytorch_25.01-py3.sif jupyter-lab --notebook-dir=/home/<VUnetID> --ip=0.0.0.0 --no-browser`
  This starts a JupyterLab session with the workspace bound to your home directory. You can modify this to work off of any directory of your choice; ensure you have read, write, and execute access to that directory.
- Open a NEW terminal window. Keep your previous terminal open and running.
- Run the following: `ssh <VUnetID>@login.accre.vu -L 8888:dgx03:8888` (replace dgx03 with your machine from step 5).
- Copy the link provided by the Jupyter session running in the first terminal window.
- Open a browser and paste the link, CHANGING the "hostname" in the link to "localhost". See the example below:
  Original link: `http://hostname:8888/lab?token=8f89a890e5b48ad3a4e08058f7843f0d76e777cbd158071e`
  New link: `http://localhost:8888/lab?token=8f89a890e5b48ad3a4e08058f7843f0d76e777cbd158071e`
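If port 8888 is already in use on the DGX node or on your laptop, one workaround (a sketch; 8890 is an arbitrary example port) is to pin JupyterLab to a different port and forward that same port:

```bash
# On the DGX node, start JupyterLab on an explicit port (8890 is arbitrary)
singularity exec --nv --bind /home/<VUnetID>:/home/<VUnetID> pytorch_25.01-py3.sif \
    jupyter-lab --notebook-dir=/home/<VUnetID> --ip=0.0.0.0 --port=8890 --no-browser

# In the second terminal, forward the matching port through the login node
# (replace dgx03 with the machine you landed on)
ssh <VUnetID>@login.accre.vu -L 8890:dgx03:8890
```

The browser link then uses `localhost:8890` instead of `localhost:8888`.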

SLURM batch jobs are recommended for high-compute workloads such as model training, and let you manage long-running jobs efficiently.
- Open a terminal and run: `ssh <VUnetID>@login.accre.vu`
- Enter your VUnetID password.
- Prepare a Python script and upload it to your ACCRE home directory.
- Create a SLURM script (e.g., `filename.slurm`) following the example below:

```bash
#!/bin/bash
#SBATCH --job-name=stress_test        # Job name
#SBATCH --output=stress_test.log      # Standard output log file
#SBATCH --error=stress_test.log       # Standard error log file
#SBATCH --partition=interactive       # Partition
#SBATCH --account=dsi_dgx_iacc
#SBATCH --qos=dgx_iacc
#SBATCH --gres=gpu:1                  # Request 1 GPU
#SBATCH --time=3-00:00:00             # Time limit (d-hh:mm:ss)
#SBATCH --nodes=1                     # Number of nodes
#SBATCH --ntasks=1                    # Number of tasks
#SBATCH --cpus-per-task=6
#SBATCH --mem=80GB                    # Memory per node

# Load Singularity module if required by your cluster
#module load singularity

# Define the Singularity container path
CONTAINER_PATH="/data/p_dsi/singularity-containers/pytorch_25.01-py3.sif"
#singularity shell $CONTAINER_PATH

# Execute the code using the Singularity container
singularity exec --nv $CONTAINER_PATH python /home/vuNetID/stress-test.py
```

- Submit your batch job: `sbatch filename.slurm`
- Monitor your job status: `squeue --job <job id>`
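Once the job is submitted, a few standard SLURM and shell commands can help you follow its progress; this is only a sketch, and the log file name comes from the `#SBATCH --output` line in the example script above:

```bash
# Submit the job and note the job ID that sbatch prints
sbatch filename.slurm

# Check the queue for a specific job, or all of your jobs
squeue --job <job id>
squeue -u $USER

# Follow the log file defined by --output in the SLURM script
tail -f stress_test.log

# Cancel the job if something goes wrong
scancel <job id>
```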
For additional details, refer to the ACCRE Wiki and ACCRE SLURM Training.
For questions, contact Umang Chaudhry via email or the Vanderbilt Data Science Slack. If you are able to spin up a session but your code specifically does not work, please submit an ACCRE helpdesk ticket.