19 changes: 6 additions & 13 deletions docs/Documentation/Systems/Gila/filesystem.md
@@ -1,31 +1,26 @@
*Swift layout as an example*

# Gila Filesystem Architecture Overview

## Home Directories: /home

*/home directories are mounted as `/home/<username>`. Home directories are hosted under the user's initial /projects directory, and /home usage counts toward that project's storage allocation quota.*

## Project Storage: /projects

*Each active project is granted a subdirectory under `/projects/<projectname>`. This is where the bulk of data is expected to be, and where jobs should generally be run from. Storage quotas are based on the allocation award.*

*Quota usage can be viewed at any time by changing (`cd`) into the project directory and running `df -h` to see the total, used, and available space for the mounted project directory.*
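
*For example, assuming a hypothetical project handle `csc000`:*

```bash
# Check total, used, and available space on the project's mounted filesystem
cd /projects/csc000
df -h .
```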

## Home Directories: /home

*/home directories are mounted as `/home/<username>`. Home directories are hosted under the user's initial /projects directory, and /home usage counts toward that project's storage allocation quota.*

## Scratch Space: /scratch/username and /scratch/username/jobid

*For users who also have Kestrel allocations, please be aware that scratch space on Swift behaves differently, so adjustments to job scripts may be necessary.*
## Scratch Storage: /scratch/username and /scratch/username/jobid

*The scratch directory on each Swift compute node is a 1.8TB spinning disk, and is accessible only on that node. The default writable path for scratch use is `/scratch/<username>`. There is no global, network-accessible `/scratch` space. `/projects` and `/home` are both network-accessible, and may be used as /scratch-style working space instead.*
*For users who also have Kestrel allocations, please be aware that scratch space on Gila behaves differently, so adjustments to job scripts may be necessary.*

*The scratch filesystem on Gila is a 79TB spinning-disk Ceph filesystem and is accessible from both login and compute nodes. The default writable path for scratch use is `/scratch/<username>`.*

## Temporary space: $TMPDIR

*When a job starts, the environment variable `$TMPDIR` is set to `/scratch/<username>/<jobid>` for the duration of the job. This is temporary space only, and should be purged when your job is complete. Please be sure to use this path instead of /tmp for your tempfiles.*

There is no expectation of data longevity in scratch space, and it is subject to purging once the node is idle. If desired data is stored here during the job, please be sure to copy it to a /projects directory as part of the job script before the job finishes.
There is no expectation of data longevity in this temporary space, and it is purged once a job has completed. If data you want to keep is written here during the job, be sure to copy it to a /projects directory as part of the job script before the job finishes.
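
A minimal job-script sketch of this pattern; the account name, project paths, and application are placeholders:

```bash
#!/bin/bash
#SBATCH --account=csc000     # placeholder allocation handle
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Run in the per-job temporary space provided by Slurm
cd "$TMPDIR"
cp /projects/csc000/inputs/case.inp .
./my_app case.inp > case.out      # placeholder application

# $TMPDIR is purged after the job, so copy results back before exiting
cp case.out /projects/csc000/results/
```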

## Mass Storage System

@@ -34,5 +29,3 @@ There is no Mass Storage System for deep archive storage on Gila.
## Backups and Snapshots

There are no backups or snapshots of data on Gila. Although the system is protected from hardware failure by multiple layers of redundancy, please keep regular backups of any important data stored on Gila, and consider using a version control system (such as Git) for important code.


18 changes: 6 additions & 12 deletions docs/Documentation/Systems/Gila/index.md
@@ -1,42 +1,36 @@

# About Gila

Gila is an OpenHPC-based cluster running on <Dual AMD EPYC 7532 Rome CPUs and nVidia A100 GPUs>. The nodes run as virtual machines in a local virtual private cloud (OpenStack). Gila is allocated for NREL workloads and intended for LDRD, SPP or Office of Science workloads. Allocation decisions are made by the IACAC through the annual allocation request process. Check back regularly as the configuration and capabilities for Gila are augmented over time.
Gila is an OpenHPC-based cluster running on __Dual AMD EPYC 7532 Rome CPUs__ and __Intel Xeon Icelake CPUs with NVIDIA A100 GPUs__. The nodes run as virtual machines in a local virtual private cloud (OpenStack). Gila is allocated for NREL workloads and is intended for LDRD, SPP, or Office of Science work. Allocation decisions are made by the IACAC through the annual allocation request process. Check back regularly, as the configuration and capabilities of Gila are augmented over time.

## Accessing Gila
Access to Gila requires an NREL HPC account and permission to join an existing allocation. Please see the [System Access](https://www.nrel.gov/hpc/system-access.html) page for more information on accounts and allocations.

*Need to update*
#### For NREL Employees:
To access Gila, log into the NREL network and connect via ssh:

ssh <vs.hpc.nrel.gov>
ssh <vermilion.hpc.nrel.gov>
ssh gila.hpc.nrel.gov

#### For External Collaborators:
*Is this still true?*
There are currently no external-facing login nodes for Gila. There are two options to connect:

1. Connect to the [SSH gateway host](https://www.nrel.gov/hpc/ssh-gateway-connection.html) and log in with your username, password, and OTP code. Once connected, ssh to the login nodes as above (see the example after this list).
1. Connect to the [HPC VPN](https://www.nrel.gov/hpc/vpn-connection.html) and ssh to the login nodes as above.
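
For example, option 1 can be done in a single command using SSH's jump-host feature; `<gateway-host>` is a placeholder for the hostname given on the SSH gateway page:

```bash
# Replace <username> and <gateway-host> with your HPC username and the gateway hostname
ssh -J <username>@<gateway-host> <username>@gila.hpc.nrel.gov
```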

*Need to update*
There are currently two login nodes. They share the same home directory, so work done on one will appear on the other. They are:

vs-login-1
vs-login-2
gila-login-1
gila-login-2

You may connect directly to a login node, but they may be cycled in and out of the pool. If a node is unavailable, try connecting to another login node or the <>`vs.hpc.nrel.gov`> round-robin option.
You may connect directly to a login node, but they may be cycled in and out of the pool. If a node is unavailable, try connecting to another login node or the `gila.hpc.nrel.gov` round-robin option.

## Get Help with Gila

Please see the [Help and Support Page](../../help.md) for further information on how to seek assistance with Gila or your NREL HPC account.

## Building code

*Need to review*
Don't build or run code on a login node. Login nodes have limited CPU and memory available. Use a compute or GPU node instead. Simply start an interactive job on an appropriately provisioned node and partition for your work and do your builds there. Similarly, build your projects under `/projects/your_project_name/` as home directories are **limited to 5GB** per user.
Do not build or run code on login nodes. Login nodes have limited CPU and memory available; use a compute or GPU node instead. Start an interactive job on an appropriately provisioned node and partition and do your builds there. Similarly, build your projects under `/projects/your_project_name/`, as home directories are **limited to 5GB** per user.
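
For example, an interactive session can be requested with something like the following; the account name is a placeholder, and the partition should be one of those listed by `sinfo` (see the Running page for the current list):

```bash
# Start an interactive shell on a compute node (account and partition are examples)
srun --account=csc000 --partition=cpu-amd --nodes=1 --ntasks=1 --time=01:00:00 --pty bash
```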


---

22 changes: 10 additions & 12 deletions docs/Documentation/Systems/Gila/running.md
@@ -7,26 +7,24 @@

### Compute hosts

Gila is a collection of physical nodes with each regular node containing <Dual AMD EPYC 7532 Rome CPUs>. However, each node is virtualized. That is it is split up into virtual nodes with each virtual node having a portion of the cores and memory of the physical node. Similar virtual nodes are then assigned slurm partitions as shown below.
Compute nodes on Gila are virtual machines running on either __Dual AMD EPYC Milan CPUs__ or __Intel Xeon Icelake CPUs__. These nodes are not configured as exclusive and can be shared by multiple users or jobs.

### GPU hosts

GPU nodes on Gila have NVIDIA A100 GPUs hosted on __Intel Xeon Icelake CPUs__.

*Move this info to filesystems page?*
### Shared file systems

Gila's home directories are shared across all nodes, and each user has a quota of 5 GB. The /scratch/$USER and /projects spaces are also visible across all nodes.

*Need to update*
### Partitions

Partitions are flexible and fluid on Gila. A list of partitions can be found by running the `sinfo` command. Here are the partitions as of 3/27/2025.
A list of partitions can be found by running the `sinfo` command. Here are the partitions as of 10/16/2025.

| Partition Name | Qty | RAM | Cores/node | /var/scratch <br>1K-blocks | AU Charge Factor |
| :--: | :--: | :--: | :--: | :--: | :--: |
| gpu<br>*1 x NVIDIA Tesla A100* | 16 | 114 GB | 30 | 6,240,805,336| 12 |
| lg | 39 | 229 GB | 60 | 1,031,070,000| 7 |
| std | 60 | 114 GB | 30 | 515,010,816| 3.5 |
| sm | 28 | 61 GB | 16 | 256,981,000| 0.875 |
| t | 15 | 16 GB | 4 | 61,665,000| 0.4375 |
| Partition Name | CPU | Qty | RAM | Cores/node | /var/scratch <br>1K-blocks | AU Charge Factor |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| gpu<br>*NVIDIA Tesla A100-40* | Intel Xeon Icelake | 1 | 910 GB | 42 | 6,240,805,336 | 12 |
| cpu-amd | AMD EPYC Milan | 36 | 220 GB | 120 | 1,031,070,000 | 7 |
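
To check the current partition configuration yourself, a quick summary can be printed with, for example:

```bash
# Partition name, node count, CPUs per node, memory per node (MB), and time limit
sinfo -o "%P %D %c %m %l"
```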

### Allocation Unit (AU) Charges

@@ -42,7 +40,7 @@ The **Charge Factor** for each partition is listed in the table above.

### Operating Software

The Gila HPC cluster runs fairly current versions of OpenHPC and SLURM on top of OpenStack.
The Gila HPC cluster runs on Rocky Linux 9.5.

<!-- Docs from Vermilion page: -->
<!-- ## Examples: Build and run simple applications