From 14bc094997fd7a5ea99949b8cdf1e27ad5c7a24c Mon Sep 17 00:00:00 2001 From: jonathancasco <4037484+jonathancasco@users.noreply.github.com> Date: Thu, 16 Oct 2025 13:03:53 -0600 Subject: [PATCH 1/3] Update filesystem.md --- docs/Documentation/Systems/Gila/filesystem.md | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-) diff --git a/docs/Documentation/Systems/Gila/filesystem.md b/docs/Documentation/Systems/Gila/filesystem.md index ff3bf8df0..017987d0c 100644 --- a/docs/Documentation/Systems/Gila/filesystem.md +++ b/docs/Documentation/Systems/Gila/filesystem.md @@ -1,8 +1,8 @@ -*Swift layout as an example* - # Gila Filesystem Architecture Overview +## Home Directories: /home +*/home directories are mounted as `/home/`. Home directories are hosted under the user's initial /project directory. Quotas in /home are included as a part of the quota of that project's storage allocation.* ## Project Storage: /projects @@ -10,22 +10,17 @@ *Quota usage can be viewed at any time by issuing a `cd` command into the project directory, and using the `df -h` command to view total, used, and remaining available space for the mounted project directory* -## Home Directories: /home - -*/home directories are mounted as `/home/`. Home directories are hosted under the user's initial /project directory. Quotas in /home are included as a part of the quota of that project's storage allocation* - -## Scratch Space: /scratch/username and /scratch/username/jobid - -*For users who also have Kestrel allocations, please be aware that scratch space on Swift behaves differently, so adjustments to job scripts may be necessary.* +## Scratch Storage: /scratch/username and /scratch/username/jobid -*The scratch directory on each Swift compute node is a 1.8TB spinning disk, and is accessible only on that node. The default writable path for scratch use is `/scratch/`. There is no global, network-accessible `/scratch` space. `/projects` and `/home` are both network-accessible, and may be used as /scratch-style working space instead.* +*For users who also have Kestrel allocations, please be aware that scratch space on Gila behaves differently, so adjustments to job scripts may be necessary.* +*The scratch filesystem on Gila is a 79TB spinning-disk Ceph filesystem and is accessible from both login and compute nodes. The default writable path for scratch use is `/scratch/`.* ## Temporary space: $TMPDIR *When a job starts, the environment variable `$TMPDIR` is set to `/scratch//` for the duration of the job. This is temporary space only, and should be purged when your job is complete. Please be sure to use this path instead of /tmp for your tempfiles.* -There is no expectation of data longevity in scratch space, and it is subject to purging once the node is idle. If desired data is stored here during the job, please be sure to copy it to a /projects directory as part of the job script before the job finishes. +There is no expectation of data longevity in the temporary space, and it is purged once a job has completed. If desired data is stored here during the job, please be sure to copy it to a /projects directory as part of the job script before the job finishes. ## Mass Storage System @@ -34,5 +29,3 @@ There is no Mass Storage System for deep archive storage on Gila. ## Backups and Snapshots There are no backups or snapshots of data on Gila. 
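Because nothing is backed up for you, it is worth copying anything important off the cluster on a regular basis. A minimal sketch using `rsync` is shown below; the project handle, source path, and destination host are placeholders to replace with your own.

    # Hypothetical example: push a results directory from Gila to a machine you control.
    # Replace your_project_name, the source path, and the destination with your own values.
    rsync -av /projects/your_project_name/results/ username@my-workstation.example.com:/backups/gila/results/
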
Though the system is protected from hardware failure by multiple layers of redundancy, please keep regular backups of important data on Gila, and consider using a Version Control System (such as Git) for important code. - - From c7b60f5d0a88cda235a6fe5ddcfc66df72b23e0d Mon Sep 17 00:00:00 2001 From: jonathancasco <4037484+jonathancasco@users.noreply.github.com> Date: Thu, 16 Oct 2025 13:04:12 -0600 Subject: [PATCH 2/3] Update index.md --- docs/Documentation/Systems/Gila/index.md | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/docs/Documentation/Systems/Gila/index.md b/docs/Documentation/Systems/Gila/index.md index e95fe50ea..4a44a378c 100644 --- a/docs/Documentation/Systems/Gila/index.md +++ b/docs/Documentation/Systems/Gila/index.md @@ -1,32 +1,28 @@ # About Gila -Gila is an OpenHPC-based cluster running on . The nodes run as virtual machines in a local virtual private cloud (OpenStack). Gila is allocated for NREL workloads and intended for LDRD, SPP or Office of Science workloads. Allocation decisions are made by the IACAC through the annual allocation request process. Check back regularly as the configuration and capabilities for Gila are augmented over time. +Gila is an OpenHPC-based cluster running on __Dual AMD EPYC 7532 Rome CPUs__ and __Intel Xeon Icelake CPUs with NVidia A100 GPUs__. The nodes run as virtual machines in a local virtual private cloud (OpenStack). Gila is allocated for NREL workloads and intended for LDRD, SPP or Office of Science workloads. Allocation decisions are made by the IACAC through the annual allocation request process. Check back regularly as the configuration and capabilities for Gila are augmented over time. ## Accessing Gila Access to Gila requires an NREL HPC account and permission to join an existing allocation. Please see the [System Access](https://www.nrel.gov/hpc/system-access.html) page for more information on accounts and allocations. -*Need to update* #### For NREL Employees: To access Gila, log into the NREL network and connect via ssh: - ssh - ssh + ssh gila.hpc.nrel.gov #### For External Collaborators: -*Is this still true?* There are currently no external-facing login nodes for Gila. There are two options to connect: 1. Connect to the [SSH gateway host](https://www.nrel.gov/hpc/ssh-gateway-connection.html) and log in with your username, password, and OTP code. Once connected, ssh to the login nodes as above. 1. Connect to the [HPC VPN](https://www.nrel.gov/hpc/vpn-connection.html) and ssh to the login nodes as above. -*Need to update* There are currently two login nodes. They share the same home directory so work done on one will appear on the other. They are: - vs-login-1 - vs-login-2 + gila-login-1 + gila-login-2 -You may connect directly to a login node, but they may be cycled in and out of the pool. If a node is unavailable, try connecting to another login node or the <>`vs.hpc.nrel.gov`> round-robin option. +You may connect directly to a login node, but they may be cycled in and out of the pool. If a node is unavailable, try connecting to another login node or the `gila.hpc.nrel.gov` round-robin option. ## Get Help with Gila @@ -34,9 +30,7 @@ Please see the [Help and Support Page](../../help.md) for further information on ## Building code -*Need to review* -Don't build or run code on a login node. Login nodes have limited CPU and memory available. Use a compute or GPU node instead. 
Start an interactive job in an appropriately provisioned partition for your work and do your builds there. Also, build your projects under `/projects/your_project_name/`, as home directories are **limited to 5GB** per user. --- - From 3111ab840605d28d73a321cc13c6e09cecf19f48 Mon Sep 17 00:00:00 2001 From: jonathancasco <4037484+jonathancasco@users.noreply.github.com> Date: Thu, 16 Oct 2025 13:04:34 -0600 Subject: [PATCH 3/3] Update running.md --- docs/Documentation/Systems/Gila/running.md | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/docs/Documentation/Systems/Gila/running.md b/docs/Documentation/Systems/Gila/running.md index b4007cbf7..abeba1cf4 100644 --- a/docs/Documentation/Systems/Gila/running.md +++ b/docs/Documentation/Systems/Gila/running.md @@ -7,26 +7,24 @@ ### Compute hosts -Gila is a collection of physical nodes with each regular node containing . However, each node is virtualized. That is it is split up into virtual nodes with each virtual node having a portion of the cores and memory of the physical node. Similar virtual nodes are then assigned slurm partitions as shown below. +Compute nodes in Gila are virtualized nodes running on either __Dual AMD EPYC Milan CPUs__ or __Intel Xeon Icelake CPUs__. These nodes are not configured as exclusive and can be shared by multiple users or jobs. +### GPU hosts + +GPU nodes available in Gila have NVIDIA A100 GPUs running on __Intel Xeon Icelake CPUs__. -*Move this info to filesystems page?* ### Shared file systems Gila's home directories are shared across all nodes. Each user has a quota of 5 GB. There is also /scratch/$USER and /projects spaces seen across all nodes. -*Need to update* ### Partitions -Partitions are flexible and fluid on Gila. A list of partitions can be found by running the `sinfo` command. Here are the partitions as of 3/27/2025. +A list of partitions can be found by running the `sinfo` command. Here are the partitions as of 10/16/2025. -| Partition Name | Qty | RAM | Cores/node | /var/scratch<br>1K-blocks | AU Charge Factor |
-| :--: | :--: | :--: | :--: | :--: | :--: |
-| gpu<br>*1 x NVIDIA Tesla A100* | 16 | 114 GB | 30 | 6,240,805,336| 12 |
-| lg | 39 | 229 GB | 60 | 1,031,070,000| 7 |
-| std | 60 | 114 GB | 30 | 515,010,816| 3.5 |
-| sm | 28 | 61 GB | 16 | 256,981,000| 0.875 |
-| t | 15 | 16 GB | 4 | 61,665,000| 0.4375 |
+| Partition Name | CPU | Qty | RAM | Cores/node | /var/scratch<br>1K-blocks | AU Charge Factor |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| gpu<br>*NVIDIA Tesla A100-40* | Intel Xeon Icelake | 1 | 910 GB | 42 | 6,240,805,336| 12 |
+| cpu-amd | AMD EPYC Milan | 36 | 220 GB | 120 | 1,031,070,000| 7 |

### Allocation Unit (AU) Charges @@ -42,7 +40,7 @@ The **Charge Factor** for each partition is listed in the table above. ### Operating Software -The Gila HPC cluster runs fairly current versions of OpenHPC and SLURM on top of OpenStack. +The Gila HPC cluster runs on Rocky Linux 9.5.
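As a rough illustration of how the **Charge Factor** column above is used (assuming the usual formula of walltime in hours multiplied by node count and by the charge factor), a few hypothetical jobs would be charged as follows:

    # Assumed formula: AU cost = walltime (hours) * nodes * charge factor
    # 4-hour, 2-node job in the cpu-amd partition (charge factor 7):  4 * 2 * 7  = 56 AUs
    # 4-hour, 1-node job in the gpu partition (charge factor 12):     4 * 1 * 12 = 48 AUs
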
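Per the guidance above about building on compute nodes rather than login nodes, an interactive session can be requested through Slurm. The sketch below is illustrative only; the allocation handle is a placeholder, and exact limits depend on how the partitions listed above are configured.

    # Show the available partitions and their state
    sinfo
    # Request a one-hour interactive shell on a cpu-amd node (replace the allocation handle with your own)
    srun --account=your_allocation_handle --partition=cpu-amd --nodes=1 --time=01:00:00 --pty bash
    # ...then do your builds under /projects/your_project_name/ rather than your home directory
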
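Tying the filesystem guidance from the first patch to the partition table above, a batch job can stage temporary files in `$TMPDIR` and copy anything worth keeping back to `/projects` before it exits. This is only a sketch; the allocation handle, project path, and executable are placeholders.

    #!/bin/bash
    #SBATCH --account=your_allocation_handle
    #SBATCH --partition=cpu-amd
    #SBATCH --nodes=1
    #SBATCH --time=04:00:00

    # $TMPDIR points at the per-job scratch area for the duration of the job
    cd "$TMPDIR"

    # Placeholder workload that writes its output to the current (temporary) directory
    /projects/your_project_name/bin/my_model > output.dat

    # Temporary space is purged after the job completes, so copy results to /projects first
    cp output.dat /projects/your_project_name/results/
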