29 commits
c6add2a
adding start of document of Gizmo vs. Harmony
laderast Mar 21, 2025
5f622ce
updating with mermaid diagram
laderast Mar 21, 2025
220786f
fixing mermaid diagram
laderast Mar 21, 2025
b951e63
Merge branch 'main' into multiple-environments
laderast Mar 21, 2025
de9d0d9
Merge branch 'main' into multiple-environments
atombaby Nov 25, 2025
16dbe65
Update cluster name
atombaby Nov 25, 2025
cc6b1a8
Update cluster name
atombaby Nov 25, 2025
2f10e4f
Add data for new cluster
atombaby Nov 25, 2025
b0389ca
Correct errors
atombaby Nov 25, 2025
2986937
Fix partition assignment
atombaby Nov 25, 2025
846df74
Add cluster description as a parameter
atombaby Nov 26, 2025
219d8b6
Add cluster description data
atombaby Nov 26, 2025
90365d1
Update table formatting
atombaby Nov 26, 2025
3df2e84
Update string block element
atombaby Nov 26, 2025
85f4cf0
Merge branch 'main' into multiple-environments
atombaby Nov 26, 2025
3916511
Add and separate chorus and gizmo information
atombaby Dec 1, 2025
3893ada
Change headers for cluster details
atombaby Dec 1, 2025
3f6b947
Update table data to use cluster name attribute
atombaby Dec 1, 2025
aeee593
Add placeholders for other Chorus nodes
atombaby Dec 1, 2025
c5bc5e6
Fix GPU table column data and layout
atombaby Dec 1, 2025
f034c39
update location name for clusters
atombaby Dec 1, 2025
9f74acb
Move gizmo instructions to include
atombaby Dec 1, 2025
b642f3e
remove empty file
atombaby Dec 1, 2025
e5caae6
add include file for gizmo use
atombaby Dec 1, 2025
80d9c31
Add cluster-specific instructions and links
atombaby Dec 1, 2025
ca02968
Remove unnecssary include
atombaby Dec 1, 2025
bc6413f
Add collections for clusters to config
atombaby Dec 1, 2025
b72e587
Correct info link for clusters
atombaby Dec 1, 2025
9b8bf09
Add placeholder text
atombaby Dec 1, 2025
28 changes: 26 additions & 2 deletions _compdemos/Apptainer.md
@@ -19,7 +19,13 @@ Apptainer is maintained and deployed in our environment using environment module

## Using Apptainer

> Apptainer is available on the rhino and gizmo compute hosts. Please use a gizmo node if your task will be computationally intensive. Apptainer containers can be run interactively (via grabnode) and in batch processing
{% capture widget_summary %}
Using Apptainer on Gizmo
{% endcapture %}

{% capture widget_details %}

Apptainer is available on the rhino and gizmo compute hosts. Please use a gizmo node if your task will be computationally intensive. Apptainer containers can be run interactively (via `grabnode`) and in batch processing.

Apptainer is a module; load it with `ml`:

@@ -29,6 +35,25 @@ $ ml Apptainer

Use `ml spider` to see available versions. You can download ("pull") any Docker image and it will be converted to Apptainer format (see the example following these widgets).
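
A quick sketch of checking for and loading a version (the version string below is hypothetical; use one reported by `ml spider`):

```ShellSession
$ ml spider Apptainer        # list the Apptainer versions available as modules
$ ml Apptainer/1.1.6         # hypothetical version; substitute one from the spider output
```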

{% endcapture %}

{% include details-widget.html summary=widget_summary details=widget_details %}

{% capture widget_summary %}
Using Apptainer on Chorus
{% endcapture %}

{% capture widget_details %}
Apptainer is installed into the OS on Chorus nodes (including _maestro_). This restricts the version of Apptainer that is available.

No additional steps (e.g. using `module load`) are necessary.
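
As a minimal sketch (assuming you already have a shell on a Chorus node such as _maestro_), you can check the OS-provided version and run a container directly:

```ShellSession
$ apptainer --version                              # report the OS-packaged Apptainer version
$ apptainer run docker://ghcr.io/apptainer/lolcow  # pull and run a Docker image, just as on gizmo
```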

{% endcapture %}

{% include details-widget.html summary=widget_summary details=widget_details %}

The most basic use of Apptainer is to run a Docker container image:

```ShellSession
$ apptainer pull docker://ghcr.io/apptainer/lolcow
INFO: Converting OCI blobs to SIF format
@@ -83,7 +108,6 @@ $ apptainer run lolcow_latest.sif




## Using Docker Containers with Apptainer

As indicated earlier, Apptainer can run Docker container images. However, Docker container images must first be converted to be usable by Apptainer.
6 changes: 6 additions & 0 deletions _config.yml
@@ -44,6 +44,12 @@ collections:
scicomputing:
output: true
permalink: /:collection/:path/
scicomputing/chorus:
output: true
permalink: /:collection/:path/
scicomputing/gizmo:
output: true
permalink: /:collection/:path/
pathways:
output: true
permalink: /:collection/:path/
176 changes: 146 additions & 30 deletions _data/cluster_nodes.yaml
@@ -1,6 +1,10 @@
---
- cluster_name: gizmo
type: HPC
location: FHCRC
location: Day Campus, E2-222
description: >-
A general-purpose compute cluster based on Ubuntu Bionic, with
some GPU capabilities.
access:
-
type: terminal
@@ -14,7 +18,7 @@
old_nodes:
-
node_name: f
begin_production: 2014
begin_production: 2014
decommission: 2020-04-0
cores: 4
sockets: 1
@@ -28,7 +32,7 @@
local_storage: "HDD 800GB @ /loc (ext4)"
partition: campus
node_count: 456
-
-
node_name: g
cores: 6
sockets: 2
@@ -44,7 +48,7 @@
gpu: none
node_count: 18
partition: campus-new
-
-
node_name: h
cores: 14
sockets: 2
@@ -61,7 +65,7 @@
node_count: 3
partition: campus-new
nodes:
-
-
node_name: j
cores: 24
sockets: 2
@@ -76,11 +80,11 @@
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA GTX 1080ti
gpu_count: 1
gpu_memory: 10.92 GB
gpu_memory: 10.92 GB
gpu_compute_capability: 6.1
node_count: 42
partition: campus, short, new
-
-
node_name: k
cores: 36
sockets: 2
@@ -95,30 +99,11 @@
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA RTX 2080ti
gpu_count: 1
gpu_memory: 10.76 GB
gpu_memory: 10.76 GB
gpu_compute_capability: 7.5
node_count: 170
partition: campus, short, new
-
node_name: harmony
cores: 32
sockets: 1
memory_gb: 1536
processor_model: EPYC 9354P
processor_manufacturer: AMD
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: AS-2015HS-TNR Hyper Server
os: Ubuntu 24.04 LTS
local_storage: "3TB @ /loc"
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA L40S
gpu_count: 4
gpu_memory: 44 GB
gpu_compute_capability: 8.9
node_count: 8
partition: chorus
-
-
node_name: rhino
cores: 14
sockets: 2
@@ -133,7 +118,7 @@
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA RTX1080ti
gpu_count: 1
gpu_memory: 10.92 GB
gpu_memory: 10.92 GB
gpu_compute_capability: 6.1
node_count: 3
partition: none (interactive use)
@@ -150,8 +135,139 @@
default_per_job: 60 minutes
max_per_job: 36 cpus, 768000MB memory (effective)
min_per_job: none
- cluster_name: chorus
type: HPC
location: Day Campus, E2-222
description: >-
A compute cluster with greater GPU capabilities, including the latest
generation of GPUs for machine learning workloads.
access:
-
type: terminal
url: ssh://maestro
auth: hutchnetID
auth_type: LDAP
scheduler:
-
name: slurm
version: 1
nodes:
-
node_name: medley
cores: 31
sockets: 1
memory_gb: 1536
processor_model: AMD
processor_manufacturer: AMD
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: 6029U 2U Ultra NVMe Server
os: Ubuntu 24.04 LTS
local_storage: "3TB @ /loc"
network: 100G
gpu: none
gpu_count: 0
gpu_memory: 0
gpu_compute_capability: 0
node_count: 8
partition: medley
-
node_name: harmony
cores: 31
sockets: 1
memory_gb: 1536
processor_model: AMD
processor_manufacturer: AMD
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: 6029U 2U Ultra NVMe Server
os: Ubuntu 24.04 LTS
local_storage: "3TB @ /loc"
network: 100G
gpu: NVIDIA L40S
gpu_count: 4
gpu_memory: 46 GB
gpu_compute_capability: 8.9
node_count: 8
partition: harmony
-
node_name: canto
cores: 36
sockets: 2
memory_gb: 1536
processor_model: Gold 6154
processor_manufacturer: Intel
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: 6029U 2U Ultra NVMe Server
os: Ubuntu 18.04 LTS
local_storage: "6TB @ /loc"
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA RTX 2080ti
gpu_count: 1
gpu_memory: 10.76 GB
gpu_compute_capability: 7.5
node_count: 4
partition: canto
-
node_name: forza
cores: 32
sockets: 1
memory_gb: 1536
processor_model: EPYC 9354P
processor_manufacturer: AMD
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: AS-2015HS-TNR Hyper Server
os: Ubuntu 24.04 LTS
local_storage: "3TB @ /loc"
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA H200
gpu_count: 8
gpu_memory: 141 GB
gpu_compute_capability: 9.0
node_count: 8
partition: forza
-
node_name: maestro
cores: 24
sockets: 1
memory_gb: 768
processor_model: EPYC 9224
processor_manufacturer: AMD
manufacturer: SuperMicro
vendor: Silicon Mechanics
model: Super Server
os: Ubuntu 24.04 LTS
local_storage: "3TB @ /loc"
network: 10G (up to 1GB/s throughput)
gpu: NVIDIA L40S
gpu_count: 1
gpu_memory: 46 GB
gpu_compute_capability: 8.9
node_count: 1
partition: none (interactive use)
partitions:
-
partition_name: canto
max_per_acct: 1 node
default_per_job: 1 day
max_per_job: 36 cpus, 768000MB memory (effective)
min_per_job: 1 CPU, 1 GPU
-
partition_name: forza
max_per_acct: 1 node
default_per_job: 1 day
max_per_job: 36 cpus, 768000MB memory (effective)
min_per_job: 1 CPU, 1 GPU
-
partition_name: medley
max_per_acct: 1 node
default_per_job: 1 day
max_per_job: 36 cpus, 768000MB memory (effective)
min_per_job: none
-
partition_name: chorus
partition_name: harmony
max_per_acct: 2 GPUs
default_per_job: 60 minutes
max_per_job: 8 CPUs, 2 GPUs, 384GB RAM
21 changes: 21 additions & 0 deletions _includes/gizmo/using.md
@@ -0,0 +1,21 @@
### Rhino

`Rhino`, or more specifically the `Rhinos`, are three locally managed HPC servers, all accessed via the name `rhino`. Together they function as a data and compute hub for a variety of data storage resources and high-performance computing (HPC) resources such as those in the table above. The specific guidance for each approach to HPC access is slightly different, but all of them require the user to learn how to access and interact with `rhino`.

More information on SSH configuration for accessing `rhino` can be found [here](/scicomputing/access_methods/), and specific guidance for using `rhino` is in our [Resource Library](/compdemos/howtoRhino/).


### Gizmo Cluster

While we generally don't recommend interactive computing on the HPC clusters (interactive use can limit the amount of work you can do and introduce "fragility" into your computing), there are many scenarios where using cluster nodes interactively is a valid approach. For example, if you have a single task that is too much for a `rhino`, opening a session on a cluster node is the way to go.

If you need an interactive session with dedicated resources, you can start a job on the cluster using the `grabnode` command, which opens an interactive login session on a cluster node. It will prompt you for how many cores (probably 1 unless you know your task is multi-threaded), how much memory, and how much time you estimate will be required. `grabnode` can be run from any `rhino` host.
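
For reference, the plain Slurm equivalent of such an interactive session uses `srun` with a pseudo-terminal; this is standard Slurm rather than a `grabnode` feature, and the resource values here are only illustrative:

```ShellSession
$ srun --pty --cpus-per-task=1 --mem=4G --time=1:00:00 /bin/bash   # illustrative resource request
```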

While most users will follow the interactive screen prompts to execute `grabnode`,
the command will also take some common `sbatch` options and flags.
Contact `scicomp` if you need options beyond those offered by `grabnode` prompts.

For non-interactive use of `gizmo`, see our pages on [Computing Environments and Software](/scicomputing/compute_environments/) and [Job Management](/scicomputing/compute_jobs/) and perhaps [Parallel Computing](/scicomputing/compute_parallel/).
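
As a minimal sketch of a batch submission (the script name, resource values, and workload command below are placeholders, not recommendations for any particular analysis):

```ShellSession
$ cat my-job.sh
#!/bin/bash
# Placeholder resource requests; adjust for your workload
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00
ml Apptainer            # load any modules the job needs
./run_analysis.sh       # placeholder for the actual workload
$ sbatch my-job.sh
```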

Access to the Gizmo cluster requires both a HutchNet ID and an association with a PI account on the cluster. If you get errors like "Invalid account" when using `grabnode` or Slurm commands like `sbatch`, please contact `scicomp`.

Binary file added _scicomputing/assets/gizmo-harmony.png
8 changes: 8 additions & 0 deletions _scicomputing/chorus/using.md
@@ -0,0 +1,8 @@
---
title: Using the Chorus Cluster
primary_reviewers: atombaby
---

## Using the Chorus Cluster

> TBD: docs specific to using Chorus
Empty file.
6 changes: 3 additions & 3 deletions _scicomputing/compute_gpu.md
@@ -8,12 +8,12 @@ There are currently two capabilities available for GPUs in the gizmo. The _J_ a

### GPU Nodes

|Location|Partition|Node Name|GPU|
|Cluster|Node Name|Partition(s)|GPU|
|---|:---:|:---:|---:|
{%- for resource in site.data.cluster_nodes %}
{%- for node in resource.nodes %}
{%- if node.gpu != 'none' %}
{{resource.location}}|{{ node.partition }}|{{ node.node_name }}|{{ node.gpu }}
{{resource.cluster_name}}|{{ node.node_name }}|{{ node.partition }}|{{ node.gpu }}
{%- endif %}
{%- endfor %}
{%- endfor %}
@@ -66,4 +66,4 @@ When submitting jobs make sure that your current environment does not have modul

## Using GPUs

When your job is assigned a GPU, Slurm sets the environment variable CUDA_VISIBLE_DEVICES. This environment variable indicates the assigned GPU- most CUDA tools (e.g. tensorflow) use this to restrict execution to that device.
When your job is assigned a GPU, Slurm sets the environment variable CUDA_VISIBLE_DEVICES. This environment variable indicates the assigned GPU- most CUDA tools (e.g. tensorflow) use this to restrict execution to that device.
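
As a minimal sketch (the resource values are illustrative), a batch job can request a GPU and confirm the assignment by echoing the variable:

```ShellSession
$ cat gpu-job.sh
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --time=00:10:00
# CUDA_VISIBLE_DEVICES is set by Slurm once a GPU is assigned to the job
echo "Assigned GPU(s): ${CUDA_VISIBLE_DEVICES}"
$ sbatch gpu-job.sh
```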