added all files #208

Merged 5 commits on Jul 23, 2024
36 changes: 36 additions & 0 deletions docs/get-started/allocation/allocation-change-request.md
@@ -46,6 +46,28 @@ This will show more details about the change request as shown below:

![Allocation Change Request Details for OpenStack Project](images/coldfront-openstack-change-requested-details.png)

### How to Use GPU Resources in your OpenStack Project

!!! tip "Comparison Between CPU and GPU"
To learn more about the key differences between CPUs and GPUs, please [read this](../../openstack/create-and-connect-to-the-VM/flavors.md#comparison-between-cpu-and-gpu).

A GPU instance is launched in the [same way](../../openstack/create-and-connect-to-the-VM/launch-a-VM.md)
as any other compute instance, with a few considerations to keep in mind:

- When launching a GPU-based instance, be sure to select one of the
[GPU Tier](../../openstack/create-and-connect-to-the-VM/flavors.md#3-gpu-tier)
flavors.

- You need to have sufficient resource quota to launch the desired flavor. Always
ensure you know which GPU-based flavor you want to use, then submit an
[allocation change request](#request-change-resource-allocation-attributes-for-openstack-project)
to adjust your current allocation to fit the flavor's resource requirements.

- We recommend using [ubuntu-22.04-x86_64](../../openstack/create-and-connect-to-the-VM/images.md#nerc-images-list)
as the image for your GPU-based instance because we have tested the NVIDIA driver
with this image and obtained good results. That said, it is possible to run a
variety of other images as well.
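Before filing the change request, it can help to check on paper whether a
flavor fits the quota you have (or plan to request). The sketch below is only
illustrative: the flavor figures come from the GPU Tier tables, while the quota
values and the `fits` helper are hypothetical, not part of any NERC tooling.

```python
# Illustrative sketch: check whether remaining OpenStack quota can
# accommodate a GPU flavor before submitting an allocation change request.
# Flavor figures follow the NERC GPU Tier tables; quota values are placeholders.

FLAVORS = {
    # name: (vCPU, RAM in GiB, GPU count)
    "gpu-su-a100sxm4.1": (32, 240, 1),
    "gpu-su-a100.1": (24, 74, 1),
    "gpu-su-v100.1": (48, 192, 1),
}

def fits(quota, flavor_name):
    """Return True if the remaining quota covers the flavor's requirements."""
    vcpu, ram, gpu = FLAVORS[flavor_name]
    return (
        quota["vcpus_free"] >= vcpu
        and quota["ram_gib_free"] >= ram
        and quota["gpus_free"] >= gpu
    )

quota = {"vcpus_free": 32, "ram_gib_free": 256, "gpus_free": 1}  # hypothetical
print(fits(quota, "gpu-su-a100sxm4.1"))  # True: 32 vCPU, 240 GiB, 1 GPU all fit
print(fits(quota, "gpu-su-v100.1"))      # False: this flavor needs 48 vCPU
```

If a flavor does not fit, the difference between the flavor's row in the table
and your current quota is what the allocation change request needs to cover.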

## Request Change Resource Allocation Attributes for OpenShift Project

![Request Change Resource Allocation Attributes for OpenShift Project](images/coldfront-openshift-allocation-attributes.png)
@@ -82,4 +104,18 @@

![Allocation Change Request Details for OpenShift Project](images/coldfront-openshift-change-requested-details.png)

### How to Use GPU Resources in your OpenShift Project

!!! tip "Comparison Between CPU and GPU"
To learn more about the key differences between CPUs and GPUs, please [read this](../../openstack/create-and-connect-to-the-VM/flavors.md#comparison-between-cpu-and-gpu).

For OpenShift pods, you can specify different types of GPUs. Because OpenShift
is not based on flavors, resources can be customized as needed at the pod level
while still utilizing GPU resources.

You can read about how to specify a pod to use a GPU [here](../../openshift/applications/scaling-and-performance-guide.md#how-to-specify-pod-to-use-gpu).

Also, you will be able to select a different GPU device for your workload, as
explained [here](../../openshift/applications/scaling-and-performance-guide.md#how-to-select-a-different-gpu-device).
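As a rough illustration of the pod-level approach (the linked guides are
authoritative), a pod that requests a single GPU typically looks like the
sketch below. The `nvidia.com/gpu` resource name follows the common NVIDIA
device-plugin convention; the pod name and container image are hypothetical
examples, not NERC-mandated values.

```yaml
# Minimal illustrative pod spec requesting one GPU.
# nvidia.com/gpu is the usual NVIDIA device-plugin resource name;
# pod name and image here are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  restartPolicy: Never
  containers:
    - name: cuda-test
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # request a single GPU device
```

The scheduler then places the pod only on a node that has a free GPU of the
requested type, as described in the guides linked above.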

---
2 changes: 1 addition & 1 deletion docs/get-started/allocation/allocation-details.md
@@ -1,6 +1,6 @@
# Allocation details

Access to ColdFront's allocations details is based on [user roles](#user-roles).
Access to ColdFront's allocations details is based on [user roles](manage-users-to-a-project.md#user-roles).
PIs and managers see the same allocation details as users, and can also add
project users to the allocation, if they're not already on it, and remove users
from an allocation.
4 changes: 2 additions & 2 deletions docs/get-started/allocation/coldfront.md
@@ -25,8 +25,8 @@ is granted, the PI will receive an email confirming the request approval and
how to connect NERC's ColdFront.

PI or project managers can use NERC's ColdFront as a self-service web-portal that
can see an administrative view of it as [described here](#pi-and-manager-view) and
can do the following tasks:
can see an administrative view of it as [described here](coldfront.md#pi-and-manager-view)
and can do the following tasks:

- **Only PI** can add a new project and archive any existing project(s)

10 changes: 10 additions & 0 deletions docs/get-started/allocation/requesting-an-allocation.md
@@ -10,6 +10,16 @@ or *OpenShift Resource Allocation* by specifying either **NERC (OpenStack)** or
**NERC-OCP (OpenShift)** in the **Resource** dropdown option. **Note:** The
first option i.e. **NERC (OpenStack)**, is selected by default.

!!! info "Default GPU Resource Quota for Initial Allocation Requests"
    By default, the GPU resource quota is set to 0 for the initial resource
    allocation request for both OpenStack and OpenShift Resource Types. However,
    once the initial request is approved, you can submit an
    [allocation change request](allocation-change-request.md) to adjust the
    corresponding GPU quotas. For NERC's OpenStack, please follow [this guide](allocation-change-request.md#how-to-use-gpu-resources-in-your-openstack-project)
    on how to utilize GPU resources in your OpenStack project. For NERC's OpenShift,
    refer to [this reference](allocation-change-request.md#how-to-use-gpu-resources-in-your-openshift-project)
    to learn how to use GPU resources at the pod level.

## Request A New OpenStack Resource Allocation for an OpenStack Project

![Request A New OpenStack Resource Allocation](images/coldfront-request-new-openstack-allocation.png)
6 changes: 3 additions & 3 deletions docs/get-started/create-a-user-portal-account.md
@@ -133,8 +133,8 @@ as shown in the image below:
!!! info "Information"
Once your PI user request is reviewed and approved by the NERC's admin, you
will receive an email confirmation from NERC's support system, i.e.,
**[email protected]**. Then, you can access [NERC's ColdFront resource
allocation management portal](https://coldfront.mss.mghpcc.org/) using the
PI user role, as [described here](allocation/coldfront.md).
[[email protected]](mailto:[email protected]?subject=NERC%20MOU%20Question).
Then, you can access [NERC's ColdFront resource allocation management portal](https://coldfront.mss.mghpcc.org/)
using the PI user role, as [described here](allocation/coldfront.md#how-to-get-access-to-nercs-coldfront).

---
8 changes: 4 additions & 4 deletions docs/migration-moc-to-nerc/Step2.md
@@ -97,10 +97,10 @@ samples below your lists might look like this:

| MOC Volume Name | MOC Disk | MOC Attached To | Bootable | MOC UUID | NERC Volume Name |
| --------------- | -------- | --------------- | -------- | -------- | ---------------- |
| Fedora | 10GiB | Fedora_test | Yes | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
| 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | 10GiB | Ubuntu_Test | Yes | 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | |
| Snapshot of Fed_Test | 10GiB | Fedora_test | No | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
| total | 30GiB | | | |
| Fedora | 10GiB | Fedora_test | Yes | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
| 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | 10GiB | Ubuntu_Test | Yes | 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | |
| Snapshot of Fed_Test | 10GiB | Fedora_test | No | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
| total | 30GiB | | | | |

#### MOC Security Group Information Table

@@ -246,7 +246,7 @@ Wait until the requested resource allocation gets approved by the NERC's admin.
After approval, kindly review and verify that the quotas are accurately
reflected in your [resource allocation](https://coldfront.mss.mghpcc.org/allocation/)
and [OpenShift project](https://console.apps.shift.nerc.mghpcc.org). Please ensure
that the approved quota values are accurately displayed as [explained here](#review-your-projects-resource-quota-from-openshift-web-dashboard).
that the approved quota values are accurately displayed as [explained here](decommission-openshift-resources.md#review-your-projects-resource-quota-from-openshift-web-dashboard).

### Review your Project Usage

5 changes: 3 additions & 2 deletions docs/openstack/access-and-security/create-a-key-pair.md
@@ -233,15 +233,16 @@ PuTTY requires SSH keys to be in its own `ppk` format. To convert between
OpenSSH keys used by OpenStack and PuTTY's format, you need a utility called PuTTYgen.

If it was not installed when you originally installed PuTTY, you can get it
here: [Download PuTTY](#http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).
here: [Download PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).

You have 2 options for generating keys that will work with PuTTY:

1. Generate an OpenSSH key with ssh-keygen or from the Horizon GUI using the
instructions above, then use PuTTYgen to convert the private key to .ppk

2. Generate a .ppk key with PuTTYgen, and import the provided OpenSSH public
key to OpenStack using the 'Import a Key Pair' instructions [above](#import-a-key-pair).
key to OpenStack using the 'Import the generated Key Pair' instructions
[above](create-a-key-pair.md#import-the-generated-key-pair).

There is a detailed walkthrough of how to use PuTTYgen here: [Use SSH Keys with
PuTTY on Windows](https://devops.profitbricks.com/tutorials/use-ssh-keys-with-putty-on-windows/).
30 changes: 24 additions & 6 deletions docs/openstack/create-and-connect-to-the-VM/flavors.md
@@ -17,6 +17,24 @@ The important fields are
| Ephemeral | Size of a second disk. 0 means no second disk is defined and mounted. |
| VCPUs | Number of virtual cores |

## Comparison Between CPU and GPU

Here are the key differences between CPUs and GPUs:

| CPUs | GPUs |
| --------------------------------------------- | ---------------------------- |
| Work mostly in sequence. While several cores and excellent task switching give the impression of parallelism, a CPU is fundamentally designed to run one task at a time. | Are designed to work in parallel. A vast number of cores and threading managed in hardware enable GPUs to perform many simple calculations simultaneously. |
| Are designed for task parallelism. | Are designed for data parallelism. |
| Have a small number of cores that can complete single complex tasks at very high speeds. | Have a large number of cores that work in tandem to compute many simple tasks. |
| Have access to a large amount of relatively slow RAM with low latency, optimizing them for latency (operation). | Have access to a relatively small amount of very fast RAM with higher latency, optimizing them for throughput. |
| Have a very versatile instruction set, allowing the execution of complex tasks in fewer cycles but creating overhead in others. | Have a limited (but highly optimized) instruction set, allowing them to execute their designed tasks very efficiently. |
| Task switching (as a result of running the OS) creates overhead. | Task switching is not used; instead, numerous serial data streams are processed in parallel from point A to point B. |
| Will always work for any given use case but may not provide adequate performance for some tasks. | Would only be a valid choice for some use cases but would provide excellent performance in those cases. |

In summary, for applications such as Machine Learning (ML), Artificial
Intelligence (AI), or image processing, a GPU can provide a performance increase
of 50x to 200x compared to a typical CPU performing the same tasks.
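The task-parallel vs. data-parallel distinction in the table above can be
sketched in a few lines. This is a conceptual illustration only, with made-up
function names and no GPU involved: the point is which computations *could* be
run in parallel, not how fast they run here.

```python
# Conceptual sketch of the two styles from the comparison table.
# Task parallelism (CPU-like): a few complex, order-dependent steps.
# Data parallelism (GPU-like): one simple operation applied to many
# elements independently -- simulated here with a plain comprehension.

def cpu_style(values):
    """Sequential pipeline: each step depends on the previous result."""
    total = 0
    for v in values:
        total = total * 2 + v  # order matters; hard to parallelize
    return total

def gpu_style(values):
    """Element-wise map: every output depends only on its own input,
    so all elements could be computed simultaneously on a GPU."""
    return [v * v for v in values]

print(cpu_style([1, 2, 3]))  # 11
print(gpu_style([1, 2, 3]))  # [1, 4, 9]
```

Workloads shaped like `gpu_style` (ML, image processing) are the ones that see
the large GPU speedups quoted above; workloads shaped like `cpu_style` do not.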

## Currently, our setup supports and offers the following flavors

NERC offers the following flavors based on our Infrastructure-as-a-Service
@@ -32,7 +50,7 @@ The standard compute flavor **"cpu-su"** is provided from Lenovo SD530 (2x Intel
8268 2.9 GHz, 48 cores, 384 GB memory) server. The base unit is 1 vCPU, 4 GB
memory with default of 20 GB root disk at a rate of $0.013 / hr of wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|---------------|-----|-----|-------|---------|-------------|-----------|
|cpu-su.1 |1 |0 |1 |4 |20 |$0.013 |
|cpu-su.2 |2 |0 |2 |8 |20 |$0.026 |
@@ -46,7 +64,7 @@ The memory optimized flavor **"mem-su"** is provided from the same servers at
**"cpu-su"** but with 8 GB of memory per core. The base unit is 1 vCPU, 8 GB
memory with default of 20 GB root disk at a rate of $0.026 / hr of wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|---------------|-----|-----|-------|---------|-------------|-----------|
|mem-su.1 |1 |0 |1 |8 |20 |$0.026 |
|mem-su.2 |2 |0 |2 |16 |20 |$0.052 |
@@ -99,7 +117,7 @@ The higher number of tensor cores available can significantly enhance the speed
of machine learning applications. The base unit is 32 vCPU, 240 GB memory with
default of 20 GB root disk at a rate of $2.078 / hr of wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|-------------------|-----|-----|-------|---------|-------------|-----------|
|gpu-su-a100sxm4.1 |1 |1 |32 |240 |20 |$2.078 |
|gpu-su-a100sxm4.2 |2 |2 |64 |480 |20 |$4.156 |
@@ -131,7 +149,7 @@ industry-leading high throughput and low latency networking. The base unit is 24
vCPU, 74 GB memory with default of 20 GB root disk at a rate of $1.803 / hr of
wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|---------------|-----|-----|-------|---------|-------------|-----------|
|gpu-su-a100.1 |1 |1 |24 |74 |20 |$1.803 |
|gpu-su-a100.2 |2 |2 |48 |148 |20 |$3.606 |
@@ -161,7 +179,7 @@ The **"gpu-su-v100"** flavor is provided from Dell R740xd (2x Intel Xeon Gold 61
40 cores, 768GB memory, 1x NVIDIA V100 32GB) servers. The base unit is 48 vCPU,
192 GB memory with default of 20 GB root disk at a rate of $1.214 / hr of wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|---------------|-----|-----|-------|---------|-------------|-----------|
|gpu-su-v100.1 |1 |1 |48 |192 |20 |$1.214 |

@@ -191,7 +209,7 @@ E5-2620 2.40GHz, 24 cores, 128GB memory, 4x NVIDIA K80 12GB) servers. The base unit
is 6 vCPU, 28.5 GB memory with default of 20 GB root disk at a rate of $0.463 /
hr of wall time.

| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
|--------------|-----|-----|-------|---------|-------------|-----------|
|gpu-su-k80.1 |1 |1 |6 |28.5 |20 |$0.463 |
|gpu-su-k80.2 |2 |2 |12 |57 |20 |$0.926 |
1 change: 1 addition & 0 deletions docs/openstack/create-and-connect-to-the-VM/images.md
@@ -22,6 +22,7 @@ an instance:
| Name |
|---------------------------------------|
| centos-7-x86_64 |
| centos-8-x86_64 |
| debian-10-x86_64 |
| fedora-36-x86_64 |
| rocky-8-x86_64 |
@@ -326,7 +326,7 @@ Press **Yes** if you receive the identity verification popup:
![RDP Windows Popup](images/rdp_popup_for_xrdp.png)

Then, enter your VM's username (ubuntu) and the password you created
for user ubuntu following [this steps](#setting-a-password.md).
for user ubuntu following [these steps](ssh-to-the-VM.md#setting-a-password).

Press **Ok**.

@@ -150,7 +150,7 @@ Wait until the requested resource allocation gets approved by the NERC's admin.
After approval, kindly review and verify that the quotas are accurately
reflected in your [resource allocation](https://coldfront.mss.mghpcc.org/allocation/)
and [OpenStack project](https://stack.nerc.mghpcc.org/). Please ensure that the
approved quota values are accurately displayed as [explained here](#review-your-openstack-dashboard).
approved quota values are accurately displayed as [explained here](decommission-openstack-resources.md#review-your-openstack-dashboard).

### Review your Block Storage(Volume/Cinder) Quota

2 changes: 1 addition & 1 deletion docs/openstack/persistent-storage/detach-a-volume.md
@@ -59,7 +59,7 @@ the volume created before and attached to the VM and can be shown in
Check that the volume is in state 'available' again.

If that's the case, the volume is now ready to either be attached to another
virtual machine or, if it is not needed any longer, to be [completely deleted](#delete-volumes)
virtual machine or, if it is not needed any longer, to be [completely deleted](./delete-volumes.md)
(please note that this step cannot be reverted!).

## Attach the detached volume to an instance
@@ -1266,7 +1266,8 @@ Here,
You can run either `juicefs config redis://default:<your_redis_password>@127.0.0.1:6379/1`
or `juicefs status redis://default:<your_redis_password>@127.0.0.1:6379/1` to get
detailed information about mounted file system i.e. **"myjfs"** that is setup by
following [this step](##formatting-file-system). The output looks like shown here:
following [this step](mount-the-object-storage.md#formatting-file-system). The
output looks like the one shown below:

{
...
@@ -875,9 +875,9 @@ following commands:
apiVersion: v1
kind: Secret
metadata:
name: skooner-sa-token
annotations:
kubernetes.io/service-account.name: skooner-sa
name: skooner-sa-token
annotations:
kubernetes.io/service-account.name: skooner-sa
type: kubernetes.io/service-account-token
EOF
```
@@ -889,7 +889,8 @@
obtained from the *TokenRequest API* are more secure than ones stored in Secret
objects, because they have a bounded lifetime and are not readable by other API
clients. You can use the `kubectl create token` command to obtain a token from
the TokenRequest API. For example: `kubectl create token skooner-sa`.
the TokenRequest API. For example: `kubectl create token skooner-sa`, where
`skooner-sa` is the service account name.

- Find the secret that was created to hold the token for the SA

@@ -670,9 +670,9 @@ following commands:
apiVersion: v1
kind: Secret
metadata:
name: skooner-sa-token
annotations:
kubernetes.io/service-account.name: skooner-sa
name: skooner-sa-token
annotations:
kubernetes.io/service-account.name: skooner-sa
type: kubernetes.io/service-account-token
EOF
```
@@ -685,7 +685,8 @@
Secret objects, because they have a bounded lifetime and are not readable
by other API clients. You can use the `kubectl create token` command to
obtain a token from the TokenRequest API. For example:
`kubectl create token skooner-sa`.
`kubectl create token skooner-sa`, where `skooner-sa` is the service account
name.

- Find the secret that was created to hold the token for the SA
