diff --git a/docs/get-started/allocation/allocation-change-request.md b/docs/get-started/allocation/allocation-change-request.md
index fa3f9818..dbe17934 100644
--- a/docs/get-started/allocation/allocation-change-request.md
+++ b/docs/get-started/allocation/allocation-change-request.md
@@ -46,6 +46,28 @@ This will show more details about the change request as shown below:
 
 ![Allocation Change Request Details for OpenStack Project](images/coldfront-openstack-change-requested-details.png)
 
+### How to Use GPU Resources in your OpenStack Project
+
+!!! tip "Comparison Between CPU and GPU"
+    To learn more about the key differences between CPUs and GPUs, please [read this](../../openstack/create-and-connect-to-the-VM/flavors.md#comparison-between-cpu-and-gpu).
+
+A GPU instance is launched in the [same way](../../openstack/create-and-connect-to-the-VM/launch-a-VM.md)
+as any other compute instance, with a few considerations to keep in mind:
+
+- When launching a GPU-based instance, be sure to select one of the
+[GPU Tier](../../openstack/create-and-connect-to-the-VM/flavors.md#3-gpu-tier)
+based flavors.
+
+- You need to have sufficient resource quota to launch the desired flavor. Always
+ensure you know which GPU-based flavor you want to use, then submit an
+[allocation change request](#request-change-resource-allocation-attributes-for-openstack-project)
+to adjust your current allocation to fit the flavor's resource requirements.
+
+- We recommend using [ubuntu-22.04-x86_64](../../openstack/create-and-connect-to-the-VM/images.md#nerc-images-list)
+as the image for your GPU-based instance because we have tested the NVIDIA driver
+with this image and obtained good results. That said, it is possible to run a
+variety of other images as well.
+
 ## Request Change Resource Allocation Attributes for OpenShift Project
 
 ![Request Change Resource Allocation Attributes for OpenShift Project](images/coldfront-openshift-allocation-attributes.png)
@@ -82,4 +104,18 @@ This will show more details about the change request as shown below:
 
 ![Allocation Change Request Details for OpenShift Project](images/coldfront-openshift-change-requested-details.png)
 
+### How to Use GPU Resources in your OpenShift Project
+
+!!! tip "Comparison Between CPU and GPU"
+    To learn more about the key differences between CPUs and GPUs, please [read this](../../openstack/create-and-connect-to-the-VM/flavors.md#comparison-between-cpu-and-gpu).
+
+For OpenShift pods, we can specify different types of GPUs. Since OpenShift is not
+based on flavors, we can customize the resources as needed at the pod level while
+still utilizing GPU resources.
+
+You can read about how to specify a pod to use a GPU [here](../../openshift/applications/scaling-and-performance-guide.md#how-to-specify-pod-to-use-gpu).
+
+Also, you will be able to select a different GPU device for your workload, as
+explained [here](../../openshift/applications/scaling-and-performance-guide.md#how-to-select-a-different-gpu-device).
+
 ---
diff --git a/docs/get-started/allocation/allocation-details.md b/docs/get-started/allocation/allocation-details.md
index f45fc58c..da910197 100644
--- a/docs/get-started/allocation/allocation-details.md
+++ b/docs/get-started/allocation/allocation-details.md
@@ -1,6 +1,6 @@
 # Allocation details
 
-Access to ColdFront's allocations details is based on [user roles](#user-roles).
+Access to ColdFront's allocation details is based on [user roles](manage-users-to-a-project.md#user-roles).
 PIs and managers see the same allocation details as users, and can also add project users
 to the allocation, if they're not already on it, and remove users from an allocation.
diff --git a/docs/get-started/allocation/coldfront.md b/docs/get-started/allocation/coldfront.md
index 176d7540..46187257 100644
--- a/docs/get-started/allocation/coldfront.md
+++ b/docs/get-started/allocation/coldfront.md
@@ -25,8 +25,8 @@ is granted, the PI will receive an email confirming the request approval and
 how to connect NERC's ColdFront.
 
 PI or project managers can use NERC's ColdFront as a self-service web-portal that
-can see an administrative view of it as [described here](#pi-and-manager-view) and
-can do the following tasks:
+can see an administrative view of it as [described here](coldfront.md#pi-and-manager-view)
+and can do the following tasks:
 
 - **Only PI** can add a new project and archive any existing project(s)
diff --git a/docs/get-started/allocation/images/coldfront-openstack-change-requested-details.png b/docs/get-started/allocation/images/coldfront-openstack-change-requested-details.png
index ffbea7e3..a5458748 100644
Binary files a/docs/get-started/allocation/images/coldfront-openstack-change-requested-details.png and b/docs/get-started/allocation/images/coldfront-openstack-change-requested-details.png differ
diff --git a/docs/get-started/allocation/requesting-an-allocation.md b/docs/get-started/allocation/requesting-an-allocation.md
index cb4b8b93..65d98e5d 100644
--- a/docs/get-started/allocation/requesting-an-allocation.md
+++ b/docs/get-started/allocation/requesting-an-allocation.md
@@ -10,6 +10,16 @@ or *OpenShift Resource Allocation* by specifying either **NERC (OpenStack)**
 or **NERC-OCP (OpenShift)** in the **Resource** dropdown option. **Note:** The
 first option i.e. **NERC (OpenStack)**, is selected by default.
 
+!!! info "Default GPU Resource Quota for Initial Allocation Requests"
+    By default, the GPU resource quota is set to 0 for the initial resource
+    allocation request for both OpenStack and OpenShift Resource Types. However,
+    you will be able to submit a [change request](allocation-change-request.md)
+    to adjust the corresponding GPU quotas for both after they are approved for
+    the first time. For NERC's OpenStack, please follow [this guide](allocation-change-request.md#how-to-use-gpu-resources-in-your-openstack-project)
+    on how to utilize GPU resources in your OpenStack project. For NERC's OpenShift,
+    refer to [this reference](allocation-change-request.md#how-to-use-gpu-resources-in-your-openshift-project)
+    to learn how to use GPU resources at the pod level.
+
 ## Request A New OpenStack Resource Allocation for an OpenStack Project
 
 ![Request A New OpenStack Resource Allocation](images/coldfront-request-new-openstack-allocation.png)
diff --git a/docs/get-started/create-a-user-portal-account.md b/docs/get-started/create-a-user-portal-account.md
index 490eec1a..8c13a62d 100644
--- a/docs/get-started/create-a-user-portal-account.md
+++ b/docs/get-started/create-a-user-portal-account.md
@@ -133,8 +133,8 @@ as shown in the image below:
 
 !!! info "Information"
     Once your PI user request is reviewed and approved by the NERC's admin, you
     will receive an email confirmation from NERC's support system, i.e.,
-    **help@nerc.mghpcc.org**. Then, you can access [NERC's ColdFront resource
-    allocation management portal](https://coldfront.mss.mghpcc.org/) using the
-    PI user role, as [described here](allocation/coldfront.md).
+    [help@nerc.mghpcc.org](mailto:help@nerc.mghpcc.org?subject=NERC%20MOU%20Question).
+    Then, you can access [NERC's ColdFront resource allocation management portal](https://coldfront.mss.mghpcc.org/)
+    using the PI user role, as [described here](allocation/coldfront.md#how-to-get-access-to-nercs-coldfront).
 
 ---
diff --git a/docs/migration-moc-to-nerc/Step2.md b/docs/migration-moc-to-nerc/Step2.md
index ac2c48dd..7720a8f5 100644
--- a/docs/migration-moc-to-nerc/Step2.md
+++ b/docs/migration-moc-to-nerc/Step2.md
@@ -97,10 +97,10 @@ samples below your lists might look like this:
 
 | MOC Volume Name | MOC Disk | MOC Attached To | Bootable | MOC UUID | NERC Volume Name |
 | --------------- | -------- | --------------- | -------- | -------- | ---------------- |
-| Fedora | 10GiB | Fedora_test | Yes | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
-| 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | 10GiB | Ubuntu_Test | Yes | 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | |
-| Snapshot of Fed_Test | 10GiB | Fedora_test | No | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
-| total | 30GiB | | | |
+| Fedora | 10GiB | Fedora_test | Yes | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
+| 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | 10GiB | Ubuntu_Test | Yes | 9c73295d-fdfa-4544-b8b8-a876cc0a1e86 | |
+| Snapshot of Fed_Test | 10GiB | Fedora_test | No | ea45c20b-434a-4c41-8bc6-f48256fc76a8 | |
+| total | 30GiB | | | | |
 
 #### MOC Security Group Information Table
diff --git a/docs/openshift/decommission/decommission-openshift-resources.md b/docs/openshift/decommission/decommission-openshift-resources.md
index 52636c6a..108268b9 100644
--- a/docs/openshift/decommission/decommission-openshift-resources.md
+++ b/docs/openshift/decommission/decommission-openshift-resources.md
@@ -246,7 +246,7 @@ Wait until the requested resource allocation gets approved by the NERC's admin.
 
 After approval, kindly review and verify that the quotas are accurately reflected
 in your [resource allocation](https://coldfront.mss.mghpcc.org/allocation/) and
 [OpenShift project](https://console.apps.shift.nerc.mghpcc.org). Please ensure
-that the approved quota values are accurately displayed as [explained here](#review-your-projects-resource-quota-from-openshift-web-dashboard).
+that the approved quota values are accurately displayed as [explained here](decommission-openshift-resources.md#review-your-projects-resource-quota-from-openshift-web-dashboard).
 
 ### Review your Project Usage
diff --git a/docs/openstack/access-and-security/create-a-key-pair.md b/docs/openstack/access-and-security/create-a-key-pair.md
index 624a5d97..eec2f6e2 100644
--- a/docs/openstack/access-and-security/create-a-key-pair.md
+++ b/docs/openstack/access-and-security/create-a-key-pair.md
@@ -233,7 +233,7 @@ PuTTY requires SSH keys to be in its own `ppk` format. To convert between
 OpenSSH keys used by OpenStack and PuTTY's format, you need a utility called
 PuTTYgen.
 
 If it was not installed when you originally installed PuTTY, you can get it
-here: [Download PuTTY](#http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).
+here: [Download PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).
 
 You have 2 options for generating keys that will work with PuTTY:
 
@@ -241,7 +241,8 @@ You have 2 options for generating keys that will work with PuTTY:
    instructions above, then use PuTTYgen to convert the private key to .ppk
 
 2. Generate a .ppk key with PuTTYgen, and import the provided OpenSSH public
-   key to OpenStack using the 'Import a Key Pair' instructions [above](#import-a-key-pair).
+   key to OpenStack using the 'Import the generated Key Pair' instructions
+   [above](create-a-key-pair.md#import-the-generated-key-pair).
 
 There is a detailed walkthrough of how to use PuTTYgen here:
 [Use SSH Keys with PuTTY on Windows](https://devops.profitbricks.com/tutorials/use-ssh-keys-with-putty-on-windows/).
diff --git a/docs/openstack/create-and-connect-to-the-VM/flavors.md b/docs/openstack/create-and-connect-to-the-VM/flavors.md
index 94e93f7c..9f95f69f 100644
--- a/docs/openstack/create-and-connect-to-the-VM/flavors.md
+++ b/docs/openstack/create-and-connect-to-the-VM/flavors.md
@@ -17,6 +17,24 @@ The important fields are
 | Ephemeral | Size of a second disk. 0 means no second disk is defined and mounted. |
 | VCPUs | Number of virtual cores |
 
+## Comparison Between CPU and GPU
+
+Here are the key differences between CPUs and GPUs:
+
+| CPUs | GPUs |
+| --------------------------------------------- | ---------------------------- |
+| Work mostly in sequence. While several cores and excellent task switching give the impression of parallelism, a CPU is fundamentally designed to run one task at a time. | Are designed to work in parallel. A vast number of cores and threading managed in hardware enable GPUs to perform many simple calculations simultaneously. |
+| Are designed for task parallelism. | Are designed for data parallelism. |
+| Have a small number of cores that can complete single complex tasks at very high speeds. | Have a large number of cores that work in tandem to compute many simple tasks. |
+| Have access to a large amount of relatively slow RAM with low latency, optimizing them for low-latency operation. | Have access to a relatively small amount of very fast RAM with higher latency, optimizing them for throughput. |
+| Have a very versatile instruction set, allowing the execution of complex tasks in fewer cycles but creating overhead in others. | Have a limited (but highly optimized) instruction set, allowing them to execute their designed tasks very efficiently. |
+| Task switching (as a result of running the OS) creates overhead. | Task switching is not used; instead, numerous serial data streams are processed in parallel from point A to point B. |
+| Will always work for any given use case but may not provide adequate performance for some tasks. | Would only be a valid choice for some use cases but would provide excellent performance in those cases. |
+
+In summary, for applications such as Machine Learning (ML), Artificial
+Intelligence (AI), or image processing, a GPU can provide a performance increase
+of 50x to 200x compared to a typical CPU performing the same tasks.
+
 ## Currently, our setup supports and offers the following flavors
 
 NERC offers the following flavors based on our Infrastructure-as-a-Service
@@ -32,7 +50,7 @@ The standard compute flavor **"cpu-su"** is provided from Lenovo SD530 (2x Intel
 8268 2.9 GHz, 48 cores, 384 GB memory) server. The base unit is 1 vCPU, 4 GB
 memory with default of 20 GB root disk at a rate of $0.013 / hr of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |---------------|-----|-----|-------|---------|-------------|-----------|
 |cpu-su.1 |1 |0 |1 |4 |20 |$0.013 |
 |cpu-su.2 |2 |0 |2 |8 |20 |$0.026 |
@@ -46,7 +64,7 @@ The memory optimized flavor **"mem-su"** is provided from the same servers at
 **"cpu-su"** but with 8 GB of memory per core.
 The base unit is 1 vCPU, 8 GB memory with default of 20 GB root disk at a rate of $0.026 / hr of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |---------------|-----|-----|-------|---------|-------------|-----------|
 |mem-su.1 |1 |0 |1 |8 |20 |$0.026 |
 |mem-su.2 |2 |0 |2 |16 |20 |$0.052 |
@@ -99,7 +117,7 @@ The higher number of tensor cores available can significantly enhance the speed
 of machine learning applications. The base unit is 32 vCPU, 240 GB memory with
 default of 20 GB root disk at a rate of $2.078 / hr of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |-------------------|-----|-----|-------|---------|-------------|-----------|
 |gpu-su-a100sxm4.1 |1 |1 |32 |240 |20 |$2.078 |
 |gpu-su-a100sxm4.2 |2 |2 |64 |480 |20 |$4.156 |
@@ -131,7 +149,7 @@ industry-leading high throughput and low latency networking. The base unit is
 24 vCPU, 74 GB memory with default of 20 GB root disk at a rate of $1.803 / hr
 of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |---------------|-----|-----|-------|---------|-------------|-----------|
 |gpu-su-a100.1 |1 |1 |24 |74 |20 |$1.803 |
 |gpu-su-a100.2 |2 |2 |48 |148 |20 |$3.606 |
@@ -161,7 +179,7 @@ The **"gpu-su-v100"** flavor is provided from Dell R740xd (2x Intel Xeon Gold 61
 40 cores, 768GB memory, 1x NVIDIA V100 32GB) servers. The base unit is 48 vCPU,
 192 GB memory with default of 20 GB root disk at a rate of $1.214 / hr of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |---------------|-----|-----|-------|---------|-------------|-----------|
 |gpu-su-v100.1 |1 |1 |48 |192 |20 |$1.214 |
@@ -191,7 +209,7 @@ E5-2620 2.40GHz, 24 cores, 128GB memory, 4x NVIDIA K80 12GB) servers. The base u
 is 6 vCPU, 28.5 GB memory with default of 20 GB root disk at a rate of $0.463 / hr
 of wall time.
 
-| Flavor | SUs | GPU | vCPU | RAM(GB) | Storage(GB) | Cost / hr |
+| Flavor | SUs | GPU | vCPU | RAM(GiB) | Storage(GiB) | Cost / hr |
 |--------------|-----|-----|-------|---------|-------------|-----------|
 |gpu-su-k80.1 |1 |1 |6 |28.5 |20 |$0.463 |
 |gpu-su-k80.2 |2 |2 |12 |57 |20 |$0.926 |
diff --git a/docs/openstack/create-and-connect-to-the-VM/images.md b/docs/openstack/create-and-connect-to-the-VM/images.md
index f1f1088f..59da8072 100644
--- a/docs/openstack/create-and-connect-to-the-VM/images.md
+++ b/docs/openstack/create-and-connect-to-the-VM/images.md
@@ -22,6 +22,7 @@ an instance:
 | Name |
 |---------------------------------------|
 | centos-7-x86_64 |
+| centos-8-x86_64 |
 | debian-10-x86_64 |
 | fedora-36-x86_64 |
 | rocky-8-x86_64 |
diff --git a/docs/openstack/create-and-connect-to-the-VM/ssh-to-the-VM.md b/docs/openstack/create-and-connect-to-the-VM/ssh-to-the-VM.md
index ed6366aa..990fe4f0 100644
--- a/docs/openstack/create-and-connect-to-the-VM/ssh-to-the-VM.md
+++ b/docs/openstack/create-and-connect-to-the-VM/ssh-to-the-VM.md
@@ -326,7 +326,7 @@ Press **Yes** if you receive the identity verification popup:
 
 ![RDP Windows Popup](images/rdp_popup_for_xrdp.png)
 
 Then, enter your VM's username (ubuntu) and the password you created
-for user ubuntu following [this steps](#setting-a-password.md).
+for user ubuntu following [these steps](ssh-to-the-VM.md#setting-a-password).
 
 Press **Ok**.
diff --git a/docs/openstack/decommission/decommission-openstack-resources.md b/docs/openstack/decommission/decommission-openstack-resources.md
index abbecb3d..a319f887 100644
--- a/docs/openstack/decommission/decommission-openstack-resources.md
+++ b/docs/openstack/decommission/decommission-openstack-resources.md
@@ -150,7 +150,7 @@ Wait until the requested resource allocation gets approved by the NERC's admin.
 
 After approval, kindly review and verify that the quotas are accurately reflected
 in your [resource allocation](https://coldfront.mss.mghpcc.org/allocation/) and
 [OpenStack project](https://stack.nerc.mghpcc.org/). Please ensure that the
-approved quota values are accurately displayed as [explained here](#review-your-openstack-dashboard).
+approved quota values are accurately displayed as [explained here](decommission-openstack-resources.md#review-your-openstack-dashboard).
 
 ### Review your Block Storage(Volume/Cinder) Quota
diff --git a/docs/openstack/persistent-storage/detach-a-volume.md b/docs/openstack/persistent-storage/detach-a-volume.md
index df642f7a..e0f04b83 100644
--- a/docs/openstack/persistent-storage/detach-a-volume.md
+++ b/docs/openstack/persistent-storage/detach-a-volume.md
@@ -59,7 +59,7 @@ the volume created before and attached to the VM and can be shown in
 Check that the volume is in state 'available' again.
 
 If that's the case, the volume is now ready to either be attached to another
-virtual machine or, if it is not needed any longer, to be [completely deleted](#delete-volumes)
+virtual machine or, if it is not needed any longer, to be [completely deleted](./delete-volumes.md)
 (please note that this step cannot be reverted!).
 
 ## Attach the detached volume to an instance
diff --git a/docs/openstack/persistent-storage/mount-the-object-storage.md b/docs/openstack/persistent-storage/mount-the-object-storage.md
index 8dbc7693..a92a8c38 100644
--- a/docs/openstack/persistent-storage/mount-the-object-storage.md
+++ b/docs/openstack/persistent-storage/mount-the-object-storage.md
@@ -1266,7 +1266,8 @@ Here,
 You can run either `juicefs config redis://default:@127.0.0.1:6379/1` or
 `juicefs status redis://default:@127.0.0.1:6379/1` to get detailed information
 about mounted file system i.e. **"myjfs"** that is setup by
-following [this step](##formatting-file-system). The output looks like shown here:
+following [this step](mount-the-object-storage.md#formatting-file-system). The
+output looks like the one shown here:
 
     {
    ...
diff --git a/docs/other-tools/kubernetes/kubeadm/HA-clusters-with-kubeadm.md b/docs/other-tools/kubernetes/kubeadm/HA-clusters-with-kubeadm.md
index 143c08cf..f3c86cbb 100644
--- a/docs/other-tools/kubernetes/kubeadm/HA-clusters-with-kubeadm.md
+++ b/docs/other-tools/kubernetes/kubeadm/HA-clusters-with-kubeadm.md
@@ -875,9 +875,9 @@ following commands:
     apiVersion: v1
     kind: Secret
     metadata:
-    name: skooner-sa-token
-    annotations:
-    kubernetes.io/service-account.name: skooner-sa
+      name: skooner-sa-token
+      annotations:
+        kubernetes.io/service-account.name: skooner-sa
     type: kubernetes.io/service-account-token
     EOF
    ```
@@ -889,7 +889,8 @@ following commands:
     obtained from the *TokenRequest API* are more secure than ones stored in
     Secret objects, because they have a bounded lifetime and are not readable
     by other API clients. You can use the `kubectl create token` command to
     obtain a token from
-    the TokenRequest API. For example: `kubectl create token skooner-sa`.
+    the TokenRequest API. For example: `kubectl create token skooner-sa`, where
+    `skooner-sa` is the service account name.
 
 - Find the secret that was created to hold the token for the SA
diff --git a/docs/other-tools/kubernetes/kubeadm/single-master-clusters-with-kubeadm.md b/docs/other-tools/kubernetes/kubeadm/single-master-clusters-with-kubeadm.md
index 11ced9e8..d0c164dc 100644
--- a/docs/other-tools/kubernetes/kubeadm/single-master-clusters-with-kubeadm.md
+++ b/docs/other-tools/kubernetes/kubeadm/single-master-clusters-with-kubeadm.md
@@ -670,9 +670,9 @@ following commands:
     apiVersion: v1
     kind: Secret
    metadata:
-    name: skooner-sa-token
-    annotations:
-    kubernetes.io/service-account.name: skooner-sa
+      name: skooner-sa-token
+      annotations:
+        kubernetes.io/service-account.name: skooner-sa
     type: kubernetes.io/service-account-token
     EOF
    ```
@@ -685,7 +685,8 @@ following commands:
     Secret objects, because they have a bounded lifetime and are not readable
     by other API clients. You can use the `kubectl create token` command to
     obtain a token from the TokenRequest API. For example:
-    `kubectl create token skooner-sa`.
+    `kubectl create token skooner-sa`, where `skooner-sa` is the service account
+    name.
 
 - Find the secret that was created to hold the token for the SA
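+
+    A minimal sketch of how that lookup and decoding might look (assuming the
+    Secret was created in your current namespace, as in the manifest above;
+    add `-n <namespace>` if you created it elsewhere):
+
+    ```sh
+    # Show the Secret that stores the service account token
+    kubectl describe secret skooner-sa-token
+
+    # Print just the decoded bearer token for logging in to Skooner
+    kubectl get secret skooner-sa-token -o jsonpath='{.data.token}' | base64 --decode
+    ```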