12 changes: 2 additions & 10 deletions _topic_maps/_topic_map.yml
@@ -1489,16 +1489,8 @@ Topics:
Dir: dpu-operator
Distros: openshift-enterprise,openshift-origin
Topics:
- Name: About the DPU and the DPU Operator
File: about-dpu
- Name: Installing the DPU Operator
File: installing-dpu-operator
- Name: Configuring the DPU Operator
File: configuring-dpu-operator
- Name: Running a workload on the DPU
File: running-workload-on-dpu
- Name: Uninstalling the DPU Operator
File: uninstalling-dpu-operator
- Name: DPU Operator
File: dpu-operator
- Name: Network security
Dir: network_security
Distros: openshift-enterprise,openshift-origin
9 changes: 6 additions & 3 deletions modules/nw-about-dpu.adoc
@@ -6,13 +6,16 @@
[id="nw-about-dpu_{context}"]
= Orchestrating DPUs with the DPU Operator

A Data Processing Unit (DPU) is a type of programmable processor that is considered one of the three fundamental pillars of computing, alongside CPUs and GPUs. While CPUs handle general computing tasks and GPUs accelerate specific workloads, the primary role of the DPU is to offload and accelerate data-centric workloads, such as networking, storage, and security functions.
[role="_abstract"]
You can use the Data Processing Unit (DPU) Operator to manage DPUs that offload networking, storage, and security workloads from host CPUs to improve cluster performance and efficiency.

DPUs are typically used in data centers and cloud environments to improve performance, reduce latency, and enhance security by offloading these tasks from the CPU. DPUs can also be used to create a more efficient and flexible infrastructure by enabling the deployment of specialized workloads closer to the data source.
A DPU is a type of programmable processor that represents one of the three fundamental pillars of computing, alongside CPUs and GPUs. While CPUs handle general computing tasks and GPUs accelerate specific workloads, the primary role of the DPU is to offload and accelerate data-centric workloads, such as networking, storage, and security functions.

DPUs are typically used in data centers and cloud environments to improve performance, reduce latency, and enhance security by offloading these tasks from the CPU. You can also use DPUs to create a more efficient and flexible infrastructure by enabling the deployment of specialized workloads closer to the data source.

The DPU Operator is responsible for managing the DPU devices and network attachments. The DPU Operator deploys the DPU daemon onto {product-title} compute nodes that interface through an API controlling the DPU daemon running on the DPU. The DPU Operator is responsible for the life-cycle management of the `ovn-kube` components and the necessary host network initialization on the DPU.

The currently supported DPU devices are described in the following table.
The following table describes the currently supported DPU devices.

.Supported devices
[cols="1,1,1,2", options="header"]
37 changes: 28 additions & 9 deletions modules/nw-dpu-configuring-operator.adoc
@@ -4,37 +4,47 @@

:_mod-docs-content-type: PROCEDURE
[id="nw-dpu-configuring-operator_{context}"]
= Configuring the DPU Operator
= Configuring the DPU Operator

[role="_abstract"]
You can configure the DPU Operator after installation to enable management of DPU devices and network attachments in both dual cluster and single cluster deployment modes.

You can configure the DPU Operator to manage the DPU devices and network attachments in your cluster.

To configure the DPU Operator, follow these steps:

.Procedure

. Create a `DpuOperatorConfig` custom resource (CR) on both the host cluster and on each of the DPU clusters. The DPU Operator in each cluster is activated after this CR is created.
. Create the `DpuOperatorConfig` custom resource (CR) based on your deployment mode:

* Dual cluster deployment: You must create the `DpuOperatorConfig` CR on both the host {product-title} cluster and on each of the {ms} DPU clusters.
* Single cluster deployment: This deployment uses a standard {product-title} cluster. You only need to create the `DpuOperatorConfig` CR once on this cluster.
+
The content of the CR is the same for all clusters.

. Create a file named `dpu-operator-host-config.yaml` by using the following YAML:
. Create a file named `dpu-operator-config.yaml` by using the following YAML:
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: DpuOperatorConfig
metadata:
  name: dpu-operator-config <1>
  name: dpu-operator-config
spec:
  mode: host <2>
  logLevel: 0
----
+
<1> The name of the custom resource must be `dpu-operator-config`.
<2> Set the value to `host` on the host cluster. On each DPU cluster, which runs a single MicroShift cluster per DPU, set the value to `dpu`.
* `metadata.name`: Specifies the name of the custom resource, which must be `dpu-operator-config`.
* `spec.logLevel`: Sets the desired logging verbosity in the operator container logs. The value `0` is the default setting.
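+
For example, a minimal sketch of the same CR with increased logging verbosity for troubleshooting. The value `2` is illustrative; higher values produce more detailed logs:
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: DpuOperatorConfig
metadata:
  name: dpu-operator-config
spec:
  logLevel: 2
----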

. Create the resource by running the following command:
+
[source,terminal]
----
$ oc apply -f dpu-operator-host-config.yaml
$ oc apply -f dpu-operator-config.yaml
----
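+
In a dual cluster deployment, apply the same file to the host cluster and to each DPU cluster. The following sketch assumes that you keep a separate kubeconfig file for each cluster; the file names are illustrative:
+
[source,terminal]
----
$ oc --kubeconfig=host-cluster.kubeconfig apply -f dpu-operator-config.yaml
$ oc --kubeconfig=dpu-cluster-1.kubeconfig apply -f dpu-operator-config.yaml
----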

. You must label all nodes that either have an attached DPU or are functioning as a DPU. On the host cluster, this means labeling all compute nodes assuming each node has an attached DPU with `dpu=true`. On the DPU, where each MicroShift cluster consists of a single node, label that single node in each cluster with `dpu=true`. You can apply this label by running the following command:
. Label all nodes that either have an attached DPU or are functioning as a DPU. You can apply this label by running the following command:
+
[source,terminal]
----
@@ -44,3 +54,12 @@ $ oc label node <node_name> dpu=true
where:
+
`node_name`:: Refers to the name of your node, such as `worker-1`.
+
[NOTE]
====
There are two ways to deploy clusters that are compatible with DPUs:

* Dual cluster deployment: This consists of {product-title} running on the hosts and {ms} running on the DPU. In this mode, you must also install the DPU Operator on the {ms} instance, and you must set the label `dpu=true` on the node.
* Single cluster deployment: This consists of only {product-title} running on hosts, where the DPUs are integrated into the main cluster. You must apply the label `dpu=true` to both the host nodes with DPUs installed and the DPU nodes themselves. The DPU Operator automatically detects whether a node is running as a DPU or as a host with an attached DPU.
====
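+
To verify that the labels were applied, you can list the nodes that carry the `dpu=true` label:
+
[source,terminal]
----
$ oc get nodes -l dpu=true
----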

59 changes: 45 additions & 14 deletions modules/nw-dpu-creating-a-sfc.adoc
@@ -4,11 +4,14 @@

:_mod-docs-content-type: PROCEDURE
[id="nw-dpu-creating-a-sfc_{context}"]
= Creating a service function chain on the DPU
= Running a workload on the DPU

Network service chaining, also known as service function chaining (SFC) is a capability that uses software-defined networking (SDN) capabilities to create a chain of connected network services, such as L4-7 services like firewalls, network address translation (NAT), and intrusion protection.
[role="_abstract"]
You can deploy network workloads directly on the DPU to improve performance, enhance security isolation, and reduce host CPU usage.

Follow this procedure on the DPU to create the network function `my-network-function` in the service function chain.
The DPU offloads network workloads, such as security functions or virtualized appliances, to improve performance, enhance security isolation, and free host CPU resources.

Follow this procedure to deploy a simple pod directly onto the DPU.

.Prerequisites

@@ -18,27 +21,55 @@ Follow this procedure on the DPU to create the network function `my-network-func

.Procedure

. Save the following YAML file example as `sfc.yaml`:
. Save the following YAML file example as `dpu-pod.yaml`. This example shows a simple pod that the default Kubernetes scheduler places directly onto a DPU node.
+
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: ServiceFunctionChain
apiVersion: v1
kind: Pod
metadata:
  name: sfc
  name: "my-network-function"
  namespace: openshift-dpu-operator
  annotations:
    k8s.v1.cni.cncf.io/networks: dpunfcni-conf, dpunfcni-conf
spec:
  networkFunctions:
  - name: my-network-function <1>
    image: quay.io/example-org/my-network-function:latest <2>
  nodeSelector:
    dpu.config.openshift.io/dpuside: "dpu"
  containers:
  - name: "my-network-function"
    image: "quay.io/example-org/my-network-function:latest"
    resources:
      requests:
        openshift.io/dpu: "2"
      limits:
        openshift.io/dpu: "2"
    securityContext:
      privileged: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_RAW
        - NET_ADMIN
----
+
<1> The name of the network function. This name is used to identify the network function in the service function chain.
<2> The URL to the container image that contains the network function. The image must be accessible from the DPU.
* `metadata.annotations.k8s.v1.cni.cncf.io/networks`: The value `dpunfcni-conf` specifies the name of the `NetworkAttachmentDefinition` resource. The DPU Operator creates this resource during installation to configure the DPU networking.
* `spec.nodeSelector`: The `nodeSelector` field is the primary mechanism for scheduling this workload. The DPU Operator creates and maintains the label `dpu.config.openshift.io/dpuside: "dpu"`. This label ensures that the pod is scheduled directly onto the DPU's processing unit.
* `spec.containers.name`: The name of the container.
* `spec.containers.image`: The container image to pull and run.
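+
Before you create the pod, you can optionally confirm that the referenced network attachment exists. This sketch assumes the `dpunfcni-conf` name and the `openshift-dpu-operator` namespace from the example above:
+
[source,terminal]
----
$ oc get network-attachment-definitions -n openshift-dpu-operator
----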

. Create the chain by running the following command on the DPU nodes:
. Create the pod by running the following command:
+
[source,terminal]
----
$ oc apply -f sfc.yaml
$ oc apply -f dpu-pod.yaml
----

. Verify the pod status by running the following command:
+
[source,terminal]
----
$ oc get pods -n openshift-dpu-operator
----
+
Ensure the pod's status is `Running`.
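+
If the pod is not in the `Running` state, you can inspect its events and status. This sketch assumes the pod name from the example above:
+
[source,terminal]
----
$ oc describe pod my-network-function -n openshift-dpu-operator
----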
3 changes: 3 additions & 0 deletions modules/nw-dpu-installing-operator-cli.adoc
@@ -7,6 +7,9 @@
[id="nw-dpu-installing-operator-cli_{context}"]
= Installing the DPU Operator by using the CLI

[role="_abstract"]
You can install the DPU Operator by using the CLI to set up DPU device management on your host clusters.

As a cluster administrator, you can install the DPU Operator by using the CLI.

[NOTE]
5 changes: 4 additions & 1 deletion modules/nw-dpu-installing-operator-ui.adoc
@@ -7,6 +7,9 @@
[id="nw-dpu-installing-operator-ui_{context}"]
= Installing the DPU Operator using the web console

[role="_abstract"]
You can install the DPU Operator by using the web console to set up DPU device management on your host clusters.

As a cluster administrator, you can install the DPU Operator by using the web console.

.Prerequisites
@@ -28,7 +31,7 @@ As a cluster administrator, you can install the DPU Operator by using the web co

. Navigate to the *Ecosystem* -> *Installed Operators* page.

. Ensure that *DPU Operator* is listed in the *openshift-dpu-operator* project with a *Status* of *InstallSucceeded*.
. Ensure that the *openshift-dpu-operator* project lists *DPU Operator* with a *Status* of *InstallSucceeded*.
+
[NOTE]
====
17 changes: 17 additions & 0 deletions modules/nw-dpu-intro-installing-operator.adoc
@@ -0,0 +1,17 @@
// Module included in the following assemblies:
🤖 [error] OpenShiftAsciiDoc.ModuleContainsContentType: Module is missing the '_mod-docs-content-type' variable.

//
// * networking/networking_operators/installing-dpu-operator.adoc

:_mod-docs-content-type: Concept
[id="overview-installing-dpu-operator_{context}"]
= Installing the DPU Operator

[role="_abstract"]
You can install the Data Processing Unit (DPU) Operator on both host and DPU clusters to manage the device lifecycle and network attachments by using the CLI or the web console.

Cluster administrators can install the DPU Operator on the host cluster and all DPU clusters by using the {product-title} CLI or the web console. The DPU Operator manages the lifecycle of DPU devices and network attachments for all supported DPUs.

[NOTE]
====
You must install the DPU Operator on the host cluster and on each of the DPU clusters.
====
7 changes: 5 additions & 2 deletions modules/nw-dpu-operator-uninstall.adoc
@@ -6,7 +6,10 @@
[id="nw-dpu-operator-uninstall_{context}"]
= Uninstalling the DPU Operator

As a cluster administrator, you can uninstall the DPU Operator.
[role="_abstract"]
You can uninstall the DPU Operator from your cluster when you no longer need DPU device management.

To uninstall the DPU Operator, you must first delete any running DPU workloads.
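For example, if you created the sample network function pod from the workload procedure, delete it before you continue. The pod name and namespace here match that earlier example:

[source,terminal]
----
$ oc delete pod my-network-function -n openshift-dpu-operator
----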

.Prerequisites

@@ -69,7 +72,7 @@ $ oc delete namespace openshift-dpu-operator

.Verification

. Verify that the DPU Operator is uninstalled by running the following command. An example of succesful command output is `No resources found in openshift-dpu-operator namespace`.
. Verify that the DPU Operator is uninstalled by running the following command. An example of successful command output is `No resources found in openshift-dpu-operator namespace`.
+
[source,terminal]
----
21 changes: 16 additions & 5 deletions modules/nw-dpu-running-workloads.adoc
@@ -4,9 +4,14 @@

:_mod-docs-content-type: PROCEDURE
[id="nw-running-workloads-dpu_{context}"]
= Running a workload on the DPU
= Running a workload on the host with a DPU

Follow these steps to deploy a workload on the DPU.
[role="_abstract"]
You can deploy workloads on the host with a DPU to offload specialized infrastructure tasks and improve performance while freeing host CPU resources.

Running workloads on a DPU enables offloading specialized infrastructure tasks such as networking, security, and storage to a dedicated processing unit. This improves performance, enforces a stronger security boundary between infrastructure and application workloads, and frees up host CPU resources.

Follow these steps to deploy a workload on the host with a DPU. This is the standard deployment model, where the application runs on the host's x86 CPU but uses the DPU for network acceleration and offload.

.Prerequisites

@@ -16,7 +21,7 @@ Follow these steps to deploy a workload on the DPU.

.Procedure

. Create a sample workload on the host side by using the following YAML, save the file as `workload-host.yaml`:
. Create a sample workload designed to run on the host-side worker node by using the following YAML. Save the file as `workload-host.yaml`:
+
[source,yaml]
----
@@ -29,7 +34,7 @@ metadata:
    k8s.v1.cni.cncf.io/networks: default-sriov-net
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-237 <1>
    kubernetes.io/hostname: worker-237
  containers:
  - name: appcntr1
    image: registry.access.redhat.com/ubi9/ubi:latest
@@ -48,7 +53,13 @@ spec:
        openshift.io/dpu: '1'
----
+
<1> The name of the node where the workload is deployed.
`spec.nodeSelector`: The node selector schedules the pod on the node with the DPU resource. You can use any standard Kubernetes selector for this, such as `kubernetes.io/hostname`, to target a specific node as shown in the example YAML.
+
[NOTE]
====
For flexible scheduling, the DPU Operator creates the label `dpu.config.openshift.io/dpuside: "dpu-host"` on host nodes that have an attached DPU. This label enables the default scheduler to place the workload on any host with a DPU. The workload automatically joins that DPU secondary network.
When the label on a node is `dpu.config.openshift.io/dpuside: "dpu"`, the node is the DPU itself. The DPU Operator creates and manages the `dpu.config.openshift.io/dpuside` label.
====
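+
For example, a minimal sketch of an alternative `nodeSelector` stanza that uses this label to target any host with an attached DPU instead of a specific hostname:
+
[source,yaml]
----
nodeSelector:
  dpu.config.openshift.io/dpuside: "dpu-host"
----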

. Create the workload by running the following command:
+
76 changes: 76 additions & 0 deletions modules/nw-monitoring-dpu-status.adoc
@@ -0,0 +1,76 @@
// Module included in the following assemblies:
//
// * networking/networking_operators/nw-dpu-running-workloads.adoc

:_mod-docs-content-type: PROCEDURE
[id="nw-dpu-monitoring-status_{context}"]
= Monitoring the status of DPUs

[role="_abstract"]
You can monitor the DPU infrastructure status to check the current state and health of your DPU devices across the cluster.

The `oc get dpu` command shows the current state of the DPU infrastructure. Follow this procedure to monitor the status of your DPU devices.

.Prerequisites

* The OpenShift CLI (`oc`) is installed.
* An account with `cluster-admin` privileges is available.
* The DPU Operator is installed.

.Procedure

. Run the following command to check the overall health of your nodes:
+
[source,terminal]
----
$ oc get nodes
----
+
The example output provides a list of all nodes in the cluster along with their status. Ensure that all nodes are in the `Ready` state before proceeding.
+
[source,terminal]
----
NAME                        STATUS   ROLES    AGE     VERSION
ocpcluster-master-1         Ready    master   10d     v1.32.9
ocpcluster-master-2         Ready    master   10d     v1.32.9
ocpcluster-master-3         Ready    master   10d     v1.32.9
ocpcluster-dpu-ipu-219      Ready    worker   42h     v1.32.9
ocpcluster-dpu-marvell-41   Ready    worker   3d23h   v1.32.9
ocpcluster-dpu-ptl-243      Ready    worker   3d23h   v1.32.9
worker-host-ipu-219         Ready    worker   3d19h   v1.32.9
worker-host-marvell-41      Ready    worker   4d      v1.32.9
worker-host-ptl-243         Ready    worker   3d23h   v1.32.9
----
+
This output shows three master nodes and three worker nodes identified by the `worker-host` prefix, for example, `worker-host-ipu-219`. Each worker node has an attached DPU that appears as its own node, identified by the `ocpcluster-dpu` prefix, for example, `ocpcluster-dpu-ipu-219`.

. Run the following command to report on the status of the DPUs:
+
[source,terminal]
----
$ oc get dpu
----
+
The example output provides a list of detected DPUs.
+
[source,terminal]
----
NAME                            DPU PRODUCT                DPU SIDE   MODE NAME                STATUS
030001163eec00ff-host           Intel Netsec Accelerator   false      worker-host-ptl-243      True
d4-e5-c9-00-ec-3v-dpu           Intel Netsec Accelerator   true       worker-dpu-ptl-243       True
intel-ipu-0000-06-00.0-host     Intel IPU E2100            false      worker-host-ipu-219      False
intel-ipu-dpu                   Intel IPU E2100            true       worker-dpu-ipu-219       False
marvell-dpu-0000-87-00.0-host   Marvell DPU                false      worker-host-marvell-41   True
marvell-dpu-ipu                 Marvell DPU                true       worker-dpu-marvell-41    True
----
* `DPU PRODUCT`: Displays the vendor or type of DPU, for example, Intel or Marvell.
* `DPU SIDE`: Indicates whether the DPU is operating on the host side (`false`) or the DPU side (`true`). Each physical DPU is represented twice.
* `MODE NAME`: The name of the node where the DPU is located. This is the host worker node for `false` entries and the DPU node for `true` entries.
* `STATUS`: Indicates whether the DPU is functioning correctly (`True`) or has issues (`False`).
+
[NOTE]
====
Run `oc get dpu -o yaml` to get more details about the status.
====
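+
For example, to inspect the detailed status of a single DPU from the list, pass its name to the command. The name shown here is taken from the example output above:
+
[source,terminal]
----
$ oc get dpu intel-ipu-dpu -o yaml
----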