Fix documentation
Fix linux networking documentation and update few guides according
to IPU SDK procedure

Signed-off-by: nupuruttarwar <[email protected]>
nupuruttarwar committed Dec 15, 2023
1 parent 7d5383b commit 6a55f61
Showing 4 changed files with 87 additions and 115 deletions.
61 changes: 29 additions & 32 deletions docs/apps/lnw/es2k/es2k-linux-networking.md
@@ -1,6 +1,6 @@
# Linux Networking for ES2K

Linux Networking provides support for offloading various networking functions, such as L2 forwarding, L3 forwarding, ECMP, and VxLAN encapsulation and decapsulation intelligence to the IPU. This capability empowers overlay services to establish communication with endpoints through VxLAN tunnels, thereby extending the L2 segment across the underlay network. To achieve Linux networking support, we have used enhanced legacy OvS for overlay source MAC learning and VxLAN configurations, while relying on the kernel for underlay neighbor discovery, route management, and next-hop information.

## Feature Overview

@@ -14,7 +14,7 @@ To enable this feature we have,
- `Infrap4d`: This process includes a p4runtime server. Calls TDI front end to program IPU E2100.
- `ovs-vswitchd`: This process is integrated with p4runtime intelligence and acts as a gRPC client. Programs IPU E2100 with control plane configuration and forwarding tables by communicating with gRPC server.
- `p4rt-ctl`: This python CLI includes a p4runtime client. Programs IPU E2100 with runtime rules by communicating with gRPC server.
- `Kernel stack`: All underlay-related configurations are picked up by the `kernel monitor` thread in `infrap4d` via netlink events and programmed into the IPU E2100 through TDI front-end APIs.

## Topology

@@ -28,7 +28,7 @@ This topology breakdown and configuration assumes all VMs are spawned on HOST VF
- Every physical port will have a corresponding port representer in ACC.
- Every physical port will have an uplink (APF netdev) in HOST and this uplink will have a corresponding port representer in ACC.
- All port representers are associated with an OvS bridge.
- For VxLAN egress traffic, the underlay port should be associated with a termination bridge and the IP to reach the underlay network should be configured on this bridge.

## Detailed Design

@@ -39,8 +39,7 @@ To enable slow path mode:
- Start the infrap4d process with the Kernel Monitor disabled. Command: `infrap4d -disable-krnlmon`
- Set environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process.
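
The two steps above can be sketched as an ACC shell session (the `ovs-vswitchd` flags shown are illustrative):

```bash
# Start the P4Runtime server with the kernel monitor disabled (slow-path mode)
infrap4d -disable-krnlmon

# Keep forwarding decisions in the OvS slow path instead of offloading to the IPU
export OVS_P4_OFFLOAD=false
ovs-vswitchd --detach --pidfile
```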

In this mode, we need to associate VFs (on top of which VMs are created) with their port representers, and physical ports with their port representers. Configure the following tables to map these in the IPU:

```text
- rx_source_port
@@ -51,15 +50,13 @@ Configure tables:
- rx_phy_port_to_pr_map
```
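
As a sketch, these mappings are programmed with `p4rt-ctl add-entry`; the table, match-field, and action names below are placeholders, so take the exact names from the P4 artifacts for your build:

```bash
# Hypothetical entry: map a source port to its port representer
p4rt-ctl add-entry br0 <table_name> \
    "<match_field>=<value>,action=<action_name>(<param>)"
```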

All port representers (PRs) in ACC should be associated with an OvS bridge. Configure the table below to program the mapping between PRs and bridges in the IPU:

```text
- source_port_to_bridge_map
```

For egress VxLAN traffic, an OvS VxLAN port needs to be created in ACC and associated with the integration bridge that handles overlay traffic. Configure the following tables to map these in the IPU:

```text
- rx_ipv4_tunnel_source_port/rx_ipv6_tunnel_source_port
@@ -92,31 +89,31 @@ Packets coming from overlay network:

- Determine the source port of the packet based on which overlay VSI the packet has landed on.
- Validate if the source port is part of the bridge, else drop the packet.
- If a valid bridge configuration is found, find the PR associated with the bridge and forward the packet to that PR in ACC.
- The OvS control plane receives the packet and forwards it to the destined VxLAN port if the MAC is already learnt; otherwise it floods the packet in the respective bridge.
- Once the packet reaches the VxLAN port, the kernel checks the routing table for a route to the `remote_ip` configured for the OvS VxLAN tunnel.
- The underlay network used to reach `remote_ip` is configured on a TEP termination bridge, and the kernel resolves ARP for the underlay network.
- Once ARP is resolved, the kernel encapsulates the packet, which is then forwarded to the destined PR of the physical port if the MAC is already learnt; otherwise it is flooded in the respective TEP termination bridge.
- Sample OvS config:

```bash
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int <Overlay VMs PR>
ovs-vsctl add-port br-int <VxLAN port with VxLAN config>
ovs-vsctl add-br br-tep-termination ## configure this bridge with the IP used to reach the remote TEP
ovs-vsctl add-port br-tep-termination <Physical port PR>
```
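
A filled-in variant of the VxLAN port line, with hypothetical interface names and addresses (the `options:` keys are standard OvS VxLAN interface options):

```bash
# VNI 10; local/remote TEP addresses are illustrative
ovs-vsctl add-port br-int vxlan1 -- set interface vxlan1 type=vxlan \
    options:local_ip=40.1.1.1 options:remote_ip=40.1.1.2 \
    options:key=10 options:dst_port=4789
```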

#### For Rx

##### Ingress traffic without VxLAN encap

If the packets coming from a remote machine to the physical port are not VxLAN-encapsulated:

- Determine the source port of the packet based on which physical port the packet has landed on.
- Validate if the source port is part of the bridge, else drop the packet.
- If a valid bridge configuration is found, find the PR associated with the bridge and forward the packet to that PR in ACC.
- The OvS control plane receives the packet and forwards it to the destined PR if the MAC is already learnt; otherwise it is flooded in the respective bridge.
- Sample OvS config:

@@ -125,13 +122,13 @@ If the packets coming from a remote machine to the physical port are not VxLAN t

```bash
ovs-vsctl add-port br-int <Physical port PR>
```

##### Ingress traffic with VxLAN encap

If the packets coming from a remote machine to the physical port are VxLAN-encapsulated:

- Determine the source port of the packet based on which physical port the packet has landed on.
- Validate if the source port is part of the bridge, else drop the packet.
- If a valid bridge configuration is found, find the PR associated with the physical port and forward the packet to that PR in ACC.
- The OvS control plane receives the packet on a TEP termination bridge; the packet is decapsulated and sent to the VxLAN port.
- Since the VxLAN port and the overlay VM's PR are in the same bridge, the packet is forwarded to the destined PR if the overlay MAC is already learnt; otherwise it is flooded in the respective bridge.
- Sample OvS config:
@@ -148,10 +145,10 @@ If the packets coming from a remote machine to the physical port are not VxLAN t

To enable fast path mode:

- Start the infrap4d process. Command: `infrap4d`
- Remove the environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process.
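
Fast-path bring-up therefore differs from slow path only in these two commands (the `ovs-vswitchd` flags shown are illustrative):

```bash
# Start the P4Runtime server with the kernel monitor enabled (default)
infrap4d

# Make sure offload is not disabled before starting OvS
unset OVS_P4_OFFLOAD
ovs-vswitchd --detach --pidfile
```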

In this mode, we need to associate VFs (on top of which VMs are created) with their port representers, and physical ports with their port representers.
Configure tables:

```text
```

@@ -243,9 +240,9 @@ Packets coming from overlay network:

#### For Rx

##### Ingress traffic without VxLAN encap

If the packets coming from a remote machine to the physical port are not VxLAN-encapsulated:

- Determine the source port of the packet based on which physical port the packet has landed on.
- Validate if the source port is part of the bridge, else drop the packet.
@@ -259,9 +256,9 @@ If the packets coming from a remote machine to the physical port are not VxLAN t

```bash
ovs-vsctl add-port br-int <Physical port PR>
```

##### Ingress traffic with VxLAN encap

If the packets coming from a remote machine to the physical port are VxLAN-encapsulated:

- Determine the source port of the packet based on which physical port the packet has landed on.
- Validate if the source port is part of the bridge, else drop the packet.
@@ -280,7 +277,7 @@ If the packets coming from a remote machine to the physical port are not VxLAN t
## Summary

- Verification of Source Port and Associated L2 Bridge: The P4 Control Plane (P4 CP) must validate the source port and its corresponding L2 bridge before any further datapath packet classification.
- Exception Packet Handling for all Protocols: The P4 Control Plane (P4 CP) shall incorporate exception packet handling logic, not limited to ARP but applicable to the first packet of any protocol.
- Offloading of Networking Functions: The P4 Control Plane (P4 CP) software shall provide support for the offloading of various networking functions as specified in the Linux Networking use case. These networking functions include Layer 2 (L2) and Layer 3 (L3) forwarding, Equal-Cost Multi-Path (ECMP) routing, Link Aggregation Group (LAG), as well as Virtual Extensible LAN (VXLAN) encapsulation and decapsulation. These functions shall support both single and multiple Open vSwitch (OvS) bridges.

## Limitations
@@ -292,14 +289,14 @@ Current Linux Networking support for the networking recipe has the following lim
- Only OvS bridges are supported.
- Configure p4rt-ctl runtime rules before OvS configuration.
- Double VLAN tagging is NOT supported.
- Add all ACC PRs to VSI group 1.
- On ACC, firewalld needs to be disabled; otherwise this service blocks tunneled packets.
  - `systemctl stop firewalld`
- Refer to the LNW-V2 README_P4_CP_NWS that ships with the P4 program for the limitation with the router_interface_id action in nexthop_table (defect filed).
- Manually modify context.json to remove the NOP hardware action from the "set_nexthop" action in "nexthop_table". An open defect is present in p4-sde to fix this issue.

```text
The content to be removed under the hardware action in context.json is:
{
"prec": 0,
"action_code": "NOP",
```
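
The manual edit above can be automated with `jq`: drop every hardware-action object whose `action_code` is `NOP`. This is a hedged sketch — it assumes `jq` 1.6+ (for `walk`) and uses a minimal stand-in for the real context.json layout:

```shell
# Filter that removes any object with action_code == "NOP" from every array
FILTER='walk(if type == "array"
             then map(select((type == "object" and .action_code == "NOP") | not))
             else . end)'

# Demonstrated on a minimal stand-in; for the real file run:
#   jq "$FILTER" context.json > context_fixed.json
printf '%s' '{"actions":[{"prec":0,"action_code":"NOP"},{"action_code":"set_nexthop"}]}' \
  | jq -c "$FILTER"
```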
29 changes: 8 additions & 21 deletions docs/apps/lnw/es2k/es2k-lnw-overlay-vms.md
@@ -1,20 +1,3 @@

# Linux Networking with Overlay VMs

@@ -29,10 +12,14 @@ for more details on this feature.

Prerequisites:

- Follow steps mentioned in [Deploying P4 Programs for E2100](/guides/es2k/deploying-p4-programs) for bringing up IPU with a custom P4 package.
- Modify `load_custom_pkg.sh` with the following parameters for the linux_networking package.
```bash
sed -i 's/sem_num_pages = 1;/sem_num_pages = 25;/g' $CP_INIT_CFG
sed -i 's/lem_num_pages = 1;/lem_num_pages = 10;/g' $CP_INIT_CFG
sed -i 's/acc_apf = 4;/acc_apf = 16;/g' $CP_INIT_CFG
```
- Download the `IPU_Documentation` TAR file specific to the build and refer to the `Getting Started Guide` on how to install a compatible `IDPF driver` on the host. Once the IDPF driver is installed, bring up an SRIOV VF by modifying the `sriov_numvfs` file present under one of the IDPF network devices, as shown below.
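
The `sriov_numvfs` write can be sketched as follows; the interface name `ens801f0` and the VF count are hypothetical, so substitute the actual IDPF netdev from `/sys/class/net`:

```shell
# Hypothetical IDPF device and VF count; substitute your own values
IDPF_DEV="${IDPF_DEV:-ens801f0}"
NUM_VFS="${NUM_VFS:-4}"
SRIOV_PATH="/sys/class/net/${IDPF_DEV}/device/sriov_numvfs"

# On the target host (as root), spawning the VFs is a single write:
#   echo "${NUM_VFS}" > "${SRIOV_PATH}"
echo "would write ${NUM_VFS} to ${SRIOV_PATH}"
```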

```bash
```

@@ -126,7 +113,7 @@ Note: Here VSI 9 has been used as one of the ACC port representers and added to

### Start OvS as a separate process

Enhanced legacy OvS is used as the control plane for source MAC learning of overlay VMs. This OvS binary is available as part of the ACC build and should be started as a separate process.

```bash
export RUN_OVS=/opt/p4/p4-cp-nws
```
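
A generic OvS bring-up continues from the `RUN_OVS` export with database creation and daemon start; this sketch assumes the usual OvS file layout under `$RUN_OVS` and is not necessarily the exact sequence for every build:

```bash
export RUN_OVS=/opt/p4/p4-cp-nws
mkdir -p $RUN_OVS/var/run/openvswitch
ovsdb-tool create $RUN_OVS/etc/openvswitch/conf.db \
    $RUN_OVS/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$RUN_OVS/var/run/openvswitch/db.sock --detach --pidfile
ovs-vsctl --no-wait init
ovs-vswitchd --detach --pidfile
```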
50 changes: 25 additions & 25 deletions docs/guides/es2k/deploying-p4-programs.md
@@ -14,47 +14,47 @@ to generate the P4 artifacts required for deployment.
Data Path Control Plane (DPCP) starts with a default P4 package. To load a
custom P4 package, follow the steps below:

### 2.1 Copy the custom P4 package

Copy the custom P4 package (`p4_custom.pkg`) to the `/work/scripts` directory on the IMC.
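
For example, from a host that can reach the IMC (the address is a placeholder):

```bash
scp p4_custom.pkg root@<imc-address>:/work/scripts/
```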


### 2.2 Modify the script responsible for loading custom package

Replace `p4_custom.pkg` with the custom package name in the `load_custom_pkg.sh` script.

Any modifications intended for the node policy `cp_init.cfg` should be provided as part of the same script.

```bash
[root@ipu-imc /]# cd /work/scripts
[root@ipu-imc scripts]# cat load_custom_pkg.sh
#!/bin/sh
CP_INIT_CFG=/etc/dpcp/cfg/cp_init.cfg
echo "Checking for custom package..."
if [ -e p4_custom.pkg ]; then
    echo "Custom package p4_custom.pkg found. Overriding default package"
    cp p4_custom.pkg /etc/dpcp/package/
    rm -rf /etc/dpcp/package/default_pkg.pkg
    ln -s /etc/dpcp/package/p4_custom.pkg /etc/dpcp/package/default_pkg.pkg
    sed -i 's/sem_num_pages = 1;/sem_num_pages = 25;/g' $CP_INIT_CFG
else
    echo "No custom package found. Continuing with default package"
fi
```

If Communication Channel support is required,
[enable the communication channel](enabling-comm-channel.md)
before proceeding to the next step.

### 2.3 Reboot the IMC

```bash
root@mev-imc:~# reboot
```
Once the IMC reboots successfully, the IPU is loaded with the custom P4 package.

By default, the `cpf_host` parameter in the node policy is set to 4, which
enables ACC. If the IMC reboots successfully, ACC comes up with a
statically assigned IP address `192.168.0.2` to the eth0 network interface.
You can access ACC from IMC over an SSH session using this IP address.
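
For example, from the IMC console (the `root` user is an assumption; use the credentials for your build):

```bash
ssh [email protected]
```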
