diff --git a/docs/apps/lnw/es2k/es2k-linux-networking.md b/docs/apps/lnw/es2k/es2k-linux-networking.md index 47e1f76e..41651101 100644 --- a/docs/apps/lnw/es2k/es2k-linux-networking.md +++ b/docs/apps/lnw/es2k/es2k-linux-networking.md @@ -1,6 +1,6 @@ # Linux Networking for ES2K -Linux Networking provides support for offloading various networking functions, such as L2 forwarding, L3 forwarding, ECMP, and VxLAN encapsulation and decapsulation intelligence to the IPU. This capability empowers overlay services to establish communication with endpoints through VxLAN tunnels, thereby extending the L2 segment across the underlay network. To achieve Linux networking support, we have used legacy OvS for overlay source MAC learning and VxLAN configurations, while relying on the kernel for underlay neighbor discovery, route management, and next-hop information. +Linux Networking provides support for offloading various networking functions, such as L2 forwarding, L3 forwarding, ECMP, and VxLAN encapsulation and decapsulation intelligence to the IPU. This capability empowers overlay services to establish communication with endpoints through VxLAN tunnels, thereby extending the L2 segment across the underlay network. To achieve Linux networking support, we have used enhanced legacy OvS for overlay source MAC learning and VxLAN configurations, while relying on the kernel for underlay neighbor discovery, route management, and next-hop information. ## Feature Overview @@ -14,7 +14,7 @@ To enable this feature we have, - `Infrap4d`: This process includes a p4runtime server. Calls TDI front end to program IPU E2100. - `ovs-vswitchd`: This process is integrated with p4runtime intelligence and acts as a gRPC client. Programs IPU E2100 with control plane configuration and forwarding tables by communicating with gRPC server. - `p4rt-ctl`: This python CLI includes a p4runtime client. Programs IPU E2100 with runtime rules by communicating with gRPC server. 
-- `Kernel stack`: All underlay related configurations are picked by `kernel monitor` thread via netlink events in `infrap4d` and these are programmed in IPU E2100 by calling TDI front end calls. +- `Kernel stack`: All underlay related configurations are picked up by the `kernel monitor` thread via netlink events in `infrap4d` and these are programmed in the IPU E2100 by calling TDI front-end APIs. ## Topology @@ -28,7 +28,7 @@ This topology breakdown and configuration assumes all VMs are spawned on HOST VF - Every physical port will have a corresponding port representer in ACC. - Every physical port will have an uplink (APF netdev) in HOST and this uplink will have a corresponding port representer in ACC. - All port representers are associated with an OvS bridge. -- For VxLAN egress traffic, the underlay port should be associated with a termination bridge and IP to reach the underlay network should be configured on top of this bridge. +- For VxLAN egress traffic, the underlay port should be associated with a termination bridge and the IP to reach the underlay network should be configured on this bridge. ## Detailed Design @@ -39,8 +39,7 @@ To enable slow path mode: - Start the infrap4d process with the Kernel Monitor disabled. Command: `infrap4d -disable-krnlmon` - Set environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process. -In this mode, we need to associate VFs on top of which VMs are created and its port representers, also physical ports with its port representers. -Configure tables: +In this mode, we need to associate VFs with VMs and their port representers, along with physical ports and their port representers. Configure the following tables to map these in the IPU: ```text - rx_source_port @@ -51,15 +50,13 @@ Configure tables: - rx_phy_port_to_pr_map ``` -All port representers (PRs) in ACC should be associated with an OvS bridge. Mapping between PRs and bridges need to be programmed in IPU as well.
-Configure table: +All port representers (PRs) in ACC should be associated with an OvS bridge. Configure the table below to program the mapping between PRs and bridges in the IPU: ```text - source_port_to_bridge_map ``` -For egress VxLAN traffic, an OvS VxLAN port needs to be created in ACC and associated to the integration bridge that handles overlay traffic. -Configure table: +For egress VxLAN traffic, an OvS VxLAN port needs to be created in ACC and associated with the integration bridge that handles overlay traffic. Configure the following tables to map these in the IPU: ```text - rx_ipv4_tunnel_source_port/rx_ipv6_tunnel_source_port @@ -92,31 +89,31 @@ Packets coming from overlay network: - Determine the source port of the packet based on which overlay VSI the packet has landed on. - Validate if the source port is part of the bridge, else drop the packet. -- If valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC. +- If a valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC. - OvS control plane receives the packet and forwards the packets to the destined VxLAN port if MAC is already learnt, else flood the packet in the respective bridge. - Once the packet reaches the VxLAN port, here the kernel checks the routing table to reach `remote_ip` that is configured for the OvS VxLAN tunnel. -- Underlay network to reach `remote_ip` is configured on a TEP termination bridge. Here, the kernel resolves ARP of the underlay network. -- Once ARP is resolved, kernel encapsulates the packet and this packet will be forwarded to the destined PR of the physical port if MAC is already learnt, else flood the packet in the respective TEP termination bridge. +- The underlay network to reach `remote_ip` is configured on a TEP termination bridge, and the kernel resolves ARP for the underlay network.
+- Once ARP is resolved, the kernel encapsulates the packet and forwards it to the destined PR of the physical port if the MAC is already learnt, else floods it in the respective TEP termination bridge. - Sample OvS config: ```bash ovs-vsctl add-br br-int ovs-vsctl add-port br-int ovs-vsctl add-port br-int - ovs-vsctl add-br br-tep-termination ## this bridge has IP to reach remote TEP + ovs-vsctl add-br br-tep-termination ## this bridge should be configured with the IP to reach the remote TEP ovs-vsctl add-port br-tep-termination ``` #### For Rx -##### Ingress non VxLAN packet +##### Ingress traffic without VxLAN encap -If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets: +If the packets coming from a remote machine to the physical port are not VxLAN-encapsulated packets: - Determine the source port of the packet based on which physical port the packet has landed on. - Validate if the source port is part of the bridge, else drop the packet. -- If valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC. -- OvS control plane receives the packet and forwards the packets to destined PR if MAC is already learnt, else flood the packet in the respective bridge. +- If a valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC. +- OvS control plane receives the packet and forwards it to the destined PR if the MAC is already learnt, else floods it in the respective bridge.
- Sample OvS config: ```bash @@ -125,13 +122,13 @@ If the packets coming from a remote machine to the physical port are not VxLAN t ovs-vsctl add-port br-int ``` -##### Ingress VxLAN packet +##### Ingress traffic with VxLAN encap -If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets: +If the packets coming from a remote machine to the physical port are VxLAN-encapsulated packets: -- Determine the source port of the packet based on which physical port the packet has landed +- Determine the source port of the packet based on which physical port the packet has landed on. - Validate if the source port is part of the bridge, else drop the packet. -- If valid bridge configuration is found, find the PR associated with the physical port and forward the packet to the PR in ACC. +- If a valid bridge configuration is found, find the PR associated with the physical port and forward the packet to the PR in ACC. - OvS control plane receives the packet on a TEP termination bridge, packet gets decapped and sent to VxLAN port. - Since VxLAN port and overlay VMs PR are in the same bridge, if the overlay MAC is already learnt the packet will be forwarded to destined PR else packet will be flooded in the respective bridge. - Sample OvS config: @@ -148,10 +145,10 @@ If the packets coming from a remote machine to the physical port are not VxLAN t To enable fast path mode: -- Start the infrap4d process. +- Start the infrap4d process. Command: `infrap4d` - Remove the environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process. -In this mode, we need to associate VFs on top which VMs are created and its port representers and also physical ports with its port representers. +In this mode, we need to associate VFs with the VMs and their port representers, along with physical ports and their port representers.
Configure tables: ```text @@ -243,9 +240,9 @@ Packets coming from overlay network: #### For Rx -##### Ingress non VxLAN packet +##### Ingress traffic without VxLAN encap -If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets: +If the packets coming from a remote machine to the physical port are not VxLAN-encapsulated packets: - Determine the source port of the packet based on which physical port the packet has landed on. - Validate if the source port is part of the bridge, else drop the packet. @@ -259,9 +256,9 @@ If the packets coming from a remote machine to the physical port are not VxLAN t ovs-vsctl add-port br-int ``` -##### Ingress VxLAN packet +##### Ingress traffic with VxLAN encap -If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets: +If the packets coming from a remote machine to the physical port are VxLAN-encapsulated packets: -- Determine the source port of the packet based on which physical port the packet has landed +- Determine the source port of the packet based on which physical port the packet has landed on. - Validate if the source port is part of the bridge, else drop the packet. @@ -280,7 +277,7 @@ If the packets coming from a remote machine to the physical port are not VxLAN t ## Summary - Verification of source port and Associated L2 Bridge: The P4 Control Plane (P4 CP) must ensure the validation of the source port and its corresponding L2 bridge before initiating any further regulation of datapath packet classification. -- Exception Packet Handling for All Protocols: The P4 Control Plane (P4 CP) shall incorporate exception packet handling logic, not limited to ARP but applicable to the first packet of any protocol. +- Exception Packet Handling for all Protocols: The P4 Control Plane (P4 CP) shall incorporate exception packet handling logic, not limited to ARP but applicable to the first packet of any protocol.
- Offloading of Networking Functions: The P4 Control Plane (P4 CP) software shall provide support for the offloading of various networking functions as specified in the Linux Networking use case. These networking functions include Layer 2 (L2) and Layer 3 (L3) forwarding, Equal-Cost Multi-Path (ECMP) routing, Link Aggregation Group (LAG), as well as Virtual Extensible LAN (VXLAN) encapsulation and decapsulation. These functions shall support both single and multiple Open vSwitch (OvS) bridges. ## Limitations @@ -292,14 +289,14 @@ Current Linux Networking support for the networking recipe has the following lim - Only OvS bridges are supported. - Configure p4rt-ctl runtime rules before OvS configuration. - Double vlan tag is NOT supported. -- Add all ACC PR's to VSI group 1 -- On ACC firewalld need to be disabled, this service is blocking tunnel packets. +- Add all ACC PRs to VSI group 1. +- On ACC, firewalld needs to be disabled. Otherwise, this service blocks tunneled packets. - systemctl stop firewalld -- Refer LNW-V2 README_P4_CP_NWS which comes along with the p4 program for limitation with router_interface_id action in nexthop_table (Bug created for this) +- Refer to the LNW-V2 README_P4_CP_NWS which comes along with the P4 program for the limitation with the router_interface_id action in nexthop_table (defect filed) -- Manually modify context.json to remove NOP hardware action for in context.json from "set_nexthop " action in "nexthop_table". Open defect is present in p4-sde to fix this issue. +- Manually modify context.json to remove the NOP hardware action from the "set_nexthop" action in "nexthop_table". An open defect is present in p4-sde to fix this issue.
```text -Content to be removed under hardware action is +Content to be removed under hardware action in context.json is { "prec": 0, "action_code": "NOP", diff --git a/docs/apps/lnw/es2k/es2k-lnw-overlay-vms.md b/docs/apps/lnw/es2k/es2k-lnw-overlay-vms.md index cccbd058..4977f626 100644 --- a/docs/apps/lnw/es2k/es2k-lnw-overlay-vms.md +++ b/docs/apps/lnw/es2k/es2k-lnw-overlay-vms.md @@ -1,20 +1,3 @@ - # Linux Networking with Overlay VMs @@ -29,10 +12,14 @@ for more details on this feature. Prerequisites: -- Follow steps mentioned in [Deploying P4 Programs for E2100](/guides/es2k/deploying-p4-programs) for bringing up IPU with a particular release build. - Download `hw-p4-programs` TAR file specific to the build and extract it to get `fxp-net_linux-networking-v2` p4 artifacts. Go through `Limitations` specified in `README` and bringup the setup accordingly. - - Modify `sem_num_pages` to 25 and `lem_num_pages` to 10 in `cp_init.cfg` present in IMC. -- For this use case, before booting ACC with a particular release build, modify `acc_apf` value to 16 under `num_default_vport` in file `cp_init.cfg` present in IMC. +- Follow steps mentioned in [Deploying P4 Programs for E2100](/guides/es2k/deploying-p4-programs) for bringing up the IPU with a custom P4 package. + - Modify `load_custom_pkg.sh` with the following parameters for the linux_networking package. ```text + sed -i 's/sem_num_pages = 1;/sem_num_pages = 25;/g' $CP_INIT_CFG + sed -i 's/lem_num_pages = 1;/lem_num_pages = 10;/g' $CP_INIT_CFG + sed -i 's/acc_apf = 4;/acc_apf = 16;/g' $CP_INIT_CFG +``` - Download `IPU_Documentation` TAR file specific to the build and refer to `Getting Started Guide` on how to install a compatible `IDPF driver` on the host. Once an IDPF driver is installed, bring up SR-IOV VFs by modifying the `sriov_numvfs` file present under one of the IDPF network devices.
For example: ```bash @@ -126,7 +113,7 @@ Note: Here VSI 9 has been used as one of the ACC port representers and added to ### Start OvS as a separate process -Legacy OvS is used as a control plane for source MAC learning of overlay VM's. OvS should be started as a separate process. +Enhanced legacy OvS is used as a control plane for source MAC learning of overlay VMs. This OvS binary is available as part of the ACC build and should be started as a separate process. ```bash export RUN_OVS=/opt/p4/p4-cp-nws diff --git a/docs/guides/es2k/deploying-p4-programs.md b/docs/guides/es2k/deploying-p4-programs.md index 373b88d0..d4bf97d7 100644 --- a/docs/guides/es2k/deploying-p4-programs.md +++ b/docs/guides/es2k/deploying-p4-programs.md @@ -14,47 +14,47 @@ to generate the P4 artifacts required for deployment. Data Path Control Plane (DPCP) starts with a default P4 package. To load a custom P4 package follow below steps: -### 2.1 Interrupt default startup routine +### 2.1 Copy the custom P4 package -Reboot IMC and type ``N`` when the following message is shown on IMC console:: +Copy the custom P4 package (p4_custom.pkg) to the `/work/scripts` directory on the IMC. -```text -start ipumgmtd and auxiliary script [Y/N] \ -(default start both automatically after 10 seconds)? -``` - -### 2.2 Copy the custom P4 package +### 2.2 Modify the script responsible for loading the custom package -Copy the custom P4 package (.pkg) in `/etc/dpcp/package` directory and -overwrite the `default_pkg.pkg`. +Replace `p4_custom.pkg` with the custom package name in the `load_custom_pkg.sh` script. -For example, replace `default_pkg.pkg` with `simple_l3_l4_pna.pkg` +Any modifications intended in the node policy `cp_init.cfg` should be provided as part of +the same script.
```bash -root@mev-imc:/etc/dpcp/package# ls -lrt /etc/dpcp/package/ -total 2364 --rw-r--r-- 1 root root 963032 Jan 1 04:56 simple_l3_l4_pna.pkg --rw-r--r-- 1 root root 1450456 Jun 8 2023 e2100-default-1.0.3.0.pkg -drwxr-xr-x 2 root root 0 Jun 8 2023 runtime_files -drwxr-xr-x 3 root root 0 Jun 8 2023 ref_pkg -lrwxrwxrwx 1 root root 25 Jun 8 2023 default_pkg.pkg -> e2100-default-1.0.3.0.pkg -root@mev-imc:/etc/dpcp/package# cp simple_l3_l4_pna.pkg default_pkg.pkg +[root@ipu-imc /]# cd /work/scripts +[root@ipu-imc scripts]# cat load_custom_pkg.sh +#!/bin/sh +CP_INIT_CFG=/etc/dpcp/cfg/cp_init.cfg +echo "Checking for custom package..." +if [ -e p4_custom.pkg ]; then + echo "Custom package p4_custom.pkg found. Overriding default package" + cp p4_custom.pkg /etc/dpcp/package/ + rm -rf /etc/dpcp/package/default_pkg.pkg + ln -s /etc/dpcp/package/p4_custom.pkg /etc/dpcp/package/default_pkg.pkg + sed -i 's/sem_num_pages = 1;/sem_num_pages = 25;/g' $CP_INIT_CFG +else + echo "No custom package found. Continuing with default package" +fi ``` If Communication Channel support is required, [enable the communication channel](enabling-comm-channel.md) before proceeding to the next step. -### 2.3 Start the IMC - -Run the IMC start-up script. +### 2.3 Reboot the IMC ```bash -root@mev-imc:~# /etc/init.d/run_default_init_app +root@mev-imc:~# reboot ``` +Once the IMC reboots successfully, the IPU is loaded with the custom P4 package. -By default, `cpf_host` parameter in `/etc/dpcp/cfg/cp_init.cfg` is set to 4 which -enables ACC. If the start-up script is executed successfully, ACC comes up with a +By default, the `cpf_host` parameter in the node policy is set to 4 which +enables ACC. If the IMC reboots successfully, ACC comes up with a statically assigned IP address `192.168.0.2` to the eth0 network interface. You can access ACC from IMC over an SSH session using this IP address.
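The `sed` substitutions that `load_custom_pkg.sh` applies to the node policy can be sanity-checked offline before rebooting the IMC. A minimal sketch (the `cp_init.cfg` fragment below is illustrative, not the full file):

```shell
# Illustrative cp_init.cfg fragment (default values assumed from the patterns above).
cfg='sem_num_pages = 1;
lem_num_pages = 1;
acc_apf = 4;'

# Apply the same substitutions the script performs and print the result.
printf '%s\n' "$cfg" \
  | sed -e 's/sem_num_pages = 1;/sem_num_pages = 25;/' \
        -e 's/lem_num_pages = 1;/lem_num_pages = 10;/' \
        -e 's/acc_apf = 4;/acc_apf = 16;/'
```

Running this on any Linux box prints `sem_num_pages = 25;`, `lem_num_pages = 10;`, and `acc_apf = 16;`, confirming the patterns match before they are baked into the boot script.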
diff --git a/docs/guides/es2k/enabling-comm-channel.md b/docs/guides/es2k/enabling-comm-channel.md index d1220aa1..308f8b0a 100644 --- a/docs/guides/es2k/enabling-comm-channel.md +++ b/docs/guides/es2k/enabling-comm-channel.md @@ -7,52 +7,40 @@ running on the Host to communicate with infrap4d running on the ACC. Ports used for communication channels are defined by the node policy on IMC. -## 1. Interrupt default start-up routine +### 2.2 Modify the custom package config script on IMC -Reboot IMC and type ``N`` when the following message is shown on IMC console:: +Modify the `load_custom_pkg.sh` script to specify comm_vports. -```text -start ipumgmtd and auxiliary script [Y/N] \ -(default start both automatically after 10 seconds)? -``` - -## 2. Specify communication channel ports - -The config file uses the function numbers that are defined as below: - -| Function number | Definition | -|------------------------|-------------------| -| 0 | Xeon Host | -| 4 | ACC | -| 5 | IMC | - -Format used to indicate communication channels: -`(([function number, vport_index],[pf_num, vport_index]),...)` - -Modify `/etc/dpcp/cfg/cp_init.cfg` to change the value of `comm_vports`. - -```text -/* IMC_LAN_APF_VPORT_0 ([5,0]) <--> ACC_APF_VPORT_0 ([4,0]) */ -/* HOST0_LAN_APF_VPORT_3 ([0, 3]) <--> ACC_LAN_APF_VPORT_3 [(4,2)]*/ -comm_vports = (([5,0],[4,0]),([0,3],[4,2])); +```bash +[root@ipu-imc /]# cd /work/scripts +[root@ipu-imc scripts]# cat load_custom_pkg.sh +#!/bin/sh +CP_INIT_CFG=/etc/dpcp/cfg/cp_init.cfg +echo "Checking for custom package..." +if [ -e p4_custom.pkg ]; then + echo "Custom package p4_custom.pkg found. 
Overriding default package" + cp p4_custom.pkg /etc/dpcp/package/ + rm -rf /etc/dpcp/package/default_pkg.pkg + ln -s /etc/dpcp/package/p4_custom.pkg /etc/dpcp/package/default_pkg.pkg + sed -i 's/sem_num_pages = 1;/sem_num_pages = 25;/g' $CP_INIT_CFG + sed -i "s/comm_vports = ((\[5,0\],\[4,0\]))\;/comm_vports = ((\[5,0\],\[4,0\]),(\[0,3\],\[4,2\]))\;/g" $CP_INIT_CFG +else + echo "No custom package found. Continuing with default package" + sed -i "s/comm_vports = ((\[5,0\],\[4,0\]))\;/comm_vports = ((\[5,0\],\[4,0\]),(\[0,3\],\[4,2\]))\;/g" $CP_INIT_CFG + +fi ``` - This will enable communication between IMC-ACC and Host-ACC. -Note: Changes made to `cp_init.cfg` are not persistent across IMC reboots. - -## 3. Start the IMC - -Run the IMC start-up script. +## 3. Reboot the IMC ```bash -root@mev-imc:~# /etc/init.d/run_default_init_app +root@mev-imc:~# reboot ``` -By default, `cpf_host` parameter in `/etc/dpcp/cfg/cp_init.cfg` is set to 4 which -enables ACC. If the start-up script is executed successfully, ACC comes up with a -statically assigned IP address `192.168.0.2` to the eth0 network interface. -You can access ACC from IMC over an SSH session using this IP address. +If the IMC reboots successfully, ACC comes up with a statically assigned IP address + `192.168.0.2` to the eth0 network interface. You can access ACC from IMC over an +SSH session using this IP address. ## 3. Load the IDPF driver on Host
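For completeness, the host-side steps referenced earlier (install the IDPF driver, then create SR-IOV VFs via `sriov_numvfs`) typically look like the sketch below; the `idpf` module name is standard, but the interface name `ens801f0` and the VF count of 4 are placeholder assumptions to adapt to your host:

```shell
# Load the IDPF driver on the host (assumes the driver package is installed).
sudo modprobe idpf

# Create SR-IOV VFs under one of the IDPF network devices.
# 'ens801f0' is a hypothetical interface name; substitute your own.
echo 4 | sudo tee /sys/class/net/ens801f0/device/sriov_numvfs
```

Writing 0 to the same file removes the VFs again; the value can only be changed when no VFs are currently in use.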