docs/apps/lnw/es2k/es2k-linux-networking.md
Lines changed: 41 additions & 45 deletions
@@ -1,6 +1,6 @@
 # Linux Networking for ES2K

-Linux Networking provides support for offloading various networking functions, such as L2 forwarding, L3 forwarding, ECMP, and VxLAN encapsulation and decapsulation intelligence to the IPU. This capability empowers overlay services to establish communication with endpoints through VxLAN tunnels, thereby extending the L2 segment across the underlay network. To achieve Linux networking support, we have used legacy OvS for overlay source MAC learning and VxLAN configurations, while relying on the kernel for underlay neighbor discovery, route management, and next-hop information.
+Linux Networking provides support for offloading various networking functions, such as L2 forwarding, L3 forwarding, ECMP, and VxLAN encapsulation and decapsulation intelligence to the IPU. This capability empowers overlay services to establish communication with endpoints through VxLAN tunnels, thereby extending the L2 segment across the underlay network. To achieve Linux networking support, we have enhanced OvS for overlay source MAC learning and VxLAN configurations, while relying on the kernel for underlay neighbor discovery, route management, and next-hop information.

 ## Feature Overview
@@ -14,7 +14,7 @@ To enable this feature we have,
 - `Infrap4d`: This process includes a p4runtime server. Calls TDI front end to program IPU E2100.
 - `ovs-vswitchd`: This process is integrated with p4runtime intelligence and acts as a gRPC client. Programs IPU E2100 with control plane configuration and forwarding tables by communicating with gRPC server.
 - `p4rt-ctl`: This python CLI includes a p4runtime client. Programs IPU E2100 with runtime rules by communicating with gRPC server.
-- `Kernel stack`: All underlay related configurations are picked by `kernel monitor` thread via netlink events in `infrap4d` and these are programmed in IPU E2100 by calling TDI front end calls.
+- `Kernel stack`: All underlay related configurations are picked by `kernel monitor` thread via netlink events in `infrap4d` and these are programmed in IPU E2100 by calling TDI front end APIs.

 ## Topology
@@ -24,11 +24,11 @@ This topology breakdown and configuration assumes all VMs are spawned on HOST VF

 ### Topology breakdown

-- Every VM spawned on top of a VF will have a corresponding port representer in ACC.
-- Every physical port will have a corresponding port representer in ACC.
-- Every physical port will have an uplink (APF netdev) in HOST and this uplink will have a corresponding port representer in ACC.
-- All port representers are associated with an OvS bridge.
-- For VxLAN egress traffic, the underlay port should be associated with a termination bridge and IP to reach the underlay network should be configured on top of this bridge.
+- Every VM spawned on top of a VF will have a corresponding port representor in ACC.
+- Every physical port will have a corresponding port representor in ACC.
+- Every physical port will have an uplink (APF netdev) in HOST and this uplink will have a corresponding port representor in ACC.
+- All port representors are associated with an OvS bridge.
+- For VxLAN egress traffic, the underlay port should be associated with a termination bridge. The IP address to reach the underlay network should be configured on this bridge.

 ## Detailed Design
@@ -39,8 +39,7 @@ To enable slow path mode:
 - Start the infrap4d process with the Kernel Monitor disabled. Command: `infrap4d -disable-krnlmon`
 - Set environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process.
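
The two slow path steps above can be sketched as a small shell helper. This is only an illustration: the helper names `run` and `enable_slow_path` and the `DRY_RUN` switch are hypothetical, and how `ovs-vswitchd` itself is launched is deployment-specific and deliberately left out. By default the script only prints the `infrap4d` command instead of running it.

```bash
#!/bin/sh
# Sketch only: run() and enable_slow_path() are hypothetical helper names.
# With DRY_RUN=1 (the default) commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_slow_path() {
    # Start infrap4d with the Kernel Monitor disabled.
    run infrap4d -disable-krnlmon
    # ovs-vswitchd must inherit this variable when it is started afterwards.
    OVS_P4_OFFLOAD=false
    export OVS_P4_OFFLOAD
}

enable_slow_path
```

For fast path mode the counterpart, per this document, is to start `infrap4d` without the flag and to remove `OVS_P4_OFFLOAD=false` from the environment (for example with `unset OVS_P4_OFFLOAD`) before starting `ovs-vswitchd`.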

-In this mode, we need to associate VFs on top of which VMs are created and its port representers, also physical ports with its port representers.
-Configure tables:
+In this mode, VMs are spawned on top of VFs and associated with their port representors. Also, physical ports are associated with their port representors. Configure the following tables to map these in IPU:

 ```text
 - rx_source_port
@@ -51,15 +50,13 @@ Configure tables:
 - rx_phy_port_to_pr_map
 ```

-All port representers (PRs) in ACC should be associated with an OvS bridge. Mapping between PRs and bridges need to be programmed in IPU as well.
-Configure table:
+All port representors (PRs) in ACC should be associated with an OvS bridge. Configure table below to program the mapping between PRs and bridges in IPU:

 ```text
 - source_port_to_bridge_map
 ```

-For egress VxLAN traffic, an OvS VxLAN port needs to be created in ACC and associated to the integration bridge that handles overlay traffic.
-Configure table:
+For egress VxLAN traffic, an OvS VxLAN port needs to be created in ACC with associated integration bridge that handles overlay traffic. Configure following tables to map these in IPU:
@@ -70,14 +67,14 @@ Once these tables are configured refer to packet flow as mentioned below.

 #### For Tx

-##### Egress traffic without VxLAN encap
+##### Egress traffic without VxLAN encapsulation

 Packets coming from overlay network:

 - Determine the source port of the packet based on which overlay VSI the packet has landed on.
 - Validate if the source port is part of the bridge, else drop the packet.
 - If valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC.
-- OvS control plane receives the packet and forwards the packets to destined PR if MAC is already learnt, else flood the packet in the respective bridge.
+- OvS control plane receives the packet and forwards the packet to destined PR if MAC is already learnt, else floods the packet in the valid bridge found.
 - Sample OvS config:

 ```bash
@@ -86,37 +83,38 @@ Packets coming from overlay network:
 ovs-vsctl add-port br-int <Physical port PR>
 ```

-##### Egress traffic with VxLAN encap
+##### Egress traffic with VxLAN encapsulation

 Packets coming from overlay network:

 - Determine the source port of the packet based on which overlay VSI the packet has landed on.
 - Validate if the source port is part of the bridge, else drop the packet.
 - If valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC.
-- OvS control plane receives the packet and forwards the packets to the destined VxLAN port if MAC is already learnt, else flood the packet in the respective bridge.
+- OvS control plane receives the packet and forwards the packet to the destined VxLAN port if MAC is already learnt, else flood the packet in the valid bridge found.
 - Once the packet reaches the VxLAN port, here the kernel checks the routing table to reach `remote_ip` that is configured for the OvS VxLAN tunnel.
-- Underlay network to reach `remote_ip` is configured on a TEP termination bridge. Here, the kernel resolves ARP of the underlay network.
-- Once ARP is resolved, kernel encapsulates the packet and this packet will be forwarded to the destined PR of the physical port if MAC is already learnt, else flood the packet in the respective TEP termination bridge.
+- Underlay network to reach `remote_ip` is configured on a TEP termination bridge. The kernel resolves the ARP for underlay network.
+- Once ARP is resolved, the kernel encapsulates the packet. It then forwards the packet to the PR of the physical port if the MAC is already learnt, or floods it to the TEP termination bridge if not.
 - Sample OvS config:

 ```bash
 ovs-vsctl add-br br-int
 ovs-vsctl add-port br-int <Overlay VMs PR>
 ovs-vsctl add-port br-int <VxLAN port with VxLAN config>
-ovs-vsctl add-br br-tep-termination ## this bridge has IP to reach remote TEP
+ovs-vsctl add-br br-tep-termination
+# Configure bridge with IP address to reach remote TEP
 ovs-vsctl add-port br-tep-termination <Physical port PR>
 ```
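
The sample config above leaves the port names, the VxLAN options, and the bridge IP as placeholders. The sketch below fills them in with hypothetical values: `pr_vm0`, `pr_phy0`, `vxlan1`, and the 10.1.1.0/24 addresses are assumptions for illustration, not values taken from this document, and using the local TEP address as the bridge IP is likewise an assumption. It prints the commands rather than running them unless `DRY_RUN=0` is set.

```bash
#!/bin/sh
# Sketch only: every name and address below is a hypothetical placeholder.
VM_PR="${VM_PR:-pr_vm0}"              # overlay VM port representor (assumed name)
PHY_PR="${PHY_PR:-pr_phy0}"           # physical port representor (assumed name)
LOCAL_TEP="${LOCAL_TEP:-10.1.1.1}"    # assumed local tunnel endpoint IP
REMOTE_TEP="${REMOTE_TEP:-10.1.1.2}"  # assumed remote tunnel endpoint IP

# Print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_vxlan_egress() {
    # Integration bridge carrying the overlay VM PR and the VxLAN port.
    run ovs-vsctl add-br br-int
    run ovs-vsctl add-port br-int "$VM_PR"
    run ovs-vsctl add-port br-int vxlan1 -- set interface vxlan1 type=vxlan \
        options:local_ip="$LOCAL_TEP" options:remote_ip="$REMOTE_TEP"
    # TEP termination bridge holding the IP used to reach the remote TEP.
    run ovs-vsctl add-br br-tep-termination
    run ip addr add "$LOCAL_TEP/24" dev br-tep-termination
    run ip link set br-tep-termination up
    run ovs-vsctl add-port br-tep-termination "$PHY_PR"
}

configure_vxlan_egress
```

Keeping the physical port PR as the only port added to `br-tep-termination` here is consistent with the limitation that it should be the first port in that bridge.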

 #### For Rx

-##### Ingress non VxLAN packet
+##### Ingress traffic without VxLAN encapsulation

-If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets:
+If the packet coming from a remote machine to the physical port is not VxLAN encapsulated packet:

 - Determine the source port of the packet based on which physical port the packet has landed on.
 - Validate if the source port is part of the bridge, else drop the packet.
 - If valid bridge configuration is found, find the PR associated with the bridge and forward the packet to the PR in ACC.
-- OvS control plane receives the packet and forwards the packets to destined PR if MAC is already learnt, else flood the packet in the respective bridge.
+- OvS control plane receives the packet and forwards it to destined PR if MAC is already learnt, else floods the packet in the valid bridge found.
 - Sample OvS config:

 ```bash
@@ -125,15 +123,15 @@ If the packets coming from a remote machine to the physical port are not VxLAN t
 ovs-vsctl add-port br-int <Physical port PR>
 ```

-##### Ingress VxLAN packet
+##### Ingress traffic with VxLAN encapsulation

-If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets:
+If the packet coming from a remote machine to the physical port is VxLAN encapsulated packet:

-- Determine the source port of the packet based on which physical port the packet has landed
+- Determine the source port of the packet based on which physical port the packet has landed on.
 - Validate if the source port is part of the bridge, else drop the packet.
 - If valid bridge configuration is found, find the PR associated with the physical port and forward the packet to the PR in ACC.
 - OvS control plane receives the packet on a TEP termination bridge, packet gets decapped and sent to VxLAN port.
-- Since VxLAN port and overlay VMs PR are in the same bridge, if the overlay MAC is already learnt the packet will be forwarded to destined PR else packet will be flooded in the respective bridge.
+- Since VxLAN port and overlay VMs PR are in the same bridge, if the overlay MAC is already learnt the packet will be forwarded to destined PR else packet will be flooded in the valid bridge found.
 - Sample OvS config:

 ```bash
@@ -148,10 +146,10 @@ If the packets coming from a remote machine to the physical port are not VxLAN t

 To enable fast path mode:

-- Start the infrap4d process.
+- Start the infrap4d process. Command: `infrap4d`
 - Remove the environment variable `OVS_P4_OFFLOAD=false` before starting the `ovs-vswitchd` process.

-In this mode, we need to associate VFs on top which VMs are created and its port representers and also physical ports with its port representers.
+In this mode, we need to associate VFs with the VMs and its port representors along with physical ports and its port representors.
 Configure tables:

 ```text
@@ -204,7 +202,7 @@ Once these tables are configured refer to packet flow as mentioned below.

 #### For Tx

-##### Egress traffic without VxLAN encap
+##### Egress traffic without VxLAN encapsulation

 Packets coming from overlay network:
@@ -220,7 +218,7 @@ Packets coming from overlay network:
 ovs-vsctl add-port br-int <Physical port PR>
 ```

-##### Egress traffic with VxLAN encap
+##### Egress traffic with VxLAN encapsulation

 Packets coming from overlay network:
@@ -243,9 +241,9 @@ Packets coming from overlay network:

 #### For Rx

-##### Ingress non VxLAN packet
+##### Ingress traffic without VxLAN encapsulation

-If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets:
+If the packet coming from a remote machine to the physical port is not VxLAN encapsulated packet:

 - Determine the source port of the packet based on which physical port the packet has landed on.
 - Validate if the source port is part of the bridge, else drop the packet.
@@ -259,9 +257,9 @@ If the packets coming from a remote machine to the physical port are not VxLAN t
 ovs-vsctl add-port br-int <Physical port PR>
 ```

-##### Ingress VxLAN packet
+##### Ingress traffic with VxLAN encapsulation

-If the packets coming from a remote machine to the physical port are not VxLAN tunnel packets:
+If the packet coming from a remote machine to the physical port are VxLAN encapsulated packet:

 - Determine the source port of the packet based on which physical port the packet has landed
 - Validate if the source port is part of the bridge, else drop the packet.
@@ -279,27 +277,25 @@ If the packets coming from a remote machine to the physical port are not VxLAN t

 ## Summary

-- Verification of source port and Associated L2 Bridge: The P4 Control Plane (P4 CP) must ensure the validation of the source port and its corresponding L2 bridge before initiating any further regulation of datapath packet classification.
-- Exception Packet Handling for All Protocols: The P4 Control Plane (P4 CP) shall incorporate exception packet handling logic, not limited to ARP but applicable to the first packet of any protocol.
-- Offloading of Networking Functions: The P4 Control Plane (P4 CP) software shall provide support for the offloading of various networking functions as specified in the Linux Networking use case. These networking functions include Layer 2 (L2) and Layer 3 (L3) forwarding, Equal-Cost Multi-Path (ECMP) routing, Link Aggregation Group (LAG), as well as Virtual Extensible LAN (VXLAN) encapsulation and decapsulation. These functions shall support both single and multiple Open vSwitch (OvS) bridges.
+- Verification of source port and associated L2 Bridge: The P4 Control Plane (P4 CP) must ensure the validation of the source port and its corresponding L2 bridge before initiating any further regulation of datapath packet classification.
+- Exception packet handling for all protocols: The P4 Control Plane (P4 CP) shall incorporate exception packet handling logic, not limited to ARP but applicable to the first packet of any protocol.
+- Offloading of networking functions: The P4 Control Plane (P4 CP) software shall provide support for the offloading of various networking functions as specified in the Linux Networking use case. These networking functions include Layer 2 (L2) and Layer 3 (L3) forwarding, Equal-Cost Multi-Path (ECMP) routing, Link Aggregation Group (LAG), as well as Virtual Extensible LAN (VXLAN) encapsulation and decapsulation. These functions shall support both single and multiple Open vSwitch (OvS) bridges.

 ## Limitations

 Current Linux Networking support for the networking recipe has the following limitations:

 - VLAN configuration on OvS is supported only for NATIVE-TAG and NATIVE-UNTAG modes.
-- Physical port's port representer should be added as the 1st port in Tunnel TEP bridge (br-tep-termination).
+- Physical port's port representor should be added as the first port in tunnel TEP bridge (br-tep-termination).
 - Only OvS bridges are supported.
 - Configure p4rt-ctl runtime rules before OvS configuration.
 - Double vlan tag is NOT supported.
-- Add all ACC PR's to VSI group 1
-- On ACC firewalld need to be disabled, this service is blocking tunnel packets.
+- Add all ACC PR's to VSI group 1.
+- On ACC, firewall needs to be disabled. Otherwise, this service will block encapsulated packets.
 - systemctl stop firewalld
-- Refer LNW-V2 README_P4_CP_NWS which comes along with the p4 program for limitation with router_interface_id action in nexthop_table (Bug created for this)
-- Manually modify context.json to remove NOP hardware action for in context.json from "set_nexthop " action in "nexthop_table". Open defect is present in p4-sde to fix this issue.
-
+- See LNW-V2 README_P4_CP_NWS, which comes with the P4 program for more information about limitations in router_interface_id action in nexthop_table (Defect filed).
+- Manually modify context.json to remove NOP hardware action for in context.json from "set_nexthop " action in "nexthop_table". Open defect is present in p4-sde to fix this issue. Content to be removed under hardware action in context.json is