Skip to content

Commit 9a1e9de

Browse files
committed
northd: Avoid matching on ct_state.dnat in logical flows.
Since commit f2e8130 ("northd: Explicitly handle SNAT for ICMP need frag."), if LRP.gw_mtu is set, ovn-northd adds a priority 160 logical flow in the lr_in_larger_pkts router pipeline that matches on "ct.trk && ct.rpl && ct.dnat" for egress traffic that's larger than the configured MTU (if REGBIT_PKT_LARGER] == 1): table=23(lr_in_larger_pkts ), priority=160, match=(inport == "lrp1" && outport == "lrp0" && ip4 && reg9[1] && reg9[0] == 0 && ct.trk && ct.rpl && ct.dnat), action=(icmp4_error {...};) The lr_in_larger_pkts logical pipeline stage (23) is translated by ovn-controller to OpenFlow table 31. table=31, priority=160, ct_state=+rpl+trk+dnat,ip,reg9=0x2/0x3,reg14=0x2,reg15=0x1,metadata=0x3 actions=controller(...) Due to the way ovs-vswitchd translates OF rules to datapath flows, all OpenFlow rules in table 31 that have a priority lower than 160 automatically get an implicit match on the inverse ct_state match that's generated for the priority 160 OF rule, that is: ct_state=-trk-rpl-dnat On specific NICs, such matches (on dnat/snat ct_state) are not offloadable to hardware. However, in OVN, logical switches and logical routers pipelines both get translated to the same set of OpenFlow tables. In theory that's fine because the logical datapath distinction happens thanks to additional metadata field matches (the metadata OF field is actually logical datapath id). But in this specific case it means that all logical switch traffic that hits OpenFlow table 31 will also generate a datapath flow that matches on ct_state=-trk-rpl-dnat, making it non-offloadable. E.g., in an OVN cluster with a logical switch that has no stateful config (no stateful ACLs or LBs) traffic hitting the following logical switch pipeline stage will also fail to be offloaded to hardware if there's a logical router with gw_mtu configured: table=23(ls_in_dhcp_options ), priority=0, match=(1), action=(next;) That is essentially all traffic forwarded on that logical switch. Change the way we match on large packets that have been DNATed and instead of directly matching on ct_state first save that into a register (using the new ct_state_save() logical action). That means that with the configuration mentioned above: - all routed traffic that is less than MTU will now be forwarded by a datapath flow matching on ct_state=-trk - all switched traffic will be forwarded by a datapath flow matching on ct_state=-trk In order to not disrupt upgrades on stable branches generate the new logical flows only if the ct_state_save() action is supported by all ovn-controllers. Reported-at: https://issues.redhat.com/browse/FDP-1271 Fixes: f2e8130 ("northd: Explicitly handle SNAT for ICMP need frag.") Acked-by: Ales Musil <[email protected]> Signed-off-by: Dumitru Ceara <[email protected]> (cherry picked from commit 35a3092)
1 parent 410c478 commit 9a1e9de

File tree

4 files changed

+155
-78
lines changed

4 files changed

+155
-78
lines changed

northd/northd.c

Lines changed: 40 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -205,6 +205,19 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2));
205205
#define REG_OBS_COLLECTOR_ID_NEW "reg8[0..7]"
206206
#define REG_OBS_COLLECTOR_ID_EST "reg8[8..15]"
207207

208+
/* Register used for storing the ct_state in the router pipeline and
209+
* ct_saved_state helpers, matching the ct_state bits definitions from
210+
* ovs-fields(7). */
211+
#define REG_CT_STATE "reg4[0..7]"
212+
213+
static const char *reg_ct_state[] = {
214+
#define CS_STATE(ENUM, INDEX, NAME) \
215+
[CS_##ENUM] = "reg4[" #INDEX "]",
216+
217+
CS_STATES
218+
#undef CS_STATE
219+
};
220+
208221
/* Register used for temporarily store ECMP eth.src to avoid masked ct_label
209222
* access. It doesn't really occupy registers because the content of the
210223
* register is saved to stack and then restored in the same flow.
@@ -268,10 +281,12 @@ BUILD_ASSERT_DECL(ACL_OBS_STAGE_MAX < (1 << 2));
268281
* | | | 1 | | | |
269282
* +-----+---------------------------+---+-----------------+---+------------------------------------+
270283
* | R4 | REG_LB_AFF_BACKEND_IP4 | X | | | |
271-
* | | | R | | | |
272-
* +-----+---------------------------+ E | UNUSED | X | |
273-
* | R5 | UNUSED | G | | X | |
274-
* | | | 2 | | R | LB_L3_AFF_BACKEND_IP6 |
284+
* | | REG_CT_STATE | R | | | |
285+
* | | (>= IN_CHK_PKT_LEN && | R | | | |
286+
* | | <= IN_LARGER_PKTS) | E | | | |
287+
* +-----+---------------------------+ G | UNUSED | X | |
288+
* | R5 | UNUSED | 2 | | X | |
289+
* | | | | | R | LB_L3_AFF_BACKEND_IP6 |
275290
* +-----+---------------------------+---+-----------------+ E | (<= IN_DNAT) |
276291
* | R6 | UNUSED | X | | G | |
277292
* | | | R | | 1 | |
@@ -14576,6 +14591,7 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu,
1457614591
const struct shash *meter_groups, struct ds *match,
1457714592
struct ds *actions, enum ovn_stage stage,
1457814593
struct ovn_port *outport,
14594+
const char *ct_state_match,
1457914595
struct lflow_ref *lflow_ref)
1458014596
{
1458114597
const char *ipv4_meter = copp_meter_get(COPP_ICMP4_ERR, op->od->nbr->copp,
@@ -14587,15 +14603,17 @@ build_icmperr_pkt_big_flows(struct ovn_port *op, int mtu,
1458714603
ds_put_format(match, "inport == %s", op->json_key);
1458814604

1458914605
if (outport) {
14606+
ovs_assert(ct_state_match);
14607+
1459014608
ds_put_format(match, " && outport == %s", outport->json_key);
1459114609

1459214610
create_icmp_need_frag_lflow(op, mtu, actions, match, ipv4_meter,
1459314611
lflows, lflow_ref, stage, 160, false,
14594-
"ct.trk && ct.rpl && ct.dnat",
14612+
ct_state_match,
1459514613
"flags.icmp_snat = 1; ");
1459614614
create_icmp_need_frag_lflow(op, mtu, actions, match, ipv6_meter,
1459714615
lflows, lflow_ref, stage, 160, true,
14598-
"ct.trk && ct.rpl && ct.dnat",
14616+
ct_state_match,
1459914617
"flags.icmp_snat = 1; ");
1460014618
}
1460114619

@@ -14621,15 +14639,25 @@ build_check_pkt_len_flows_for_lrp(struct ovn_port *op,
1462114639

1462214640
ds_clear(match);
1462314641
ds_put_format(match, "outport == %s", op->json_key);
14642+
const char *mtu_flow_action = features->ct_state_save
14643+
? REG_CT_STATE " = ct_state_save(); next;"
14644+
: "next;";
1462414645
build_gateway_mtu_flow(lflows, op, S_ROUTER_IN_CHK_PKT_LEN, 50, 55,
14625-
match, actions, &op->nbrp->header_,
14626-
lflow_ref, "next;");
14646+
match, actions, &op->nbrp->header_, lflow_ref,
14647+
"%s", mtu_flow_action);
1462714648

1462814649
/* ingress traffic */
1462914650
build_icmperr_pkt_big_flows(op, gw_mtu, lflows, meter_groups,
1463014651
match, actions, S_ROUTER_IN_IP_INPUT,
14631-
NULL, lflow_ref);
14632-
14652+
NULL, NULL, lflow_ref);
14653+
14654+
/* Additional match at egress on tracked and reply and dnat-ed traffic. */
14655+
char *ct_match = features->ct_state_save
14656+
? xasprintf("%s && %s && %s",
14657+
reg_ct_state[CS_TRACKED],
14658+
reg_ct_state[CS_REPLY_DIR],
14659+
reg_ct_state[CS_DST_NAT])
14660+
: xstrdup("ct.trk && ct.rpl && ct.dnat");
1463314661
for (size_t i = 0; i < op->od->nbr->n_ports; i++) {
1463414662
struct ovn_port *rp = ovn_port_find(lr_ports,
1463514663
op->od->nbr->ports[i]->name);
@@ -14640,8 +14668,9 @@ build_check_pkt_len_flows_for_lrp(struct ovn_port *op,
1464014668
/* egress traffic */
1464114669
build_icmperr_pkt_big_flows(rp, gw_mtu, lflows, meter_groups,
1464214670
match, actions, S_ROUTER_IN_LARGER_PKTS,
14643-
op, lflow_ref);
14671+
op, ct_match, lflow_ref);
1464414672
}
14673+
free(ct_match);
1464514674

1464614675
if (features->ct_commit_nat_v2) {
1464714676
ovn_lflow_add_with_hint(lflows, op->od, S_ROUTER_OUT_POST_SNAT, 100,

northd/ovn-northd.8.xml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4835,12 +4835,14 @@ output;
48354835
integer value, this table adds a priority-50 logical flow with
48364836
the match <code>outport == <var>GW_PORT</var></code> where
48374837
<var>GW_PORT</var> is the gateway router port and applies the
4838-
action <code>check_pkt_larger</code> and advances the packet to
4839-
the next table.
4838+
actions <code>check_pkt_larger</code> and <code>ct_state_save</code>
4839+
and then advances the packet to the next table.
48404840
</p>
48414841

48424842
<pre>
4843-
REGBIT_PKT_LARGER = check_pkt_larger(<var>L</var>); next;
4843+
REGBIT_PKT_LARGER = check_pkt_larger(<var>L</var>);
4844+
REG_CT_STATE = ct_state_save();
4845+
next;
48444846
</pre>
48454847

48464848
<p>

0 commit comments

Comments
 (0)