networkmanager: add support for "routing-policy" (LP: #2086544) #526

drafnel · 2024-11-02T23:52:14Z

Description

NetworkManager supports ip rules via the "routing-ruleX" field. Let's teach the networkmanager backend to emit "routing-ruleX" entries for each "routing-policy" entry in the netplan yaml.

For example, the following netplan configuration snippet:

    eth0:
      addresses:
      - "10.1.2.3/24"
      routing-policy:
        - to: 8.7.6.5
          table: 254
          priority: 10
        - to: 1.2.3.4
          from: 8.7.6.5
          table: 253
          mark: 18
          type-of-service: 4
          priority: 11

..should generate something like the following in the NetworkManager configuration:

    [ipv4]
    method=manual
    address1=10.1.2.3/24
    routing-rule1=to 8.7.6.5 table 254 priority 10
    routing-rule2=from 8.7.6.5 to 1.2.3.4 tos 4 fwmark 18 table 253 priority 11

Checklist

Runs make check successfully.
Retains code coverage (make check-coverage).
New/changed keys in YAML format are documented.
(Optional) Adds example YAML for new feature.
(Optional) Closes an open bug in Launchpad.
- https://bugs.launchpad.net/netplan/+bug/2086544

drafnel · 2024-11-04T20:51:41Z

Are the two checks that are failing broken? Guidance here would be appreciated.

slyon · 2024-11-06T14:28:47Z

Are the two checks that are failing broken? Guidance here would be appreciated.

Yes, those failures seem to be unrelated.

slyon

Thank you very much for your contribution to Netplan! This is a valuable change and much appreciated.

I especially like the fact that you adopted the corresponding integration tests, so we can be sure it works on real systems!

For future work, may I suggest implementing the other way around, too? E.g. update src/parse-nm.c (similar to the existing parse_routes() function), to read a NM keyfile, which should then be picked up automatically by src/netplan.c to emit corresponding YAML. But this could be done in a follow-up PR, does not need to be part of this PR.

src/nm.c

tests/integration/routing.py

src/parse.c

drafnel · 2024-11-06T19:24:38Z

For future work, may I suggest implementing the other way around, too? E.g. update src/parse-nm.c (similar to the existing parse_routes() function), to read a NM keyfile, which should then be picked up automatically by src/netplan.c to emit corresponding YAML.

I've been looking at that. Do you have any thoughts about how best to handle NetworkManager routing-rules that contain elements not supported by netplan? We can obviously skip them and they will be imported as pass-through NetworkManager options. The problem is when there are multiple routing-ruleX entries and some can be represented natively by the netplan yaml and some cannot. In that case, I suspect we cannot just naively convert some of the rules to native netplan and allow the others to be recorded as pass-through entries because the NetworkManager rules are numbered, like "routing-rule2". If we record some of the rules in native netplan and some as pass-through rules and then generate the NetworkManager configuration, then new NetworkManager rules will be emitted starting with "routing-rule1", "routing-rule2", potentially clobbering pass-through rules with the same numbering.

For example, consider a NetworkManager keyfile containing:

routing-rule1=from 1.2.3.4 to 5.6.7.8 tos 4 fwmark 0xca9e iif eth0 oif eth1 pref uidrange 1000-1010 ipproto 6 sport 1024-2048 dport 4096-8192 tun_id 13 priority 99 table 254
routing-rule2=priority 99 to 1.2.3.4 table 254

routing-rule2 would be imported and recorded natively as a netplan routing-policy entry while routing-rule1 would be recorded as a pass-through rule since it contains fields not supported by netplan. When the NetworkManager config is then generated from the netplan yaml, it would emit a "routing-rule1" from its own "routing-policy" and then pass through the original "routing-rule1" from the pass-through section.

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

slyon · 2024-11-12T13:59:59Z

For example, consider a NetworkManager keyfile containing:
routing-rule1=from 1.2.3.4 to 5.6.7.8 tos 4 fwmark 0xca9e iif eth0 oif eth1 pref uidrange 1000-1010 ipproto 6 sport 1024-2048 dport 4096-8192 tun_id 13 priority 99 table 254
routing-rule2=priority 99 to 1.2.3.4 table 254
routing-rule2 would be imported and recorded natively as a netplan routing-policy entry while routing-rule1 would be recorded as a pass-through rule since it contains fields not supported by netplan. When the NetworkManager config is then generated from the netplan yaml, it would emit a "routing-rule1" from its own "routing-policy" and then pass through the original "routing-rule1" from the pass-through section.

Yes, this is a very valid concern!

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.

Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

PS: As mentioned above, please consider this additional work as a non-blocking request, which can easily be delivered in a follow-up PR in the future.

slyon

Thank you very much for addressing my concerns. This is now ready to be merged!

@drafnel Do you want to add anything on top of it (from the "future work" discussion), or shall we get this merged, first?

drafnel · 2024-11-13T13:53:57Z

I was going to ask how you wanted to do that. I've implemented your suggestion to emit a warning when priority is not set for a rule, which I think can safely be added on top. I'll push this out later today. The other change, to parse the NetworkManager config, will require more consideration.

…

On November 13, 2024 4:12:30 AM CST, "Lukas Märdian" ***@***.***> wrote: @slyon approved this pull request. Thank you very much for addressing my concerns. This is now ready to be merged! @drafnel Do you want to add anything on top of it (from the "future work" discussion), or shall we get this merged, first? -- Reply to this email directly or view it on GitHub: #526 (review) You are receiving this because you were mentioned. Message ID: ***@***.***>

drafnel · 2024-11-14T00:30:33Z

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.

Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

Were you thinking of doing this check in parse_nm.c:read_passthrough()? It doesn't look like the Netplan object is available within that function to be able to count the number of native routing-policy rules.

It seems like it would be fine to do this in the same function that parses the rules (i.e. parse_ip_rules()).

But, how concerned are you about having two versions of the routing policy configuration in the same file and the possibility of someone editing the netplan yaml by hand? It sounds like that's what you were suggesting if we parse each routing-ruleX, if possible, into a native routing-policy config and keep all of the routing-ruleX entries as pass-through configs when we fail to parse any of the routing-ruleXs into routing-policy configs. So we could end up with something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule1: priority 99 to 1.2.3.4 table 254
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

I had thought that a reasonable approach, based on your original suggestion, might be to clear each passthrough rule as long as we've been able to successfully convert the routing-ruleX into a routing-policy rule, but once we encounter a rule that cannot be converted, stop clearing the routing-ruleXs. We still convert as many routing-ruleXs as possible into routing-policy rules, but the ones that are converted after the one we failed to parse will be overwritten by the pass-through rules. For example, the above snippet would become something like:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

i.e. since routing-rule1 was successfully parsed, it was cleared. The routing-rule2 would fail to convert since we don't support 'iif' etc. so then we stop clearing the routing-ruleXs. routing-rule3 is successfully converted to a routing-policy, but we still keep the routing-rule3. When this is converted back to NetworkManager key file, then the first routing-policy would become routing-rule1, routing-rule2 would override the second routing-policy (which was derived from routing-rule3), and routing-rule3 would pass through.

But that seems a little fragile and confusing to me to mix the two configuration forms like this. Is there actually any value in writing the routing-policy rules if they'll be overridden or supplemented by the routing-ruleX rules? I don't have enough experience with netplan to feel strongly about this either way. Or should we just take an all-or-none approach and refrain from emitting any routing-policy entries if we encounter a routing-ruleX that we cannot convert?

NetworkManager supports ip rules via the "routing-ruleX" field. Let's teach the networkmanager backend to emit "routing-ruleX" entries for each "routing-policy" entry in the netplan yaml. Update the tests for the new behavior and add additional tests to exercise the new code paths.

Since the check of whether r->table == nd->vrf_table has already failed in the initial 'if' conditional, r->table is necessarily != nd->vrf_table. It is not necessary to explicitly check for this state in the 'else if' conditional. Let's not.

The iproute2 guidance says "Each rule should have an explicitly set unique priority value", and further warns: For historical reasons ip rule add does not require any priority value and allows the priority value to be non-unique. If the user had not supplied a priority value then one was assigned by the kernel. If the user requested creating a rule with a priority value which already existed then the kernel did not reject the request and added the new rule before all old rules of the same priority. This is a mistake in the current design, nothing more. It should be fixed by the time you read this so please do not rely on this feature. You should always use explicit priorities when creating rules. So let's inspect each routing-policy entry and emit a warning if a priority has not been specified. ref. http://www.policyrouting.org/iproute2.doc.html#ss9.6.1

slyon

I was going to ask how you wanted to do that. I've implemented your suggestion to emit a warning when priority is not set for a rule, which I think can safely be added on top. I'll push this out later today.

ACK, thanks! LGTM.
I took the freedom to rebase your branch on top of origin/main, added curly braces for the new policy validation warning (as that's what we want to do everywhere these days), and prefixed two of the commit messages with ATTN: so that we can consider the stricter schema parsing/validation logic when backporting those changes in the distro.

All CI is now green as well (cf. #528). So let's get this merged!

The other change, to parse the NetworkManager config, will require more consideration.

Let's move this work to an independent PR. More on this in a follow-up comment.

slyon · 2024-11-14T13:34:07Z

src/validation.c

@@ -523,10 +523,13 @@ adopt_and_validate_vrf_routes(__unused const NetplanParser *npp, GHashTable *net
        if (nd->ip_rules) {
            for (size_t i = 0; i < nd->ip_rules->len; i++) {
                NetplanIPRule* r = g_array_index(nd->ip_rules, NetplanIPRule*, i);
+                if (r->priority == NETPLAN_IP_RULE_PRIO_UNSPEC) {
+                    g_warning("%s: No priority specified for routing-policy %zu", nd->id, i);


note (non-blocking): I just see (after merging) that this only triggers for NETPLAN_DEF_TYPE_VRF NetDefs. We might want to have it for all interface types, eventually.

slyon · 2024-11-14T14:47:53Z

The easiest solution may be to just have an all-or-none policy when importing the NetworkManager routing-ruleXs. Otherwise it seems like we'd need to be aware of any routing-ruleXs in the pass-through section and take them into account when generating the NetworkManager config. Thoughts?

I think one relatively easy approach could be parse all the routing-ruleXs into native Netplan routing-policy, logging a warning when we cannot map any of them. But then do not clear any of them from the keyfile/passthrough (don't call _kf_clear_key(...) for any routing-ruleX). This way we will be able to save a Netplan native mapping, but in the worst case everything would be overwritten by the passthrough later on.
Then, in src/validation.c, we can check if the amount of native routing-policy rules matches the routing-ruleXs in NM.passthrough, and clear all the passthrough values. WDYT about this approach?

Were you thinking of doing this check in parse_nm.c:read_passthrough()? It doesn't look like the Netplan object is available within that function to be able to count the number of native routing-policy rules.

It seems like it would be fine to do this in the same function that parses the rules (i.e. parse_ip_rules()).

My initial idea was to handle it in src/validation.c along the lines of validation.c:adopt_and_validate_vrf_routes(). But looking into it more closely, I agree that it could probably be done in a parse_ip_rules() function directly, as we always have the full NetworkManager connection-profile/keyfile available and there are no cross-references/dependencies that influece the routing-policy rules of a keyfile.

So yes, I agree it should be done in a new parse-nm.c:parse_ip_rules() function; the more isolated and self-contained, the better!

note (non-blocking): Looking into parse-nm.c:parse_routes(), I realize we have the very same issue of potentially conflicting routeX[_options] from networkmanager.passthrough and Netplan native routes there as well. Once we come up with a good solution for routing-policy, we maybe should apply that to routes, too.

But, how concerned are you about having two versions of the routing policy configuration in the same file and the possibility of someone editing the netplan yaml by hand? It sounds like that's what you were suggesting if we parse each routing-ruleX, if possible, into a native routing-policy config and keep all of the routing-ruleX entries as pass-through configs when we fail to parse any of the routing-ruleXs into routing-policy configs. So we could end up with something like:
    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule1: priority 99 to 1.2.3.4 table 254
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252

Right. In the end, we want to have all information (as much as possible) available in the defined native Netplan schema, so that external consumers that try to query the configuration (e.g. using netplan get network.DEF_TYPE.IFACE.routing-policy) can receive all of it. The problem here is that even though having the supported rules in native routing-policy, the passthrough would still be authoritative and overwrite the native routing-policy, resulting in data received via netplan get could potentially be wrong. That's bad.

I had thought that a reasonable approach, based on your original suggestion, might be to clear each passthrough rule as long as we've been able to successfully convert the routing-ruleX into a routing-policy rule, but once we encounter a rule that cannot be converted, stop clearing the routing-ruleXs. We still convert as many routing-ruleXs as possible into routing-policy rules, but the ones that are converted after the one we failed to parse will be overwritten by the pass-through rules. For example, the above snippet would become something like:
    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule2: priority 98 iif eth1 dport 1024-2048 table 253
        ipv4.routing-rule3: priority 97 to 5.6.7.8 table 252
i.e. since routing-rule1 was successfully parsed, it was cleared. The routing-rule2 would fail to convert since we don't support 'iif' etc. so then we stop clearing the routing-ruleXs. routing-rule3 is successfully converted to a routing-policy, but we still keep the routing-rule3. When this is converted back to NetworkManager key file, then the first routing-policy would become routing-rule1, routing-rule2 would override the second routing-policy (which was derived from routing-rule3), and routing-rule3 would pass through.

thought: ~~What happens when the routing-ruleXs are non-ordered in the keyfile? E.g. ipv4.routing-rule3, then ipv4.routing-rule2, then routing-rule1.~~ Actually, I think that isn't a problem, because we'll always walk through them in an ordered way (for loop, starting at 1 and increasing the "X").

But that seems a little fragile and confusing to me to mix the two configuration forms like this. Is there actually any value in writing the routing-policy rules if they'll be overridden or supplemented by the routing-ruleX rules? I don't have enough experience with netplan to feel strongly about this either way. Or should we just take an all-or-none approach and refrain from emitting any routing-policy entries if we encounter a routing-ruleX that we cannot convert?

I agree, it's confusing. As stated above, the value would be for consumers to query the Netplan config state in a programmatic way (e.g. via netplan get); but that's only the case if that really reflects the actual configuration (i.e. when it doesn't get overwritten by passthrough settings). The "passthrough" section is supposed to only be a stop-gap solution, covering the unsupported cases. Eventually, we don't want to have anything left-over in there.

So, how can we solve this? What about this approach, all implemented inside a single, new parse-nm.c:parse_ip_rules() function :

We parse any routing-ruleX that we can (i.e. currently supported by Netplan schema) into native routing-policy, and clear them from the keyfile
We skip over unsupported routing-ruleXs, and keep the in the keyfile
We count the native routing-policy rules that we were able to parse; let's call that "COUNT"
After the initial loop, we run a 2nd loop over the routing-ruleX remaining in the keyfile and re-write them as routing-ruleCOUNT+1, routing-ruleCOUNT+2, ...

This should solve the duplicated entries in Netplan YAML issue and the issue of "routing-policy" entries not being authorative.
Re-using your example from above, that should leave us with something like this:

    routing-policy:
    - table: 254
      priority: 99
      to: "1.2.3.4"
    - table: 252
      priority: 97
      to: "5.6.7.8"
    networkmanager:
      passthrough:
        ipv4.routing-rule3: priority 98 iif eth1 dport 1024-2048 table 253

We still have one problem, though: What happens if somebody manually modifies the Netplan YAML config, adding a 3rd routing-policy rule? – It would be in conflict with "routing-rule3" from passthrough. It's an edge case, but we should still handle it nicely and make users aware of it.

To mitigate that, we could modify nm.c:write_fallback_key_value() to log a warning (~~or even bail out??~~) when an existing routing-ruleX keyfile setting already exists. I think we're doing something along those lines already, only printing a g_debug message about "NetworkManager: fallback override: ..." – maybe that should be warning, especially for "routing-ruleXs".

@drafnel WDYT?

drafnel · 2024-11-18T19:11:27Z

note (non-blocking): Looking into parse-nm.c:parse_routes(), I realize we have the very same issue of potentially conflicting routeX[_options] from networkmanager.passthrough and Netplan native routes there as well. Once we come up with a good solution for routing-policy, we maybe should apply that to routes, too.

Yeah, probably.

So, how can we solve this? What about this approach, all implemented inside a single, new parse-nm.c:parse_ip_rules() function :

* We parse any `routing-ruleX` that we can (i.e. currently supported by Netplan schema) into native `routing-policy`, and clear them from the keyfile

* We skip over unsupported `routing-ruleXs`, and keep the in the keyfile

* We count the native routing-policy rules that we were able to parse; let's call that "COUNT"

* After the initial loop, we run a 2nd loop over the `routing-ruleX` remaining in the keyfile and re-write them as `routing-ruleCOUNT+1`, `routing-ruleCOUNT+2`, ...

Yeah, I think we have to bite the bullet and go ahead and rewrite the rules.

We still have one problem, though: What happens if somebody manually modifies the Netplan YAML config, adding a 3rd routing-policy rule?

Or if a rule is deleted, leaving a gap between the numbering of the routing-ruleXs that will be generated in the NetworkManager config from the routing-policy rules and the numbering of the passthrough rules.

– It would be in conflict with "routing-rule3" from passthrough. It's an edge case, but we should still handle it nicely and make users aware of it.

To mitigate that, we could modify nm.c:write_fallback_key_value() to log a warning (~~or even bail out??~~) when an existing routing-ruleX keyfile setting already exists. I think we're doing something along those lines already, only printing a g_debug message about "NetworkManager: fallback override: ..." – maybe that should be warning, especially for "routing-ruleXs".

A warning, at a minimum, seems appropriate.

drafnel force-pushed the nm-support-routing-policy branch 3 times, most recently from f7e1f21 to cbc9908 Compare November 3, 2024 17:19

slyon self-requested a review November 6, 2024 14:27

slyon changed the title ~~networkmanager: add support for "routing-policy"~~ networkmanager: add support for "routing-policy" (LP: #2086544) Nov 6, 2024

slyon reviewed Nov 6, 2024

View reviewed changes

src/nm.c Outdated Show resolved Hide resolved

src/nm.c Show resolved Hide resolved

src/nm.c Outdated Show resolved Hide resolved

tests/integration/routing.py Show resolved Hide resolved

src/parse.c Outdated Show resolved Hide resolved

drafnel force-pushed the nm-support-routing-policy branch from cbc9908 to cc5d82d Compare November 13, 2024 04:20

slyon approved these changes Nov 13, 2024

View reviewed changes

drafnel added 3 commits November 14, 2024 11:44

validation: remove superfluous comparison

e9eddce

Since the check of whether r->table == nd->vrf_table has already failed in the initial 'if' conditional, r->table is necessarily != nd->vrf_table. It is not necessary to explicitly check for this state in the 'else if' conditional. Let's not.

slyon approved these changes Nov 14, 2024

View reviewed changes

slyon force-pushed the nm-support-routing-policy branch from bb0f8a7 to e0c8801 Compare November 14, 2024 11:00

slyon merged commit ae09e2c into canonical:main Nov 14, 2024
17 checks passed

slyon reviewed Nov 14, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

networkmanager: add support for "routing-policy" (LP: #2086544) #526

networkmanager: add support for "routing-policy" (LP: #2086544) #526

drafnel commented Nov 2, 2024 •

edited

Loading

drafnel commented Nov 4, 2024

slyon commented Nov 6, 2024

slyon left a comment •

edited

Loading

drafnel commented Nov 6, 2024

slyon commented Nov 12, 2024

slyon left a comment

drafnel commented Nov 13, 2024 via email

drafnel commented Nov 14, 2024

slyon left a comment •

edited

Loading

slyon Nov 14, 2024

slyon commented Nov 14, 2024

drafnel commented Nov 18, 2024

networkmanager: add support for "routing-policy" (LP: #2086544) #526

networkmanager: add support for "routing-policy" (LP: #2086544) #526

Conversation

drafnel commented Nov 2, 2024 • edited Loading

Description

Checklist

drafnel commented Nov 4, 2024

slyon commented Nov 6, 2024

slyon left a comment • edited Loading

Choose a reason for hiding this comment

drafnel commented Nov 6, 2024

slyon commented Nov 12, 2024

slyon left a comment

Choose a reason for hiding this comment

drafnel commented Nov 13, 2024 via email

drafnel commented Nov 14, 2024

slyon left a comment • edited Loading

Choose a reason for hiding this comment

slyon Nov 14, 2024

Choose a reason for hiding this comment

slyon commented Nov 14, 2024

drafnel commented Nov 18, 2024

drafnel commented Nov 2, 2024 •

edited

Loading

slyon left a comment •

edited

Loading

slyon left a comment •

edited

Loading