[RFC] Set keep-configuration=no by default for Network Manager #519

Draft
wants to merge 2 commits into
base: main

daniloegea
Collaborator

@daniloegea daniloegea commented Sep 17, 2024

Description

By default, Network Manager does not touch external interfaces, but it may create a temporary in-memory connection for them. In some situations, in particular when virtual interfaces are present in the YAML, netplan apply leaves persistent connections inactive and temporary ones get created instead. This happens because netplan apply restarts the daemon and deletes all of its state from disk. It's not clear to me, though, why it then fails to match the existing interfaces to their connection profiles.

One workaround is to set the global configuration option keep-configuration=no. With this setting, NM looks for the best profile that can manage a given interface. With this in place, netplan apply seems to always produce the correct results when Network Manager is the renderer.
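
For anyone who wants to try the workaround without building the patch, here is a minimal Python sketch that writes the same drop-in the C change in this PR generates. The file content and path are taken from the patch; the helper name `write_keep_configuration_dropin` and the `rootdir` parameter are only illustrative, and NetworkManager still has to be restarted (netplan apply does that) before it reads the file.

```python
from pathlib import Path

# Same content the patch under review writes from C.
KEEP_CONFIGURATION_DROPIN = "[device]\nkeep-configuration=no\n"


def write_keep_configuration_dropin(rootdir: str) -> Path:
    """Write the drop-in below rootdir and return its path.

    rootdir is a hypothetical parameter so the sketch can be pointed at a
    scratch tree instead of the live /run directory.
    """
    target = Path(rootdir) / "run/NetworkManager/conf.d/10-keep-configuration.conf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(KEEP_CONFIGURATION_DROPIN)
    return target
```

Calling `write_keep_configuration_dropin("/")` as root would target the live system; passing a temporary directory is the safer way to inspect the result first.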

NOTE - SIDE EFFECT: the lo temporary connection will not be created anymore with keep-configuration=no.

NOTE 2: I'm not sure where keep-configuration=no can break things for us.

Arguably, not creating external connections for interfaces that are not managed by Netplan is, I think, the behavior we want.

Alternative solution: forcing netplan apply to delete virtual interfaces and let Network Manager recreate them also works. It's arguably safer than setting keep-configuration=no as it's not clear what it could break.
UPDATE: the problem seems to happen even if the virtual interfaces are deleted (netplan was already deleting them via nmcli device disconnect).

I created a PPA for Oracular with this patch here https://launchpad.net/~danilogondolfo/+archive/ubuntu/netplan.io
The PPA also contains #518

Reproducer: use the configuration below and run netplan apply a few times. Some of the connections created by Netplan will not be activated, and external temporary connections will exist for some interfaces. networkd is mixed in to show that keep-configuration=no does not interfere with it.

network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp5s0:
      dhcp4: true
  bridges:
    br0:
      addresses:
      - "192.168.5.1/24"
      dhcp4: false
      dhcp6: false
      interfaces:
      - veth6
      - veth4
      - veth2
      - veth0
    br1:
      addresses:
      - "192.168.6.1/24"
      dhcp4: false
      dhcp6: false
      interfaces:
      - veth3
      - veth1
      - veth7
      - veth5
    br123:
      renderer: networkd
      interfaces:
      - veth123
    br321:
      renderer: networkd
      interfaces:
      - veth321
  bonds:
    bond0:
      addresses:
      - "192.168.0.1/24"
      dhcp4: false
      dhcp6: false
    bond1:
      addresses:
      - "192.168.1.1/24"
      dhcp4: false
      dhcp6: false
    bond2:
      addresses:
      - "192.168.2.1/24"
      dhcp4: false
      dhcp6: false
    bond3:
      addresses:
      - "192.168.3.1/24"
      dhcp4: false
      dhcp6: false
    bond4:
      addresses:
      - "192.168.4.1/24"
      dhcp4: false
      dhcp6: false
    bond123:
      renderer: networkd
      interfaces:
      - dummy123
    bond321:
      renderer: networkd
      interfaces:
      - dummy321
  dummy-devices:
    dummy0:
      dhcp4: false
      dhcp6: false
    dummy1:
      dhcp4: false
      dhcp6: false
    dummy2:
      dhcp4: false
      dhcp6: false
    dummy3:
      dhcp4: false
      dhcp6: false
    dummy123:
      renderer: networkd
    dummy321:
      renderer: networkd
  virtual-ethernets:
    veth0:
      peer: "veth1"
    veth1:
      peer: "veth0"
    veth2:
      peer: "veth3"
    veth3:
      peer: "veth2"
    veth4:
      peer: "veth5"
    veth5:
      peer: "veth4"
    veth6:
      peer: "veth7"
    veth7:
      peer: "veth6"
    veth123:
      renderer: networkd
      peer: "veth321"
    veth321:
      renderer: networkd
      peer: "veth123"
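
To make the "run it a few times and observe" step less manual, a rough Python sketch follows. It assumes that profiles generated by Netplan are named with a netplan- prefix (as in netplan-veth3) and that anything else that is active is worth a look; that is a heuristic, not a definitive test, and without keep-configuration=no the external lo connection will show up as well. The function names are made up for this example, and NM may need a few seconds to settle after each apply.

```python
import subprocess


def parse_active(terse_output: str) -> list[tuple[str, str]]:
    """Parse `nmcli -t -f NAME,DEVICE connection show --active` output."""
    pairs = []
    for line in terse_output.splitlines():
        if not line.strip():
            continue
        # Terse mode separates fields with ':'. The device is the last
        # field, so split from the right.
        name, _, device = line.rpartition(":")
        pairs.append((name, device))
    return pairs


def non_netplan_active(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return active connections whose name lacks the netplan- prefix."""
    return [(name, dev) for name, dev in pairs if not name.startswith("netplan-")]


def check_once() -> list[tuple[str, str]]:
    """Run netplan apply once and report suspicious active connections.

    Needs root and a host where the reproducer YAML is installed.
    """
    subprocess.run(["netplan", "apply"], check=True)
    out = subprocess.run(
        ["nmcli", "-t", "-f", "NAME,DEVICE", "connection", "show", "--active"],
        check=True, capture_output=True, text=True,
    ).stdout
    return non_netplan_active(parse_active(out))
```

Calling `check_once()` in a loop and printing any non-empty result is enough to see whether the temporary connections appear.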

Checklist

  • Runs make check successfully.
  • Retains code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

Network Manager will create external connection profiles for existing
interfaces. There are cases where it will happen, in particular with
virtual interfaces, during "netplan apply" even for interfaces that have
profiles.

"keep-configuration=no" forces NM to use the most appropriate profile for
existing interfaces.
With the introduction of keep-configuration it shouldn't be necessary
anymore.
@slyon slyon added the RFC Request for comment (don't merge yet) label Sep 17, 2024
Collaborator

@slyon slyon left a comment

My first impression was that this could be related to the no-auto-default setting, creating new connection profiles in-memory (docs), but that doesn't seem to be the case here.

I was pondering whether we could use allowed-connections instead (docs) to disallow only those external connections that have a corresponding netplan-* connection profile, but I couldn't find an obvious way to do that with the Connection List Format.

So I think in general using keep-configuration is a reasonable workaround, although it might introduce a slight change of behaviour, which we need to be cautious about. I cannot currently see a scenario where this could impact our users, but you never know.

It still feels like a workaround, though, and I wonder if we can get closer to fixing the root cause. The docs for keep-configuration state:

On startup, NetworkManager tries to not interfere with interfaces that are already configured. It does so by generating a in-memory connection based on the interface current configuration.

If this generated connection matches one of the existing persistent connections, the persistent connection gets activated. If there is no match, the generated connection gets activated as "external", which means that the connection is considered as active, but NetworkManager doesn't actually touch the interface.

It is possible to disable this behavior by setting keep-configuration to no. In this way, on startup NetworkManager always tries to activate the most suitable persistent connection (the one with highest autoconnect-priority or, in case of a tie, the one activated most recently).

Note that when NetworkManager gets restarted, it stores the previous state in /run/NetworkManager; in particular it saves the UUID of the connection that was previously active so that it can be activated again after the restart. Therefore, keep-configuration does not have any effect on service restart.
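
As a reading aid for the selection rule quoted above, here is a small Python sketch of "highest autoconnect-priority, ties broken by most recent activation". It models only that one sentence of the docs, not NetworkManager's actual matching code; the `Profile` type and the function name are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    name: str
    autoconnect_priority: int
    timestamp: int  # last activation time, seconds since the epoch


def most_suitable(profiles: list[Profile]) -> Optional[Profile]:
    """Pick the candidate with the highest priority, then the newest timestamp."""
    return max(
        profiles,
        key=lambda p: (p.autoconnect_priority, p.timestamp),
        default=None,  # no persistent profile available for the device
    )
```

With two hypothetical profiles of equal priority, the one with the larger timestamp is chosen; a profile with a higher priority wins regardless of its timestamp.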

So there must be something in those interfaces that is detected as a previous configuration and does not match the netplan-* profile. And I think the flushing of existing IPs that you dropped from apply.py tried to address this (partially?).

The hint about /run/NetworkManager is interesting. The data in /run/NetworkManager/devices/ (previous UUID) might be of particular interest, as Netplan is already fiddling with this NM state.
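
To capture that state before and after netplan apply, something like the following Python sketch could help. It deliberately makes no assumption about the format of the files and just returns their raw text; the function name is hypothetical, and reading the directory will most likely require root.

```python
from pathlib import Path


def read_state_files(state_dir: str) -> dict[str, str]:
    """Return {file name: raw text} for every regular file in state_dir."""
    base = Path(state_dir)
    if not base.is_dir():
        # Nothing to report, e.g. NetworkManager has not written state yet.
        return {}
    result = {}
    for path in sorted(base.iterdir()):
        if path.is_file():
            result[path.name] = path.read_text(errors="replace")
    return result
```

Taking one snapshot with `read_state_files("/run/NetworkManager/devices")` before the apply and one after, then comparing the two dictionaries, shows which entries were dropped or rewritten.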

I created two diffs of mismatched connection profiles for veth3 and br1:

diff --git a/veth3 b/npveth3
index 1117bb3..f4d14d2 100644
--- a/veth3
+++ b/npveth3
@@ -1,18 +1,18 @@
-connection.id:                          veth3
-connection.uuid:                        f64bf0c2-dd5b-4f84-a801-6f2cf24b37f8
+connection.id:                          netplan-veth3
+connection.uuid:                        59d8ceab-214b-39bf-b98b-d5a36750730c
 connection.stable-id:                   --
-connection.type:                        802-3-ethernet
+connection.type:                        veth
 connection.interface-name:              veth3
-connection.autoconnect:                 no
+connection.autoconnect:                 yes
 connection.autoconnect-priority:        0
 connection.autoconnect-retries:         -1 (default)
 connection.multi-connect:               0 (default)
 connection.auth-retries:                -1
-connection.timestamp:                   1727256945
+connection.timestamp:                   1727256665
 connection.permissions:                 --
 connection.zone:                        --
-connection.controller:                  fea441cc-b5ff-46f1-80f8-c8534b87e73a
-connection.master:                      fea441cc-b5ff-46f1-80f8-c8534b87e73a
+connection.controller:                  br1
+connection.master:                      br1
 connection.slave-type:                  bridge
 connection.port-type:                   bridge
 connection.autoconnect-slaves:          -1 (default)
@@ -28,37 +28,8 @@ connection.dns-over-tls:                -1 (default)
 connection.mptcp-flags:                 0x0 (default)
 connection.wait-device-timeout:         -1
 connection.wait-activation-delay:       -1
-802-3-ethernet.port:                    --
-802-3-ethernet.speed:                   0
-802-3-ethernet.duplex:                  --
-802-3-ethernet.auto-negotiate:          no
-802-3-ethernet.mac-address:             E2:11:67:BF:06:19
-802-3-ethernet.cloned-mac-address:      --
-802-3-ethernet.generate-mac-address-mask:--
-802-3-ethernet.mac-address-denylist:    --
-802-3-ethernet.mtu:                     auto
-802-3-ethernet.s390-subchannels:        --
-802-3-ethernet.s390-nettype:            --
-802-3-ethernet.s390-options:            --
-802-3-ethernet.wake-on-lan:             default
-802-3-ethernet.wake-on-lan-password:    --
-802-3-ethernet.accept-all-mac-addresses:-1 (default)
+veth.peer:                              veth2
 bridge-port.priority:                   32
 bridge-port.path-cost:                  100
 bridge-port.hairpin-mode:               no
 bridge-port.vlans:                      --
-GENERAL.NAME:                           veth3
-GENERAL.UUID:                           f64bf0c2-dd5b-4f84-a801-6f2cf24b37f8
-GENERAL.DEVICES:                        veth3
-GENERAL.IP-IFACE:                       veth3
-GENERAL.STATE:                          activated
-GENERAL.DEFAULT:                        no
-GENERAL.DEFAULT6:                       no
-GENERAL.SPEC-OBJECT:                    --
-GENERAL.VPN:                            no
-GENERAL.DBUS-PATH:                      /org/freedesktop/NetworkManager/ActiveConnection/3
-GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/Settings/26
-GENERAL.ZONE:                           --
-GENERAL.MASTER-PATH:                    /org/freedesktop/NetworkManager/Devices/5
-IP4.GATEWAY:                            --
-IP6.GATEWAY:                            --
diff --git a/br1 b/npbr1
index 51b32ce..71436b3 100644
--- a/br1
+++ b/npbr1
@@ -1,14 +1,14 @@
-connection.id:                          br1
-connection.uuid:                        fea441cc-b5ff-46f1-80f8-c8534b87e73a
+connection.id:                          netplan-br1
+connection.uuid:                        2f1c114b-0956-3743-a28b-800770c11963
 connection.stable-id:                   --
 connection.type:                        bridge
 connection.interface-name:              br1
-connection.autoconnect:                 no
+connection.autoconnect:                 yes
 connection.autoconnect-priority:        0
 connection.autoconnect-retries:         -1 (default)
 connection.multi-connect:               0 (default)
 connection.auth-retries:                -1
-connection.timestamp:                   1727256945
+connection.timestamp:                   1727256665
 connection.permissions:                 --
 connection.zone:                        --
 connection.controller:                  --
@@ -43,12 +43,12 @@ connection.wait-activation-delay:       -1
 802-3-ethernet.wake-on-lan:             default
 802-3-ethernet.wake-on-lan-password:    --
 802-3-ethernet.accept-all-mac-addresses:-1 (default)
-ipv4.method:                            disabled
+ipv4.method:                            manual
 ipv4.dns:                               --
 ipv4.dns-search:                        --
 ipv4.dns-options:                       --
 ipv4.dns-priority:                      0
-ipv4.addresses:                         --
+ipv4.addresses:                         192.168.6.1/24
 ipv4.gateway:                           --
 ipv4.routes:                            --
 ipv4.route-metric:                      -1
@@ -95,7 +95,7 @@ ipv6.required-timeout:                  -1 (default)
 ipv6.ip6-privacy:                       -1 (default)
 ipv6.temp-valid-lifetime:               0 (default)
 ipv6.temp-preferred-lifetime:           0 (default)
-ipv6.addr-gen-mode:                     default
+ipv6.addr-gen-mode:                     default-or-eui64
 ipv6.ra-timeout:                        0 (default)
 ipv6.mtu:                               auto
 ipv6.dhcp-pd-hint:                      --
@@ -113,7 +113,7 @@ bridge.priority:                        32768
 bridge.forward-delay:                   15
 bridge.hello-time:                      2
 bridge.max-age:                         20
-bridge.ageing-time:                     30
+bridge.ageing-time:                     300
 bridge.group-forward-mask:              0
 bridge.multicast-snooping:              yes
 bridge.vlan-filtering:                  no
@@ -123,18 +123,3 @@ proxy.method:                           none
 proxy.browser-only:                     no
 proxy.pac-url:                          --
 proxy.pac-script:                       --
-GENERAL.NAME:                           br1
-GENERAL.UUID:                           fea441cc-b5ff-46f1-80f8-c8534b87e73a
-GENERAL.DEVICES:                        br1
-GENERAL.IP-IFACE:                       br1
-GENERAL.STATE:                          activated
-GENERAL.DEFAULT:                        no
-GENERAL.DEFAULT6:                       no
-GENERAL.SPEC-OBJECT:                    --
-GENERAL.VPN:                            no
-GENERAL.DBUS-PATH:                      /org/freedesktop/NetworkManager/ActiveConnection/2
-GENERAL.CON-PATH:                       /org/freedesktop/NetworkManager/Settings/25
-GENERAL.ZONE:                           --
-GENERAL.MASTER-PATH:                    --
-IP4.GATEWAY:                            --
-IP6.GATEWAY:                            --

Can you maybe have a close look at those diffs, e.g. check if some of the Netplan defaults cannot be matched, or some state in /run/NetworkManager is getting in our way here?
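
To cut through the noise in such comparisons, a Python sketch along these lines may be useful. It assumes two plain-text dumps in the "key: value" layout shown in the diffs above (the raw dumps, not the diff output itself) and skips fields that necessarily differ between two profiles or that only describe runtime state. The ignore list and the function names are my own choice, not something Netplan or nmcli provides.

```python
# Fields that always differ between two distinct profiles.
IGNORED_KEYS = {"connection.id", "connection.uuid", "connection.timestamp"}
# Runtime-only sections that exist for an active connection.
IGNORED_PREFIXES = ("GENERAL.", "IP4.", "IP6.")


def parse_fields(text: str) -> dict[str, str]:
    """Turn 'key:   value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ':' only, values such as MAC addresses contain more.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def meaningful_differences(external_text: str, netplan_text: str) -> dict:
    """Map each differing key to (external value, netplan value).

    A value of None means the key is missing on that side.
    """
    ext = parse_fields(external_text)
    npl = parse_fields(netplan_text)
    diffs = {}
    for key in sorted(set(ext) | set(npl)):
        if key in IGNORED_KEYS or key.startswith(IGNORED_PREFIXES):
            continue
        if ext.get(key) != npl.get(key):
            diffs[key] = (ext.get(key), npl.get(key))
    return diffs
```

For the veth3 pair above this would leave entries such as connection.type, connection.autoconnect, connection.controller and the 802-3-ethernet.* versus veth.peer settings, which are the candidates for why the match fails.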

Edit: We also need to consider cases that have only the following Netplan configuration, instructing NetworkManager to take over:

network:
  version: 2
  renderer: NetworkManager

Comment on lines +317 to +318
_netplan_g_string_free_to_file(g_string_new("[device]\nkeep-configuration=no\n"), rootdir,
"/run/NetworkManager/conf.d/10-keep-configuration.conf", NULL);
Collaborator

nitpick: should this be included in /run/NetworkManager/conf.d/netplan.conf instead?

Collaborator

Or maybe rather ship it as a configuration file from the Netplan package, instead of generating it?

Comment on lines -296 to -297
for iface in nm_interfaces:
    utils.ip_addr_flush(iface)
Collaborator

note: We need to confirm that this doesn't regress https://bugs.launchpad.net/netplan/+bug/1870561
