
Linstor looking for next version of DRBD on evacuate #626

Closed · Ulrar opened this issue Mar 12, 2024 · 12 comments

@Ulrar commented Mar 12, 2024

Hi,

Just installed a new node, and when I tried evacuating an existing one I got this:

ERROR:
Description:
    Node: 'talos-if5-jn6' has DRBD version 9.2.6, but version 9.2.7 (or higher) is required
Details:
    Node(s): 'talos-if5-jn6', Resource: 'pvc-1a9e5a5e-fdba-4b8e-ae9f-1a7acd048184'
Show reports:
    linstor error-reports show 65EA25CA-00000-000000

The node still got marked as evacuating, but the new node didn't get any volumes. All of my nodes are using 9.2.6, since that's the only available version for Talos, so I don't know where it's getting 9.2.7 from.
Any idea? Is there a config somewhere I may have missed?

Thanks

@WanzenBug (Member)

Looks to be related to storage pool mixing: LINBIT/linstor-server@700cb62

So do you have a different type of storage pool on the new node?
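One quick way to compare the pool types per node is something like this (a sketch, assuming jq is available and the client still emits the older machine-readable layout with a top-level stor_pools list; newer clients report provider_kind instead of driver):

linstor -m storage-pool list \
  | jq -r '.[0].stor_pools[] | [.node_name, .stor_pool_name, .driver] | @tsv'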

@Ulrar (Author) commented Mar 13, 2024

I do not. The new node is a fresh install of Talos; the only thing it has done is join the cluster (they're all control plane nodes, it's a small home cluster). There's nothing on it at all, just the DRBD 9.2.6 extension like the other nodes:

talos-if5-jn6: user: warning: [2024-03-12T15:36:20.272953785Z]: [talos] [initramfs] enabling system extension drbd 9.2.6-v1.6.6
talos-if5-jn6: kern: warning: [2024-03-12T15:36:29.153182785Z]: drbd: loading out-of-tree module taints kernel.
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.170650785Z]: drbd: initialized. Version: 9.2.6 (api:2/proto:86-122)
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.171099785Z]: drbd: GIT-hash: 52144c0f90a0fb00df6a7d6714ec9034c7af7a28 build by @buildkitsandbox, 2024-03-06 12:26:31
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.171838785Z]: drbd: registered as block device major 147
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.178817785Z]: drbd: registered transport class 'tcp' (version:9.2.6)

Just to add details: I only have the one storage pool, with 5 volumes on it. I have been having a lot of issues with it (see #579), and I'm hoping the cause is a bad node, which I'm trying to replace.

@Ulrar (Author) commented Mar 14, 2024

Sorry to be a pain here, but since the latest releases seem to suggest the pods are using DRBD 9.2.8, is there any chance this is caused by a mismatch between the kernel module and the binary in the pods?

One of my volumes got corrupted (because of that other issue, after a node reboot it was no longer seen as a valid ext4 partition) and I can't restore a backup, because it won't let me re-create the volume and fails with this same error. Kind of stuck here.
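For reference, if it helps, this is roughly how the loaded kernel module can be compared against the userland tools (a sketch, assuming a shell on the node or in the satellite pod where drbd-utils is installed):

# version of the kernel module that is currently loaded
cat /proc/drbd
# versions the userland tools (drbdadm etc.) report / were built against
drbdadm --version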

@WanzenBug (Member)

No, for the userland tools (drbdadm, drbdsetup, etc.) it is all the same. Again, it may be some (unintended) difference between node configurations. Could you share the output of linstor -m storage-pool list?

@Ulrar (Author) commented Mar 14, 2024

Sure:

[
  {
    "stor_pools": [
      {
        "stor_pool_uuid": "32fab312-0c78-43a4-9e58-a50127faadb5",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-00r-fu9",
        "free_space_mgr_name": "talos-00r-fu9;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "5df20bc5-7fed-4f83-b8ac-b54b64f012bd",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-fdm-9ig",
        "free_space_mgr_name": "talos-fdm-9ig;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "ef6c433f-cd01-453d-a667-30f219ba93ac",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-if5-jn6",
        "free_space_mgr_name": "talos-if5-jn6;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "5091a678-d215-4df8-b694-db4b747a01af",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-ozt-z3h",
        "free_space_mgr_name": "talos-ozt-z3h;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "405cef60-8481-4256-846c-366917ea019c",
        "stor_pool_name": "main-pool",
        "node_name": "talos-00r-fu9",
        "free_space_mgr_name": "talos-00r-fu9;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 212224748,
          "total_capacity": 248705384
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      },
      {
        "stor_pool_uuid": "68c2c5f0-9d07-4daa-85a0-a3858f456fa9",
        "stor_pool_name": "main-pool",
        "node_name": "talos-fdm-9ig",
        "free_space_mgr_name": "talos-fdm-9ig;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 207370408,
          "total_capacity": 248705384
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      },
      {
        "stor_pool_uuid": "cd300917-84ac-4f49-8067-9bc7898ee1f4",
        "stor_pool_name": "main-pool",
        "node_name": "talos-if5-jn6",
        "free_space_mgr_name": "talos-if5-jn6;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 112252248,
          "total_capacity": 123737088
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          },
          {
            "key": "StorDriver/internal/AllocationGranularity",
            "value": "1"
          }
        ]
      },
      {
        "stor_pool_uuid": "3273dc0a-72f2-4706-8588-b0986da0bd52",
        "stor_pool_name": "main-pool",
        "node_name": "talos-ozt-z3h",
        "free_space_mgr_name": "talos-ozt-z3h;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 82570412,
          "total_capacity": 115922944
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      }
    ]
  }
]

@Ulrar (Author) commented Mar 14, 2024

I don't know if that's it, but the only difference I see is that the new node somehow has StorDriver/internal/AllocationGranularity set to 1.
I can't seem to unset it either: The key 'StorDriver/internal/AllocationGranularity' is not whitelisted.
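In case it's useful, the per-node properties can be compared from the machine-readable output above with something like this (a sketch, assuming jq and the stor_pools layout shown earlier):

linstor -m storage-pool list \
  | jq -r '.[0].stor_pools[]
      | [ .node_name, .stor_pool_name,
          ((.props // []) | map(select(.key == "StorDriver/internal/AllocationGranularity")) | .[0].value // "-") ]
      | @tsv'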

@WanzenBug (Member)

That is probably why LINSTOR thinks storage pool mixing is involved. I'll do a bit of digging on when this property gets added.

@Ulrar (Author) commented Mar 14, 2024

Awesome, thank you very much.

For now I figured out that by marking the new node as evacuating, I could force my volume to be created on the other ones, so I was able to restore my backup. All good for now, no rush. Appreciate the help.
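For anyone else hitting this, the workaround amounts to roughly the following (the node name is from my cluster; exact behaviour may differ by LINSTOR version):

# mark the new node as evacuating so no new replicas get placed on it
linstor node evacuate talos-if5-jn6
# then re-create the volume / restore the backup; it lands on the other nodes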

@WanzenBug (Member)

OK, it seems to be related to the "old" storage pools having been created with a previous LINSTOR version, while the new storage pool was created with LINSTOR >= 1.26.

I guess there is a missing migration in LINSTOR, which causes it to see different values for the granularity, so it runs into the storage pool mixing case.

As a workaround, here is a script that creates the property in the LINSTOR database:

#!/bin/sh
set -e

# Usage: script.sh <node-name> <storage-pool-name>
# Node and pool names are upper-cased for the internal props path.
NODE="$(echo "$1" | tr a-z A-Z)"
POOL="$(echo "$2" | tr a-z A-Z)"

# The resource name is the sha256 hash of "<props instance path>:<property key>".
KEY="$(echo -n "/STORPOOLCONF/$NODE/$POOL:StorDriver/internal/AllocationGranularity" | sha256sum | cut -d " " -f 1)"

# Emit a PropsContainers CR that sets the missing property in the LINSTOR database.
cat <<EOF
apiVersion: internal.linstor.linbit.com/v1-25-1
kind: PropsContainers
metadata:
  name: $KEY
spec:
  prop_key: StorDriver/internal/AllocationGranularity
  prop_value: "1"
  props_instance: /STORPOOLCONF/$NODE/$POOL
EOF

You can run it and apply the generated resource, then restart the LINSTOR controller:

bash script.sh talos-ozt-z3h main-pool | kubectl create -f -

Afterwards, evacuation will also work between old and new nodes.
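To double-check that the workaround took effect (a sketch; kubectl get -f - simply looks up the resource described by the manifest on stdin, and the list-properties output may differ between client versions):

# show the CR the script generated (its name is the sha256 the script computed)
bash script.sh talos-ozt-z3h main-pool | kubectl get -f -
# after restarting the controller, the property should show up here as well
linstor storage-pool list-properties talos-ozt-z3h main-pool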

@Ulrar (Author) commented Mar 14, 2024

That did seem to work, thank you very much!

@reissmann commented Jun 17, 2024

I think I ran into the same issue on a 3-node Proxmox cluster.

One node (pve1) was evacuated and upgraded, while the other two are still using an older Proxmox/LINSTOR/DRBD module version. When I try to evacuate another node to update it, I get:

Node: 'pve3' has DRBD version 9.2.2, but version 9.2.7 (or higher) is required

I see "StorDriver/internal/AllocationGranularity": "16" on one node and "StorDriver/internal/AllocationGranularity": "8" on the other two nodes.

I don't really understand what your script (@WanzenBug) does; it is Kubernetes-specific, I'd guess?

Any hints on how to solve this?

pve1 ~ # linstor -m storage-pool list
[
  [
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve1",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve1;DfltDisklessStorPool",
      "uuid": "0d9b1f16-c3a2-499c-a23e-fc20e6b157e0",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve2",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve2;DfltDisklessStorPool",
      "uuid": "dcd4d766-9b50-4d86-85b7-4714aa967196",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve3",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve3;DfltDisklessStorPool",
      "uuid": "d52a9bb1-f562-4dec-b326-21acf54c194d",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve1",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "16"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 7344034195,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve1;drbd_disk",
      "uuid": "c448a177-5185-44fb-89ff-e81ade460277",
      "supports_snapshots": true,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve2",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "8"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 4304483389,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve2;drbd_disk",
      "uuid": "cb73c1f4-baae-4a59-90c1-9cfa8ff9a934",
      "supports_snapshots": true,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve3",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "8"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 4304441461,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve3;drbd_disk",
      "uuid": "f0805868-e47d-43dc-a44c-9e9ed3460df2",
      "supports_snapshots": true,
      "external_locking": false
    }
  ]
]

@WanzenBug (Member)

Your issue is a bit different. Since you upgraded one node, I assume you also have a newer version of ZFS. LINSTOR tries to get the default block size (LINBIT/linstor-server@72ddcb483e).

On newer ZFS versions (> 2.2.0) this block size was changed to 16k instead of 8k: openzfs/zfs@72f0521

So there really is a mismatch between these sizes, and because of known bugs in older DRBD versions LINSTOR will refuse to "mix" those storage pools. You may want to ask on the LINBIT forums how to proceed: https://forums.linbit.com/
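If you want to confirm which side of that change each node is on, something like this should do (a sketch; zfs version reports the userland and kernel module versions, and list-properties shows the per-pool value LINSTOR recorded, i.e. the 16 vs 8 above):

# run on each PVE node; ZFS 2.2.0 is where the default zvol block size moved to 16k
zfs version
# the granularity LINSTOR stored for this pool on this node
linstor storage-pool list-properties pve1 drbd_disk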
