
Linstor looking for next version of DRBD on evacuate #626

Closed · Ulrar opened this issue Mar 12, 2024 · 12 comments

@Ulrar commented Mar 12, 2024

Hi,

Just installed a new node, and when I tried evacuating an existing one I got this:

ERROR:
Description:
    Node: 'talos-if5-jn6' has DRBD version 9.2.6, but version 9.2.7 (or higher) is required
Details:
    Node(s): 'talos-if5-jn6', Resource: 'pvc-1a9e5a5e-fdba-4b8e-ae9f-1a7acd048184'
Show reports:
    linstor error-reports show 65EA25CA-00000-000000

The node still got marked as evacuating, but the new node didn't get any volumes. All of my nodes are using 9.2.6, since that's the only available version for Talos, so I don't know where it's getting 9.2.7 from.
Any idea? Is there a config somewhere I may have missed?

Thanks

@WanzenBug (Member)

Looks to be related to storage pool mixing: LINBIT/linstor-server@700cb62

So do you have a different type of storage pool on the new node?
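One quick way to compare the pool types per node is something like this (a sketch, assuming jq is available and the client still emits the older machine-readable layout with a top-level stor_pools list; newer clients report provider_kind instead of driver):

linstor -m storage-pool list \
  | jq -r '.[0].stor_pools[] | [.node_name, .stor_pool_name, .driver] | @tsv'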

@Ulrar (Author) commented Mar 13, 2024

I do not. The new node is a fresh install of Talos; the only thing it has done is join the cluster (they're all control plane nodes, it's a small home cluster). There's nothing on it at all, just the DRBD 9.2.6 extension like the other nodes:

talos-if5-jn6: user: warning: [2024-03-12T15:36:20.272953785Z]: [talos] [initramfs] enabling system extension drbd 9.2.6-v1.6.6
talos-if5-jn6: kern: warning: [2024-03-12T15:36:29.153182785Z]: drbd: loading out-of-tree module taints kernel.
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.170650785Z]: drbd: initialized. Version: 9.2.6 (api:2/proto:86-122)
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.171099785Z]: drbd: GIT-hash: 52144c0f90a0fb00df6a7d6714ec9034c7af7a28 build by @buildkitsandbox, 2024-03-06 12:26:31
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.171838785Z]: drbd: registered as block device major 147
talos-if5-jn6: kern:    info: [2024-03-12T15:36:29.178817785Z]: drbd: registered transport class 'tcp' (version:9.2.6)

Just to add details: I only have the one storage pool, with 5 volumes on it. I have been having a lot of issues with it (see #579), and I'm hoping the cause is a bad node, which I'm trying to replace.

@Ulrar (Author) commented Mar 14, 2024

Sorry to be a pain here, but since the latest releases seem to suggest the pods are using DRBD 9.2.8, is there any chance this is caused by a mismatch between the kernel module and the binary in the pods?

One of my volumes got corrupted (because of that other issue, after a node reboot it was no longer seen as a valid ext4 partition) and I can't restore a backup, because it won't let me re-create the volume and fails with this same error. Kind of stuck here.
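For reference, if it helps, this is roughly how the loaded kernel module can be compared against the userland tools (a sketch, assuming a shell on the node or in the satellite pod where drbd-utils is installed):

# version of the kernel module that is currently loaded
cat /proc/drbd
# versions the userland tools (drbdadm etc.) report / were built against
drbdadm --version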

@WanzenBug (Member)

No, for the userland tools (drbdadm, drbdsetup, etc.) it is all the same. Again, it may be some (unintended) difference between node configurations. Could you share the output of linstor -m storage-pool list?

@Ulrar (Author) commented Mar 14, 2024

Sure:

[
  {
    "stor_pools": [
      {
        "stor_pool_uuid": "32fab312-0c78-43a4-9e58-a50127faadb5",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-00r-fu9",
        "free_space_mgr_name": "talos-00r-fu9;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "5df20bc5-7fed-4f83-b8ac-b54b64f012bd",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-fdm-9ig",
        "free_space_mgr_name": "talos-fdm-9ig;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "ef6c433f-cd01-453d-a667-30f219ba93ac",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-if5-jn6",
        "free_space_mgr_name": "talos-if5-jn6;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "5091a678-d215-4df8-b694-db4b747a01af",
        "stor_pool_name": "DfltDisklessStorPool",
        "node_name": "talos-ozt-z3h",
        "free_space_mgr_name": "talos-ozt-z3h;DfltDisklessStorPool",
        "free_space": {
          "stor_pool_name": "DfltDisklessStorPool",
          "free_capacity": 9223372036854775807,
          "total_capacity": 9223372036854775807
        },
        "driver": "DisklessDriver",
        "static_traits": [
          {
            "key": "SupportsSnapshots",
            "value": "false"
          }
        ]
      },
      {
        "stor_pool_uuid": "405cef60-8481-4256-846c-366917ea019c",
        "stor_pool_name": "main-pool",
        "node_name": "talos-00r-fu9",
        "free_space_mgr_name": "talos-00r-fu9;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 212224748,
          "total_capacity": 248705384
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      },
      {
        "stor_pool_uuid": "68c2c5f0-9d07-4daa-85a0-a3858f456fa9",
        "stor_pool_name": "main-pool",
        "node_name": "talos-fdm-9ig",
        "free_space_mgr_name": "talos-fdm-9ig;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 207370408,
          "total_capacity": 248705384
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      },
      {
        "stor_pool_uuid": "cd300917-84ac-4f49-8067-9bc7898ee1f4",
        "stor_pool_name": "main-pool",
        "node_name": "talos-if5-jn6",
        "free_space_mgr_name": "talos-if5-jn6;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 112252248,
          "total_capacity": 123737088
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          },
          {
            "key": "StorDriver/internal/AllocationGranularity",
            "value": "1"
          }
        ]
      },
      {
        "stor_pool_uuid": "3273dc0a-72f2-4706-8588-b0986da0bd52",
        "stor_pool_name": "main-pool",
        "node_name": "talos-ozt-z3h",
        "free_space_mgr_name": "talos-ozt-z3h;main-pool",
        "free_space": {
          "stor_pool_name": "main-pool",
          "free_capacity": 82570412,
          "total_capacity": 115922944
        },
        "driver": "",
        "static_traits": [
          {
            "key": "Provisioning",
            "value": "Thin"
          },
          {
            "key": "SupportsSnapshots",
            "value": "true"
          }
        ],
        "props": [
          {
            "key": "Aux/piraeus.io/managed-by",
            "value": "piraeus-operator"
          },
          {
            "key": "Aux/piraeus.io/last-applied",
            "value": "[\"Aux/piraeus.io/managed-by\",\"StorDriver/StorPoolName\"]"
          },
          {
            "key": "StorDriver/StorPoolName",
            "value": "/var/lib/piraeus-datastore/main-pool"
          }
        ]
      }
    ]
  }
]

@Ulrar (Author) commented Mar 14, 2024

I don't know if that's it, but the only difference I see is that the new node somehow has StorDriver/internal/AllocationGranularity set to 1.
I can't seem to unset it either: The key 'StorDriver/internal/AllocationGranularity' is not whitelisted.
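In case it's useful, the per-node properties can be compared from the machine-readable output above with something like this (a sketch, assuming jq and the stor_pools layout shown earlier):

linstor -m storage-pool list \
  | jq -r '.[0].stor_pools[]
      | [ .node_name, .stor_pool_name,
          ((.props // []) | map(select(.key == "StorDriver/internal/AllocationGranularity")) | .[0].value // "-") ]
      | @tsv'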

@WanzenBug (Member)

That is probably why LINSTOR thinks storage pool mixing is involved. I'll do a bit of digging on when this property gets added.

@Ulrar (Author) commented Mar 14, 2024

Awesome, thank you very much.

For now I figured out that by marking the new node as evacuating, I could force my volume to be created on the other ones, so I was able to restore my backup. All good for now, no rush. Appreciate the help.
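For anyone else hitting this, the workaround amounts to roughly the following (the node name is from my cluster; exact behaviour may differ by LINSTOR version):

# mark the new node as evacuating so no new replicas get placed on it
linstor node evacuate talos-if5-jn6
# then re-create the volume / restore the backup; it lands on the other nodes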

@WanzenBug (Member)

OK, it seems to be related to the "old" storage pools having been created with a previous LINSTOR version, while the new storage pool was created with LINSTOR >= 1.26.

I guess there is a missing migration in LINSTOR, which causes it to see different values for the granularity, so it runs into the storage pool mixing case.

As a workaround, here is a script that creates the property in the LINSTOR database:

#!/bin/sh
set -e

# Usage: script.sh <node-name> <storage-pool-name>
# Node and pool names are upper-cased for the internal props path.
NODE="$(echo "$1" | tr a-z A-Z)"
POOL="$(echo "$2" | tr a-z A-Z)"

# The resource name is the sha256 hash of "<props instance path>:<property key>".
KEY="$(echo -n "/STORPOOLCONF/$NODE/$POOL:StorDriver/internal/AllocationGranularity" | sha256sum | cut -d " " -f 1)"

# Emit a PropsContainers CR that sets the missing property in the LINSTOR database.
cat <<EOF
apiVersion: internal.linstor.linbit.com/v1-25-1
kind: PropsContainers
metadata:
  name: $KEY
spec:
  prop_key: StorDriver/internal/AllocationGranularity
  prop_value: "1"
  props_instance: /STORPOOLCONF/$NODE/$POOL
EOF

You can run it and apply the generated resource, then restart the LINSTOR controller:

bash script.sh talos-ozt-z3h main-pool | kubectl create -f -

Afterwards, evacuation will also work between old and new nodes.
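To double-check that the workaround took effect (a sketch; kubectl get -f - simply looks up the resource described by the manifest on stdin, and the list-properties output may differ between client versions):

# show the CR the script generated (its name is the sha256 the script computed)
bash script.sh talos-ozt-z3h main-pool | kubectl get -f -
# after restarting the controller, the property should show up here as well
linstor storage-pool list-properties talos-ozt-z3h main-pool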

@Ulrar (Author) commented Mar 14, 2024

That did seem to work, thank you very much!

@reissmann commented Jun 17, 2024

I think I ran into the same issue on a 3-node Proxmox cluster.

One node (pve1) was evacuated and upgraded, while the other two are still using an older Proxmox/LINSTOR/DRBD module version. When I try to evacuate another node to update it, I get:

Node: 'pve3' has DRBD version 9.2.2, but version 9.2.7 (or higher) is required

I see "StorDriver/internal/AllocationGranularity": "16" on one node and "StorDriver/internal/AllocationGranularity": "8" on the other two nodes.

I don't really understand what your script (@WanzenBug) does; it is Kubernetes-specific, I'd guess?

Any hints on how to solve this?

pve1 ~ # linstor -m storage-pool list
[
  [
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve1",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve1;DfltDisklessStorPool",
      "uuid": "0d9b1f16-c3a2-499c-a23e-fc20e6b157e0",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve2",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve2;DfltDisklessStorPool",
      "uuid": "dcd4d766-9b50-4d86-85b7-4714aa967196",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "DfltDisklessStorPool",
      "node_name": "pve3",
      "provider_kind": "DISKLESS",
      "static_traits": {
        "SupportsSnapshots": "false"
      },
      "free_capacity": 9223372036854775807,
      "total_capacity": 9223372036854775807,
      "free_space_mgr_name": "pve3;DfltDisklessStorPool",
      "uuid": "d52a9bb1-f562-4dec-b326-21acf54c194d",
      "supports_snapshots": false,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve1",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "16"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 7344034195,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve1;drbd_disk",
      "uuid": "c448a177-5185-44fb-89ff-e81ade460277",
      "supports_snapshots": true,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve2",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "8"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 4304483389,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve2;drbd_disk",
      "uuid": "cb73c1f4-baae-4a59-90c1-9cfa8ff9a934",
      "supports_snapshots": true,
      "external_locking": false
    },
    {
      "storage_pool_name": "drbd_disk",
      "node_name": "pve3",
      "provider_kind": "ZFS",
      "props": {
        "StorDriver/StorPoolName": "zpool_disk_drbd",
        "StorDriver/internal/AllocationGranularity": "8"
      },
      "static_traits": {
        "Provisioning": "Fat",
        "SupportsSnapshots": "true"
      },
      "free_capacity": 4304441461,
      "total_capacity": 9361686528,
      "free_space_mgr_name": "pve3;drbd_disk",
      "uuid": "f0805868-e47d-43dc-a44c-9e9ed3460df2",
      "supports_snapshots": true,
      "external_locking": false
    }
  ]
]

@WanzenBug (Member)

Your issue is a bit different. Since you upgraded one node, I assume you also have a newer version of ZFS. LINSTOR tries to get the default block size (LINBIT/linstor-server@72ddcb483e).

On newer ZFS versions (> 2.2.0) this block size was changed to 16k instead of 8k: openzfs/zfs@72f0521

So there really is a mismatch between these sizes, and because of known bugs in older DRBD versions LINSTOR will refuse to "mix" those storage pools. You may want to ask on the LINBIT forums how to proceed: https://forums.linbit.com/
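If you want to confirm which side of that change each node is on, something like this should do (a sketch; zfs version reports the userland and kernel module versions, and list-properties shows the per-pool value LINSTOR recorded, i.e. the 16 vs 8 above):

# run on each PVE node; ZFS 2.2.0 is where the default zvol block size moved to 16k
zfs version
# the granularity LINSTOR stored for this pool on this node
linstor storage-pool list-properties pve1 drbd_disk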
