Linstor looking for next version of DRBD on evacuate #626
Looks to be related to storage pool mixing: LINBIT/linstor-server@700cb62. So do you have a different type of storage pool on the new node?
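To check, something like this should list each pool's provider kind and backing device per node (generic LINSTOR client usage, not specific to this setup):

```sh
# Show every storage pool on every node, including its provider kind
# (LVM, LVM_THIN, ZFS, ...) and the backing volume group / zpool
linstor storage-pool list
```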
I do not. The new node is a fresh install of Talos; the only thing it did was join the cluster (they're all control plane nodes, it's a small home cluster). There's nothing on it at all, just the DRBD 9.2.6 extension like the other nodes.
Just to add details: I only have the one storage pool, with 5 volumes on it. I have been having a lot of issues with it (see #579), and I'm hoping the cause is a bad node, which I'm trying to replace.
Sorry to be a pain here, but since the latest releases seem to suggest the pods are using DRBD 9.2.8, is there any chance this is caused by a mismatch between the kernel module and the binary in the pods? One of my volumes got corrupted (because of that other issue: after a node reboot it wouldn't see it as a valid ext4 partition anymore) and I can't restore a backup, because it won't let me re-create the volume, failing with this same error. Kind of stuck here.
No, for the user land tools ( |
Sure:
I don't know if that's it, but the only difference I see is that the new node seems to have:
That is probably why LINSTOR thinks it must be a storage pool mixing case. I'll do a bit of digging on when this property gets added.
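If you want to compare it yourself, the per-pool properties can be listed with the client; internal properties may not show up on every version, so treat this as a rough check (node and pool names below are placeholders):

```sh
# Compare the stored properties of the same pool on an old node and on the new node;
# the relevant key is StorDriver/internal/AllocationGranularity
linstor storage-pool list-properties old-node pool1
linstor storage-pool list-properties new-node pool1
```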
Awesome, thank you very much. For now I figured out that by marking the new node as evacuating, I could force my volume to be created on the other ones, so I was able to restore my backup. All good for now, no rush. Appreciate the help.
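For anyone hitting the same thing, that workaround is roughly the following (the node name is a placeholder; evacuating an empty node just keeps the auto-placer from putting new resources on it):

```sh
# Mark the freshly added node as evacuating so new resources land on the other nodes
linstor node evacuate new-node
```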
OK, it seems related to the "old" storage pools having been created with a previous LINSTOR version, while the new storage pool was created with LINSTOR >= 1.26. I guess there is some missing migration in LINSTOR that then causes it to see different values for the allocation granularity, so it runs into the storage pool mixing case. As a workaround, here is a script that creates the missing property in the LINSTOR database:

```sh
#!/bin/sh
# Usage: <script> <node-name> <storage-pool-name>
set -e
# Node and pool names are upper-cased to match LINSTOR's internal naming
NODE="$(echo "$1" | tr a-z A-Z)"
POOL="$(echo "$2" | tr a-z A-Z)"
# The Kubernetes object name is the sha256 hash of the full property path
KEY="$(echo -n "/STORPOOLCONF/$NODE/$POOL:StorDriver/internal/AllocationGranularity" | sha256sum | cut -d " " -f 1)"
cat <<EOF
apiVersion: internal.linstor.linbit.com/v1-25-1
kind: PropsContainers
metadata:
  name: $KEY
spec:
  prop_key: StorDriver/internal/AllocationGranularity
  prop_value: "1"
  props_instance: /STORPOOLCONF/$NODE/$POOL
EOF
```

You can run it and apply the created resource, then restart the LINSTOR controller.
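For example, assuming the script is saved as add-granularity-prop.sh and the controller runs as a Deployment created by the Piraeus operator (the namespace and deployment name below depend on your installation and are only assumptions):

```sh
# Generate the PropsContainers object for node "node-a" / pool "pool1" and apply it
sh add-granularity-prop.sh node-a pool1 | kubectl apply -f -

# Restart the LINSTOR controller so it re-reads the property
kubectl -n piraeus-datastore rollout restart deployment/linstor-controller
```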
Afterwards, evacuation will also work between old and new nodes.
That did seem to work, thank you very much!
I think I ran into the same issue on a 3-node Proxmox cluster. One node (pve1) was evacuated and upgraded, while the other two are still using an older Proxmox/LINSTOR/DRBD module version. When I try to evacuate another node to update it, I get:
I see. I don't really understand what your script (@WanzenBug) does; it is Kubernetes-specific, I'd guess? Any hints on how to solve this?
Your issue is a bit different. Since you upgraded one node, I assume you also have a newer version of ZFS. LINSTOR tries to get the default block size (LINBIT/linstor-server@72ddcb483e). On newer ZFS versions (> 2.2.0) this block size was changed to 16k instead of 8k: openzfs/zfs@72f0521. So there really is a mismatch between these sizes, and because of known bugs in older DRBD versions LINSTOR will refuse to "mix" those storage pools. You may want to ask on the LINBIT forums how to proceed: https://forums.linbit.com/
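A rough way to confirm that on your nodes (plain shell; the dataset name is a placeholder) is to compare the ZFS versions and the block size of an existing zvol on the upgraded node versus an old one:

```sh
# Installed OpenZFS version on each node (the default volblocksize changed with 2.2.0)
zfs version

# Block size of an existing zvol backing a DRBD volume
zfs get -H -o value volblocksize tank/some-drbd-backed-zvol
```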
Hi,
I just installed a new node, and when I tried evacuating an existing one I got this:
The node still got marked as evacuating, but the new node didn't get any volumes. All of my nodes are using DRBD 9.2.6, since that's the only version available for Talos, so I don't know where it's getting 9.2.7 from.
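For completeness, the loaded kernel module and the userspace tools can be checked directly on a node with standard DRBD commands (on Talos this needs a privileged shell or debug container):

```sh
# Version of the DRBD kernel module currently loaded
cat /proc/drbd

# Version of the drbd-utils userspace tools (also reports the module version it sees)
drbdadm --version
```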
Any ideas? Is there a config somewhere I may have missed?
Thanks