Inconsistencies on /dev/disk/by-id/* when using similar long device names and multipath #13701

Closed
hamistao opened this issue Jul 4, 2024 · 8 comments

Comments

@hamistao
Contributor

hamistao commented Jul 4, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • snap: latest/edge
  • LXC version: 5.21.1 LTS
  • LXD version: 5.21.1 LTS

Issue description

When two block devices have names that share their first 16 characters (after escaping) and multipath is in use, both devices map to the same /dev/disk/by-id/* symlink name, so only a single symlink is created. The second device then has to be accessed through /dev/dm-0 rather than /dev/sdc. Although /dev/sdc exists, making a filesystem on it fails with "/dev/sdc is apparently in use by the system; will not make a filesystem here!". The problem only occurs when multipath-tools is installed; removing it and restarting the VM fixes it.
Note that the second device is still usable, but only through /dev/dm-0. This may not be critical, but it is worth discussing to better understand what is happening.
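
One way to confirm that a device-mapper device is what keeps /dev/sdc busy is to look at the kernel's holders directory for the disk in sysfs. The snippet below is a minimal sketch of that check, not part of the original report: list_holders is a hypothetical helper name, and the optional second argument (an alternate sysfs root) exists only so the function can be exercised outside the VM.

```shell
#!/bin/sh
# Hypothetical helper: print the devices that currently hold a block device.
# Usage: list_holders <device-name> [sysfs-root]
# The sysfs root defaults to /sys; overriding it is only meant for testing.
list_holders() {
    dev=$1
    sysroot=${2:-/sys}
    dir="$sysroot/block/$dev/holders"

    # No holders directory means the kernel does not know this block device.
    [ -d "$dir" ] || return 1

    # Each entry is a device stacked on top of $dev (for example dm-0).
    ls "$dir"
}
```

Inside the affected VM, running list_holders sdc would be expected to print dm-0 while multipath has claimed the disk, and nothing once the multipath map is gone.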

Steps to reproduce

  1. lxc launch ubuntu:n vm --vm # Ubuntu images come with multipath-tools installed
  2. longName="long-device-name"
  3. lxc storage volume create default vol1 --type=block
  4. lxc storage volume create default vol2 --type=block
  5. sleep 30
  6. lxc exec vm -- systemctl is-system-running --wait
  7. lxc config device add vm "${longName}1" disk pool=default source=vol1
  8. lxc config device add vm "${longName}2" disk pool=default source=vol2
  9. lxc exec vm -- mkfs.ext4 /dev/sdc # Fails
  10. lxc exec vm -- apt autopurge multipath-tools
  11. lxc restart vm
  12. lxc exec vm -- mkfs.ext4 /dev/sdc # Succeeds
@hamistao
Contributor Author

hamistao commented Jul 4, 2024

Please correct me if I am wrong, @simondeziel. This seems to happen because, with a long block device name, the /dev/disk/by-id entry only includes the first 16 characters of the device name (after escaping).
For example, a block device named long-device-name1 ends up with a /dev/disk/by-id path that looks like 0QEMU_QEMU_HARDDISK_lxd_long--device--na. A second device named long-device-name2 gets the same path, overwriting the first /dev/disk/by-id link and leaving the first device without a link in /dev/disk/by-id.
This could be making multipath-tools assume that both devices are the same, creating the inconsistencies described above.
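
The truncation described above can be reproduced with a few lines of shell. This is only a sketch based on the names seen in this issue: by_id_suffix is a hypothetical helper, the escaping is assumed to be nothing more than doubling each hyphen, and the lxd_ prefix and the 16-character cut are taken from the observed link name rather than from the LXD or udev sources.

```shell
#!/bin/sh
# Hypothetical helper: approximate the device-specific tail of the by-id name.
# Assumptions (inferred from this issue, not from LXD or udev code):
#   - escaping only doubles each "-" in the device name
#   - only the first 16 escaped characters survive
#   - the result is prefixed with "lxd_"
by_id_suffix() {
    escaped=$(printf '%s' "$1" | sed 's/-/--/g')
    printf 'lxd_%s\n' "$(printf '%s' "$escaped" | cut -c1-16)"
}

by_id_suffix long-device-name1   # prints lxd_long--device--na
by_id_suffix long-device-name2   # prints lxd_long--device--na
```

Both names produce the same string because the trailing digit falls outside the 16 escaped characters that are kept.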

@tomponline
Member

@hamistao I don't think there is much we can do about this, I'm afraid.

@hamistao
Contributor Author

hamistao commented Jul 4, 2024

@tomponline I agree, I don't think we can fix this without changing the /dev/disk/by-id paths. I wanted to open this in case someone else could think of a solution.

@tomponline
Member

If this is a limitation of udev there isn't much we can do.

@simondeziel
Member

@hamistao as you described, I think that because the /dev/disk/by-id symlinks end up identical, the 2nd disk sharing the name prefix is the one that takes over the by-id symlink. This in turn seems to lead multipath into thinking there are several paths leading to the same disk.

@simondeziel changed the title from "Inconsistencies on /dev/* when using similar long device names and multipath" to "Inconsistencies on /dev/disk/by-id/* when using similar long device names and multipath" Jul 4, 2024
@mitchdz

mitchdz commented Jul 16, 2024

I took a look at this, and agree with the findings. The udev rules are shortening the WWID so you see the following:

root@n-vm:~# ll /dev/disk/by-id | grep device--na
lrwxrwxrwx 1 root root   9 Jul 16 23:21 scsi-0QEMU_QEMU_HARDDISK_lxd_long--device--na -> ../../sdc
lrwxrwxrwx 1 root root   9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name1 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name2 -> ../../sdc

FWIW, in this scenario you know what to expect for the prefix, so you can blacklist the WWID in multipath-tools instead of removing the package.

Add the following to /etc/multipath.conf

blacklist {
       wwid 0QEMU_QEMU_HARDDISK_lxd_long--device--na
}

And restart multipath-tools

systemctl restart multipath-tools

You will then see that your block devices are no longer multipathed. This persists across reboots.

root@n-vm:~# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda       8:0    0   10G  0 disk 
├─sda1    8:1    0    9G  0 part /
├─sda14   8:14   0    4M  0 part 
├─sda15   8:15   0  106M  0 part /boot/efi
└─sda16 259:0    0  913M  0 part /boot
sdb       8:16   0   10G  0 disk 
sdc       8:32   0   10G  0 disk 

@tomponline
Member

@hamistao @simondeziel is there anything to do on this issue or can it be closed?

@simondeziel
Member

Let's close this bug, as LXD has no real way to work around this. The operator has to know to use different device name prefixes if multipathd is to be used inside the instance.
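
For operators who want to catch this before attaching disks, a pre-check along the following lines could flag device names that would end up sharing a prefix. It is a rough sketch under the same assumptions made earlier in the thread (hyphens are doubled when escaped and only the first 16 escaped characters are kept). find_prefix_collisions is a hypothetical name, not an LXD command, and device names are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical pre-check: report device names whose escaped form shares the
# same first 16 characters. Assumes "-" is escaped as "--" and that names
# contain no whitespace.
find_prefix_collisions() {
    for name in "$@"; do
        # Key each name by its truncated, escaped form.
        key=$(printf '%s' "$name" | sed 's/-/--/g' | cut -c1-16)
        printf '%s %s\n' "$key" "$name"
    done | sort | awk '
        # Collect the names seen for each key and count them.
        { names[$1] = (names[$1] == "" ? $2 : names[$1] " " $2); count[$1]++ }
        # Print only the keys shared by more than one name.
        END { for (k in count) if (count[k] > 1) print k ": " names[k] }
    '
}
```

Calling find_prefix_collisions with long-device-name1 and long-device-name2 reports both names under the key long--device--na, while names that differ within the first 16 escaped characters produce no output.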
