kdump over NFS fails for f41 #52
jbtrystram opened this issue Oct 24, 2024 · 3 comments

@jbtrystram

Using the following config on Fedora CoreOS 41:

          nfs 10.0.2.2:/
          path /crash
          core_collector makedumpfile -l --message-level 1 -d 31
          extra_bins /sbin/mount.nfs 
          extra_modules nfs nfsv3 nfs_layout_nfsv41_files blocklayoutdriver nfs_layout_flexfiles nfs_layout_nfsv41_files
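For anyone reproducing this, here is a minimal Python sketch that reads the config above as "directive value" lines. The one-directive-per-line format with '#' comments is an assumption based on this snippet; the helpers are hypothetical and are not a full kdump.conf parser. The sketch also flags that nfs_layout_nfsv41_files is listed twice in extra_modules, although nothing in this thread attributes the hang to that.

```python
from collections import Counter

# The kdump.conf from this report, copied verbatim (indentation stripped).
KDUMP_CONF = """\
nfs 10.0.2.2:/
path /crash
core_collector makedumpfile -l --message-level 1 -d 31
extra_bins /sbin/mount.nfs
extra_modules nfs nfsv3 nfs_layout_nfsv41_files blocklayoutdriver nfs_layout_flexfiles nfs_layout_nfsv41_files
"""


def parse_kdump_conf(text: str) -> dict[str, str]:
    """Map each directive (first word of a line) to the rest of the line.

    Assumption: one "directive value" pair per line; blank lines and lines
    starting with '#' are skipped. A repeated directive keeps its last value.
    """
    directives: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        directives[key] = value.strip()
    return directives


def duplicated_modules(directives: dict[str, str]) -> list[str]:
    """Return module names that appear more than once in extra_modules."""
    counts = Counter(directives.get("extra_modules", "").split())
    return sorted(name for name, n in counts.items() if n > 1)


conf = parse_kdump_conf(KDUMP_CONF)
print(conf["nfs"])               # 10.0.2.2:/
print(duplicated_modules(conf))  # ['nfs_layout_nfsv41_files']
```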

The initramfs waits indefinitely after mounting kdumproot.mount:

[    3.103234] systemd[1]: Mounted kdumproot.mount - /kdumproot.
[  OK  ] Mounted kdumproot.mount - /kdumproot.
[    3.009698] systemd[1]: Mounted kdumproot.mount - /kdumproot.
[    3.106639] systemd[1]: Reached target remote-fs.target - Remote File Systems.

[    3.012510] systemd[1]: Reached target remote-fs.target - Remote File Systems.
[  *** ] Job dev-disk-by\x2dpath-pci\x2d0000…tart running (1min 48s / no limit)
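The stuck job in the log is named with systemd's path escaping: "/" becomes "-", and a literal "-" becomes "\x2d". A rough Python sketch that reverses this for ASCII names can help map such a job back to a /dev path. The function name is hypothetical, and `systemd-escape --unescape --path` is the real tool for this. Because the job name in the log above is truncated, the device example below reuses the pci-0000:04:00.0-part name reported later in this thread purely as an illustrative input; it is not necessarily the same unit.

```python
import re


def unit_to_path(unit: str) -> str:
    """Undo systemd path escaping in a .device/.mount unit name (sketch).

    Splits on "-" (escaped "/"), then decodes \\xNN byte escapes in each
    component. Only correct for ASCII names; multi-byte UTF-8 escapes
    are not reassembled.
    """
    stem = re.sub(r"\.(device|mount)$", "", unit)
    decoded = [
        re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), part)
        for part in stem.split("-")
    ]
    return "/" + "/".join(decoded)


print(unit_to_path("kdumproot.mount"))  # /kdumproot
print(unit_to_path(r"dev-disk-by\x2dpath-pci\x2d0000:04:00.0\x2dpart.device"))
# /dev/disk/by-path/pci-0000:04:00.0-part
```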

Using kexec-tools-2.0.29-1.
I tried downgrading nfs-utils-coreos to the F40 RPM, but that does not fix the issue.

The same setup works fine in F40.

jbtrystram added a commit to jbtrystram/fedora-coreos-config that referenced this issue Oct 24, 2024
We were not testing this until recently [1], so it slipped through.

Upstream issue: rhkdump/kdump-utils#52
[1] coreos/coreos-assembler#3911
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Oct 24, 2024
@coiby
Member

coiby commented Oct 25, 2024

I found that downgrading systemd to systemd-stable-255.10 on F41 makes kdump work again. Note that I built systemd-stable-255.10 from source, as the latest systemd on F41 is v256.

@licliu
Collaborator

licliu commented Oct 25, 2024

Another thing I found: pci-0000:04:00.0-part only appears on F41, and it is a directory.

ls -lh /dev/disk/by-path/ |grep pci-0000:04:00.0-part$
drwxr-xr-x. 7 root root 140 Oct 25 03:18 pci-0000:04:00.0-part

On f40:

ls -lh /dev/disk/by-path/ |grep pci-0000:04:00.0-part$
echo $?
1
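The same comparison can be scripted when checking several images. Below is a small Python sketch using os.lstat, which does not follow symlinks and so matches what ls -l shows. The helper name is hypothetical. Based on the outputs above, it should report "directory" on F41 and "missing" on F40, though that is only what was observed in this thread.

```python
import os
import stat
import sys


def describe_entry(path: str) -> str:
    """Classify a filesystem entry the way `ls -l` would display it."""
    try:
        st = os.lstat(path)  # do not follow symlinks, like `ls -l`
    except FileNotFoundError:
        return "missing"
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISBLK(st.st_mode):
        return "block device"
    return "other"


if __name__ == "__main__":
    # Default entry name taken from this thread; pass another name as argv[1].
    name = sys.argv[1] if len(sys.argv) > 1 else "pci-0000:04:00.0-part"
    print(name, "->", describe_entry(os.path.join("/dev/disk/by-path", name)))
```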

@licliu
Collaborator

licliu commented Oct 25, 2024

This entry is created by /usr/lib/udev/rules.d/60-persistent-storage.rules. Those udev rules were introduced by commit systemd/systemd@3af66c0.

For an unmounted NFS target, dracut cannot determine its fstype with findmnt -e -v -n -o 'FSTYPE' --source "$_find_dev", so it falls back to 0:0 as the maj:min. Unfortunately, stat -L -c '%t:%T' /dev/disk/by-path/pci-0000:04:00.0-part/ also prints 0:0, so dracut treats the two as the same device and writes the latter as a persistent device name into dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fdisk\x2fby-path\x2fpci-0000:04:00.0-part.sh.
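To see why the two collide, here is a rough Python equivalent of that stat call: os.stat follows symlinks (like -L), and %t:%T are the major and minor numbers of st_rdev in hex. For anything that is not a device node, such as a directory, st_rdev is 0, so the result is "0:0". The same_device helper is a hypothetical model of the comparison described above, not dracut's actual code (dracut is written in shell).

```python
import os

# Placeholder maj:min used when findmnt cannot report an fstype (per this thread).
UNKNOWN_MAJMIN = "0:0"


def stat_major_minor(path: str) -> str:
    """Mimic `stat -L -c '%t:%T' PATH`: major:minor of st_rdev, in hex."""
    rdev = os.stat(path).st_rdev  # follows symlinks, like -L
    return f"{os.major(rdev):x}:{os.minor(rdev):x}"


def same_device(target_majmin: str, candidate: str) -> bool:
    """Hypothetical model: a candidate 'matches' if its maj:min is equal."""
    return stat_major_minor(candidate) == target_majmin


# A plain directory has st_rdev == 0, so it "matches" the 0:0 placeholder.
print(stat_major_minor("."))             # 0:0
print(same_device(UNKNOWN_MAJMIN, "."))  # True
```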

I think this bug has three contributing factors:

  1. NFS is used as the kdump dump target, and the export is neither mounted nor listed in fstab.
  2. dracut cannot find the fstype of the "NFS device" using findmnt, so it treats it as a local device and sets its maj:min to 0:0.
  3. Due to the udev rule update, an entry whose maj:min reads as 0:0 happens to exist in the /dev/disk/by-path directory.

Together, these factors cause the second kernel to wait for an impossible task: mounting /dev/disk/by-path/pci-0000:04:00.0-part/.
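The interaction of the three factors can be modeled in a few lines. This is a hypothetical Python sketch, not dracut's logic: the function name is invented, and the by-path entries and their maj:min values below are made-up examples chosen only to contrast an F40-like listing (no "-part" directory) with an F41-like one.

```python
from typing import Optional


def pick_persistent_name(
    target_fstype: Optional[str],
    target_majmin: Optional[str],
    by_path: dict[str, str],
) -> Optional[str]:
    """Hypothetical model of the failing lookup.

    target_fstype is what findmnt reported (None for an NFS export that is
    neither mounted nor in fstab). by_path maps /dev/disk/by-path entry names
    to the maj:min that `stat -L -c '%t:%T'` prints for them.
    """
    if target_fstype is None:
        # Factors 1 and 2: no fstype, so the target is treated as a local
        # device and its maj:min falls back to 0:0.
        target_majmin = "0:0"
    for name, majmin in sorted(by_path.items()):
        if majmin == target_majmin:
            # Factor 3: a non-device entry also reads as 0:0 and "matches".
            return name
    return None


# Made-up example values, for illustration only.
f40_like = {"pci-0000:04:00.0": "fc:0", "pci-0000:04:00.0-part1": "fc:1"}
f41_like = {**f40_like, "pci-0000:04:00.0-part": "0:0"}

print(pick_persistent_name(None, None, f40_like))  # None
print(pick_persistent_name(None, None, f41_like))  # pci-0000:04:00.0-part
```

In this model the F41-like listing yields a name that the initqueue would then wait for, even though it can never appear as a device.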

jbtrystram added a commit to jbtrystram/coreos-assembler that referenced this issue Nov 3, 2024
This way we have good coverage of most-used kdump features.
Some context on the NFS kdump configuration:
coreos/fedora-coreos-tracker#1729

This was previously merged in [1] then reverted in [2] because the nfs
server container was not multi-arch, causing the pipeline to trip on it.

It also does not work on systemd 256 (so anything F41 and above),
see [3]

This requires coreos#3917 for
the multi-arch container, and
coreos#3921

[1] coreos@b10d8dc
[2] coreos@af1468c
[3] rhkdump/kdump-utils#52