Permanent zpool errors after a few days of making natively encrypted zvol snapshots with sanoid #15837

@rjycnfynby

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Gentoo |
| Distribution Version | default/linux/amd64/17.1 (stable) profile |
| Kernel Version | 6.6.13-gentoo-dist |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.2.2-r1-gentoo / zfs-kmod-2.2.2-r0-gentoo |

Describe the problem you're observing

After several days of hourly autosnapshots on slightly more than twenty natively encrypted zvols, the scheduled sanoid cron job starts failing with "cannot iterate filesystems: I/O error" while taking snapshots. "zpool status -vx" prints "errors: Permanent errors have been detected in the following files:" followed by an empty list. I get the same "cannot iterate filesystems: I/O error" when trying to list snapshots on some of the zvols. Currently about 16 zvols are in this error state. The total number of snapshots is slightly less than a thousand.

"zfs list -t snap -r tank0 | wc -l" prints 23 "cannot iterate filesystems: I/O error" lines and a count of exactly 991. I found no errors in dmesg. This particular pool is built from three mirrored WD SA500 SSDs which support trim/unmap on LSI controllers, but similar results were observed on different servers and disks.

At least four different servers with a similar setup failed the same way. Previously I had similar issues with ZFS 2.1.14 and kernel 6.1.69-gentoo-dist, with the difference that "zpool status -vx" gave more detailed output listing the exact failing snapshots, and I could also see the kcf_ops_failed counter increasing in the output of "grep . /proc/spl/kstat/kcf/*". Later Gentoo marked ZFS 2.2.2 as stable and I decided to try again with the newer version.

Pools on different servers started to fail after 3 to 5 days of uptime. All of them had fewer than a thousand snapshots.
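
For anyone trying to compare numbers, the snapshot count and the error count from the listing on tank0 can be collected in one pass. This is only a rough sketch: the function name `summarize_zfs_list` is made up for illustration, and it assumes the error text is exactly the "cannot iterate filesystems: I/O error" string quoted above. Note that in a plain `zfs list ... | wc -l` pipeline only stdout reaches `wc`, and the header line is counted too unless `-H` is given.

```shell
# Hypothetical helper: reads combined stdout+stderr of `zfs list` on stdin
# and prints how many snapshot names and how many I/O error lines it saw.
summarize_zfs_list() {
    awk '
        # Error lines reported while iterating child datasets
        /cannot iterate filesystems: I\/O error/ { errors++; next }
        # Snapshot names always contain "@"
        /@/ { snaps++ }
        END { printf "snapshots=%d errors=%d\n", snaps, errors }
    '
}

# Usage on a live system (pool name tank0 is from this report):
#   zfs list -H -o name -t snapshot -r tank0 2>&1 | summarize_zfs_list
```

Running it before and after each sanoid run might help narrow down when the first zvol goes into the error state.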

Describe how to reproduce the problem

  1. Install the latest ZFS and the binary distribution kernel;
  2. Create a zpool with autotrim enabled (probably irrelevant, but that's what I'm doing on SSD pools);
  3. Enable LZ4 compression on the root dataset (probably irrelevant);
  4. Create dataset with enabled autotrim (probably irrelevant);
  5. Create an encrypted dataset named "encrypted" which will hold the rest of the datasets and zvols;
  6. Attach the zvols to VMs using the Xen 4.16.5 hypervisor (might be irrelevant);
  7. Install sanoid and configure it to take and keep the last 36 "hourly", 4 "weekly" and 2 "monthly" snapshots for almost every zvol;
  8. Configure syncoid on a remote server to replicate snapshots from the source server (probably irrelevant);
  9. Wait a few days until the issue starts to appear on more and more zvols, which then produce "cannot iterate filesystems: I/O error" during snapshot operations.
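
Steps 2, 3, 5 and 7 above can be sketched roughly as follows. This is an approximation, not the exact commands used on the affected servers. The pool name tank0 and the dataset name "encrypted" are from this report. The device paths, the mirror layout, the passphrase key format, the zvol name and size, the template name `vm`, and the `daily = 0` / `yearly = 0` values are all placeholders or assumptions. Step 4 is left out because the report does not say which dataset it refers to. By default the script only prints the commands; set `RUN=1` to actually execute them.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

repro_steps() {
    # Step 2: pool with autotrim enabled (hypothetical devices and layout)
    run zpool create -o autotrim=on tank0 mirror /dev/sdX /dev/sdY
    # Step 3: LZ4 compression on the root dataset
    run zfs set compression=lz4 tank0
    # Step 5: natively encrypted parent dataset (key format is an assumption)
    run zfs create -o encryption=on -o keyformat=passphrase tank0/encrypted
    # One zvol under the encrypted parent (hypothetical name and size)
    run zfs create -V 20G tank0/encrypted/vm1
}

# Step 7: print a sanoid.conf fragment with the retention from the report.
sanoid_conf() {
    cat <<'EOF'
[tank0/encrypted]
        use_template = vm
        recursive = yes

[template_vm]
        hourly = 36
        daily = 0
        weekly = 4
        monthly = 2
        yearly = 0
        autosnap = yes
        autoprune = yes
EOF
}
```

Calling `repro_steps` prints the four commands for review, and `sanoid_conf` prints the config fragment, which could be redirected to wherever sanoid reads its configuration on the test system.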

Include any warning/errors/backtraces from the system logs

I couldn't find any related errors in system logs.
