Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Gentoo |
| Distribution Version | default/linux/amd64/17.1 (stable) profile |
| Kernel Version | 6.6.13-gentoo-dist |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.2.2-r1-gentoo / zfs-kmod-2.2.2-r0-gentoo |
Describe the problem you're observing
After several days of hourly automatic snapshots (a sanoid cron job) on slightly more than twenty natively encrypted zvols, sanoid starts failing with "cannot iterate filesystems: I/O error" while taking snapshots. "zpool status -vx" prints "errors: Permanent errors have been detected in the following files:" followed by an empty list. I get the same "cannot iterate filesystems: I/O error" when trying to list snapshots on some of the zvols. Currently about 16 zvols are in this error state. The total number of snapshots is slightly under a thousand.
"zfs list -t snap -r tank0 | wc -l" prints 23 "cannot iterate filesystems: I/O error" lines and a count of exactly 991. No errors were found in dmesg. This particular pool consists of three mirrored WD SA500 SSDs, which support trim/unmap on LSI controllers, but similar results were observed on different servers and disks.
At least four different servers with a similar setup have failed the same way. Previously I had similar issues with ZFS 2.1.14 and kernel 6.1.69-gentoo-dist, with the slight difference that "zpool status -vx" gave more detailed output listing the exact failing snapshots, and I could also see the kcf_ops_failed counter increasing via "grep . /proc/spl/kstat/kcf/*". Later Gentoo marked ZFS 2.2.2 as stable, so I decided to try again with the newer version.
Pools on different servers started to fail after 3 or 5 days of uptime. All of them had fewer than a thousand snapshots.
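For anyone trying to compare numbers: the error lines go to stderr while "wc -l" only counts stdout, so the two figures are easy to mix up. A minimal sketch that reports them separately is below. The function name count_list_results is hypothetical, and the usage line assumes the pool name tank0 from this report.

```shell
#!/bin/sh
# Run a listing command and report stdout lines and
# "cannot iterate" stderr lines as two separate counts.
count_list_results() {
    err_file="$(mktemp)" || return 1
    # stdout is piped to wc; stderr is captured in a temp file
    out_lines="$("$@" 2>"$err_file" | wc -l)"
    # grep -c exits non-zero when nothing matches, hence "|| true"
    err_lines="$(grep -c 'cannot iterate filesystems: I/O error' "$err_file" || true)"
    rm -f "$err_file"
    echo "snapshots=$((out_lines)) iterate_errors=$((err_lines))"
}

# Usage on the affected pool (not executed here):
# count_list_results zfs list -H -t snapshot -o name -r tank0
```

With -H the header line is suppressed, so the first number counts snapshot names only.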
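To watch the kcf_ops_failed counter mentioned above over time, something like the following sketch could be run from cron. It assumes the kstat files use the usual "name type data" layout with the value in the last column, which I have not verified for every version. The function name kcf_failed_total is hypothetical.

```shell
#!/bin/sh
# Sum the last field of every line mentioning kcf_ops_failed.
# Reads kstat text (raw or "grep ." style output) on stdin.
# Assumption: the counter value is the last whitespace-separated field.
kcf_failed_total() {
    awk '/kcf_ops_failed/ { sum += $NF } END { print sum + 0 }'
}

# Usage (not executed here):
# cat /proc/spl/kstat/kcf/* | kcf_failed_total
```

Logging this value next to a timestamp would show whether the counter starts rising before the first snapshot errors appear.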
Describe how to reproduce the problem
- Install the latest ZFS and a binary distribution kernel;
- Create a zpool with autotrim enabled (probably irrelevant, but that's what I do on SSD pools);
- Enable LZ4 compression on the root dataset (probably irrelevant);
- Create a dataset with autotrim enabled (probably irrelevant);
- Create an encrypted dataset named "encrypted" which will hold the rest of the datasets and zvols;
- Attach the zvols to VMs using the Xen 4.16.5 hypervisor (might be irrelevant);
- Install sanoid and configure it to take and keep the last 36 "hourly", 4 "weekly" and 2 "monthly" snapshots for almost every zvol;
- Configure syncoid on a remote server to replicate snapshots from the source server (probably irrelevant);
- Wait a few days until the issue starts to appear on more and more zvols, producing "cannot iterate filesystems: I/O error" during snapshot operations.
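The pool and sanoid parts of the steps above can be roughly sketched as follows. This is only an approximation of the setup, not the exact commands that were run: the disk paths, the zvol name and size, the keyformat, the template name and "recursive = yes" are all assumptions, and only the three retention values come from this report. The functions print the commands and the config instead of executing anything.

```shell
#!/bin/sh
# Print (do not run) commands for a pool resembling the one described.
# Hypothetical: disk paths, keyformat, zvol name "vm1" and its size.
print_repro_commands() {
    pool="$1"; shift
    echo "zpool create -o autotrim=on $pool mirror $*"
    echo "zfs set compression=lz4 $pool"
    echo "zfs create -o encryption=on -o keyformat=passphrase $pool/encrypted"
    echo "zfs create -V 10G $pool/encrypted/vm1"
}

# Print a sanoid.conf fragment with the retention values from the report.
# Hypothetical: the template name and the recursive setting.
print_sanoid_template() {
    pool="$1"
    cat <<EOF
[$pool/encrypted]
    use_template = hourly_zvols
    recursive = yes

[template_hourly_zvols]
    hourly = 36
    weekly = 4
    monthly = 2
    autosnap = yes
    autoprune = yes
EOF
}

# Example (prints only):
# print_repro_commands tank0 /dev/sda /dev/sdb /dev/sdc
# print_sanoid_template tank0
```

Any retention key not listed in the template falls back to sanoid's defaults, so the real configuration may differ.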
Include any warning/errors/backtraces from the system logs
I couldn't find any related errors in system logs.