-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zfs-2.3.1 patchset #17114
zfs-2.3.1 patchset #17114
Conversation
Added in b1e46f8, but empty, so no point keeping it around. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16931
We should not hardcode 512-byte read size when checking for loader in the boot area before RAIDZ expansion. Disk might be unable to handle that I/O as is, and the code zio_vdev_io_start() handling the padding asserts doing it only for top-level vdev. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16942
When building tests with zinject, it can be quite difficult to work out if you're producing the right kind of IO to match the rules you've set up. So, here we extend injection records to count the number of times a handler matched the operation, and how often an error was actually injected (ie after frequency and other exclusions are applied). Then, display those counts in the `zinject` output. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#16938
To do a cross-build using only kbuild rather than a full source tree, ARCH= needs to be passed for the kbuild Makefile to find the archspecific Makefile. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16944
Removed three unnecessary spaces in the definition of the sa_attr_reg_t structure to improve code style consistency and adhere to OpenZFS coding standards. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Peng Liu <[email protected]> Closes openzfs#16955
2.3.0 is out now, so make 2.2.x the LTS release. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#16945 Closes openzfs#16948
The warning at the end of the second example in the description section was actually inside the options table. Move the El macro to match what is done in the first section for improved readability. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Ziaee <[email protected]> Closes openzfs#16962
Reviewed-by: George Amanakis <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tim Smith <[email protected]> Closes openzfs#16965
The old kstat helper function was barely used, I suspect in part because it was very limited in the kinds of kstats it could gather. This adds new functions to replace it, for each kind of thing that can have stats: global, pool and dataset. There's options in there to get a single stat value, or all values within a group. Most importantly, the interface is the same for both platforms. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Removes other custom helpers and direct accesses to /proc. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
It's now a simple wrapper, so lets just call kstat direct. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
I'm about to add a new "type", and I need somewhere to put it! Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16947
Injecting a device probe failure is not possible by matching IO types, because probe IO goes to the label regions, which is explicitly excluded from injection. Even if it were possible, it would be awkward to do, because a probe is sequence of reads and writes. This commit adds a new IO "type" to match for injection, which looks for the ZIO_FLAG_PROBE flag instead. Any probe IO will be match the injection record and recieve the wanted error. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16947
The flag VFCF_FILEREV was recently defined in FreeBSD so that a file system could indicate that it increments va_filerev by one for each change. Since ZFS does do this, set the flag if defined for the kernel being built. This allows the NFSv4.2 server to reply with the correct change_attr_type attribute value. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Rick Macklem <[email protected]> Closed openzfs#16976
Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Alan Somers <[email protected]> Sponsored by: ConnectWise Closes openzfs#16980
Originally openzfs#16856 updated Linux Direct I/O requests to use the new pin_user_pages API. However, it was an oversight that this PR only handled iov_iter's of type ITER_IOVEC and ITER_UBUF. Other iov_iter types may try and use the pin_user_pages API if it is available. This can lead to panics as the iov_iter is not being iterated over correctly in zfs_uio_pin_user_pages(). Unfortunately, generic iov_iter API's that call pin_user_page_fast() are protected as GPL only. Rather than update zfs_uio_pin_user_pages() to account for all iov_iter types, we can simply just call zfs_uio_get_dio_page_iov_iter() if the iov_iter type is not ITER_IOVEC or ITER_UBUF. zfs_uio_get_dio_page_iov_iter() calls the iov_iter_get_pages() calls that can handle any iov_iter type. In the future it might be worth using the exposed iov_iter iterator functions that are included in the header iov_iter.h since v6.7. These functions allow for any iov_iter type to be iterated over and advanced while applying a step function during iteration. This could possibly be leveraged in zfs_uio_pin_user_pages(). A new ZFS test case was added to test that a ITER_BVEC is handled correctly using this new code path. This test case was provided though issue openzfs#16956. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes openzfs#16956 Closes openzfs#17006
…penzfs#16857) Introduced functionality to recursively mount datasets with a new config option `mount_recursively`. Adjusted existing functions to handle the recursive behavior and added tests to validate the feature. This enhances support for managing hierarchical ZFS datasets within a PAM context. Signed-off-by: Jerzy Kołosowski <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
This change will prevent prefetch to perform unnecessary ARC buffer fill when reading from disk. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Jaydeep Kshirsagar <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Closes openzfs#17013
As zios are reexecuted after resume from suspension, their ready and wait states need to be propagated to wait counts on all their parents. It's possible for those parents to have active children passing through READY or DONE, which then end up in zio_notify_parent(), take their parent's lock, and decrement the wait count. Without also taking a lock here, it's possible for an increment race to occur, which leads to either there being no references left (tripping the assert in zio_notify_parent()), or a parent waiting forever for a nonexistent child to complete. To protect against this, we simply take the appropriate zio locks in zio_reexecute() before updating the wait counts. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17016
This is a convenience for filesystems that need the inode of their parent or their own name, as its often complicated to get that information. We don't need those things, so this is just detecting which prototype is expected and adjusting our callback to match. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
According to the upstream change, all callers set it, and all block devices either honoured it or ignored it, so removing it entirely allows a bunch of handling for the "unset" case to be removed, and it becomes effectively implied. We follow suit, and keep setting it for older kernels. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
The current documentation of `zfs destroy` in application to snapshots is particularly difficult to understand. The following changes are made: - Remove circular reference to `zfs destroy` in the documentation of that command. - Remove use of "for example", which implies there are more, undocumented reasons that ZFS may fail to destroy a snapshot immediately. - Mention properties `defer_destroy` and `userrefs`. - Add `zfsprops(8)` to "SEE ALSO" list. - Clarify meaning of `-d` option. Requires-builders: none Signed-off-by: mnrx <[email protected]> Co-authored-by: Alexander Motin <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Sponsored-by: Wasabi Technology, Inc. Sponsored-by: Klara, Inc. Signed-off-by: Mateusz Piotrowski <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Amanakis <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
When you are using large recordsizes in conjunction with raidz, with incompressible data, you can pretty reliably be making 21 MB allocations. Unfortunately, the fragmentation metric in ZFS considers any metaslabs with 16 MB free chunks completely unfragmented, so you can have a metaslab report 0% fragmented and be unable to satisfy an allocation. When using the segment-based metaslab weight, this is inconvenient; when using the space-based one, it can seriously degrade performance. We expand the fragmentation table to extend up to 512MB, and redefine the table size based on the actual table, rather than having a static define. We also tweak the one variable that depends on fragmentation directly. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#16986
Gang blocks have a significant impact on the long and short term performance of a zpool, but there is not a lot of observability into whether they're being used. This change adds gang-specific kstats to ZFS, to better allow users to see whether ganging is happening. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17003
recv_fix_encryption_hierarchy() in its present state goes through all stream filesystems, and for each one traverses the snapshots in order to find one that exists locally. This happens by calling guid_to_name() for each snapshot, which iterates through all children of the filesystem. This results in CPU utilization of 100% for several minutes (for ~1000 filesystems on a Ryzen 4350G) for 1 thread at the end of a raw receive (-w, regardless whether encrypted or not, dryrun or not). Fix this by following a different logic: using the top_fs name, call gather_nvlist() to gather the nvlists for all local filesystems. For each one filesystem, go through the snapshots to find the corresponding stream's filesystem (since we know the snapshots guid and can search with it in stream_avl for the stream's fs). Then go on to fix the encryption roots and locations as in its present state. Avoiding guid_to_name() iteratively makes recv_fix_encryption_hierarchy() significantly faster (from several minutes to seconds for ~1000 filesystems on a Ryzen 4350G). Another problem is the following: in case we have promoted a clone of the filesystem outside the top filesystem specified in zfs send, zfs receive does not fail but returns an error: recv_incremental_replication() fails to find its origin and errors out with needagain=1. This results in recv_fix_hierarchy() not being called which may render some children of the top fs not mountable since their encryption root was not updated. To circumvent this make recv_incremental_replication() silently ignore this error. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: George Amanakis <[email protected]> Closes openzfs#16929
"DESTDIR=/path/to/target/root/ make install" may fail when installing to a root that contains an existing lib/modules structure. When run as root we may even affect the wrong kernel (the build system's one, or, if running a different version, some other directory in /lib/modules, but not the desired one installed in DESTDIR). Add a missing reference to the INSTALL_MOD_PATH root when calling "depmod" during "make install" Also add a switch "DONT_DELETE_MODULES_FILES=1" that skips the removal of files named "modules.*" prior to running depmod. Signed-off-by: Christian Kohlschütter <[email protected]> Closes openzfs#16994 Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
The purpose of no-op is to simulate a failure between a device cache and its permanent store. We still want it to go through the queue and respond in the same way to everything else. So, inject "success" as the very last thing, and then move on to VDEV_IO_DONE to be dequeued and so any followup work can occur. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17029
For zfs_rename, after the dataset name is successfully updated, the dataset handle that was passed to zfs_rename, still contains the old name, due to which, the dataset handle becomes invalid. The following operations performed using this handle result in error since the dataset with old name cannot be found anymore. changelist_rename does update the names in dataset handles, but those are temporary handles that were created during changelist_gather. The original handle that was used to call zfs_rename is not updated. We should update the name in original ZFS handle after the IOCTL for rename returns success for the operation. Signed-off-by: Umer Saleem <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
skc->skc_name also needs to be freed in an error path. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Vandana Rungta <[email protected]> Closes openzfs#17041
Linux 6.12 has conflicting range_tree_{find,destroy,clear} symbols. Signed-off-by: Ivan Volosyuk <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]>
Since we are calculating a free space fragmentation, we should weight metaslabs by the amount of their free space, not a full size. Fragmentation of full metaslabs may not matter in presence empty ones. The old algorithm did not differentiate metaslabs having only one free 4KB block from metaslabs having 50% of space free in 4KB blocks, reporting higher fragmentation. While there, move metaslab_group_alloc_update() call after setting mg_fragmentation, otherwise the effect may be delayed by one TXG. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Tony Hutter <[email protected]>
Kernel & userspace specifics are in zfs_file_os.c, so there's no particular reason these have to be separate. The one platform-specific part is in the Linux kernel part, to offload flushes to a taskq if we're already inside a filesystem transaction. This would be normally be an unsatisfying wart, but I'm intending to remove this shortly, so I'm content to leave it gated for the moment. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]>
Need to use arc_free_data_abd to free abd type buffer. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Closes openzfs#17079
zfs_file_fsync() and zfs_file_deallocate() are both blocking ops, so the zio_taskq thread is active and blocked both while waiting for the IO call and then while calling zio_execute() for the next stage. This is a particular issue for FLUSH, as the z_flush_iss queue typically only has one thread; multiple flushes arriving at once can cause long delays if the underlying fsync() response is particularly slow. To fix this, we dispatch both FLUSH and TRIM to the z_vdev_file taskq, just as we do for reads and writes. Further, we return all results through zio_interrupt(), so neither the issue nor the file taskqs are blocked. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17064
It's included so it's effectively already part of it, but it's not always installed as a userspace header, making zfs.h effectively useless. Might as well just combine it. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Close openzfs#17066
If the timing is unfortunate, the pool can suspend just as we're failing because it didn't suspend. If we don't resume the pool, we hang trying to destroy it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17054
Before this change zfs_metaslab_switch_threshold tunable switched metaslabs each time ones index reduced by two (which means biggest contiguous chunk reduced to 1/4). It is a good idea to balance metaslabs fragmentation. But for empty metaslabs (having power- of-2 sizes) this means switching when they get just below the half of their capacity. Inspection with zdb after filling new pool to half capacity shown most of its metaslabs filled to half capacity. I consider this sub-optimal for pool fragmentation in a long run. This change blocks the metaslabs switching if most of the metaslab free space (15/16) is represented by a single contiguous range. Such metaslab should not be considered fragmented until it actually fail some big allocation. More contiguous filling should improve data locality and increase time before previously filled and partially freed metaslab is touched again, giving it more time to free more contiguous chunks for lower fragmentation. It should also slightly reduce spacemap traffic. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Paul Dagnelie <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17081
Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: SHENGYI HONG <[email protected]> Closes openzfs#17088
Don't try to get mg of hole vdev in removal Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Paul Dagnelie <[email protected]> Closes openzfs#17080
In l2arc_evict(), the config lock may be acquired in reverse order (e.g., first the config lock (writer), then a hash lock) unlike in arc_read() during scenarios like L2ARC device removal. To avoid deadlocks, if the attempt to acquire the config lock (reader) fails in arc_read(), release the hash lock, wait for the config lock, and retry from the beginning. Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#17071
`zpool create` won't let you use relative paths to disks. This is annoying when you want to do: zpool create tank ./diskfile But have to do.. zpool create tank `pwd`/diskfile This fixes it. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#17042
Update the META file to reflect compatibility with the 6.13 kernel. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]>
ie #17089. Also would want #17099 to fix a couple of things in there. I've got no problem including them, since it's just test infrastructure (and a bug that can't be hit). @pharshalp any particular reason why you want them on a release? I'm not opposed, but they're developer support at most. |
Add a new 'zfs-qemu-packages' GH workflow for manually building RPMs and test installing ZFS RPMs from a yum repo. The workflow has a dropdown menu in the Github runners tab with two options: Build RPMs - Build release RPMs and tarballs and put them into an artifact ZIP file. The directory structure used in the ZIP file mirrors the ZFS yum repo. Test repo - Test install the ZFS RPMs from the ZFS repo. On Almalinux, this will do a DKMS and KMOD test install from both the regular and testing repos. On Fedora, it will do a DKMS install from the regular repo. All test install results will be displayed in the Github runner Summary page. Note that the workflow provides an optional text box where you can specify the full URL to an alternate repo. If left blank, it will install from the default repo from the zfs-release RPM. Most developers will never need to use this workflow. It is intended to be used by the ZFS admins for building and testing releases. This commit also modularizes many of the runner scripts so they can be used by both the zfs-qemu and zfs-qemu-packages workflows. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#17005
META file and changelog updated. Signed-off-by: Tony Hutter <[email protected]>
Can this one be included? |
What will be included regarding #17052? |
What's regarding #16631? |
@wiesl The first link is to a still open PR, fixing panic on rare very specific pool corruption. What's the hurry? The other two links are to not even diagnosed issues. What exactly do you propose to include? |
Motivation and Context
Support 6.13 kernel, various fixes.
Description
Fedora is now running the 6.13 kernel, so a release is needed. This PR is just a copy of the
zfs-2.3.1-staging
branch, which in turn is just the commits from #17097.How Has This Been Tested?
Runners will test
Types of changes
Checklist:
Signed-off-by
.