-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multipath-tools 0.9.8 #81
Conversation
Cc: Martin Wilck <[email protected]> Cc: Benjamin Marzinski <[email protected]> Cc: Christophe Varoqui <[email protected]> Cc: DM-DEVEL ML <[email protected]> Signed-off-by: Xose Vazquez Perez <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Instead of always enabling queueing when a map is reloaded with no_path_retry set to a positive number, check if the map has timed out in recovery mode, and only enable queueing if it has not. This saves multipathd from having to disable queueing on the map immediately after the reload. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
If a multipath device has no_path_retry set to a number and has lost all paths, gone into recovery mode, and timed out, it will disable queue_if_no_paths. After that, if the device is reloaded by multipath outside of multipathd, it will re-enable queuieng on the device. When multipathd later calls set_no_path_retry() to update the queueing state, it will not disable queue_if_no_paths, since the device is still in the recovery state, so it believes no work needs to be done. The device will remain in the recovery state, with retry_ticks at 0, and queueing enabled, even though there are no usable paths. To fix this, in set_no_path_retry(), if no_path_retry is set to a number and the device is queueing but it is in recovery mode and out of retries with no usable paths, manually disable queue_if_no_path. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
The commands to show a multipath device are supposed to return its current state without updating it. Even when reset is 0, update_multipath() still can update the device. To fix this, split __setup_multipath() into two functions: refresh_multipath(), that updates the table and status, and setup_multipath(), which works as before but now calls refresh_multipath(). Make the multipathd show commands call refresh_multipath() instead of update_multipath(). With the show commands calling refresh_multipath(), all callers of update_multipath() set the reset argument, so remove it and always call setup_multipath() from update_multipath(). Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Make multipathd update mpp->features when in enables or disables queuing. This patch handles all the cases except failed removes by dm_suspend_and_flush_map(), which is never called by multipathd. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Make the restorequeueing command only do something if disablequeueing has first been run on the map. Also update the man page to explain what restorequeuing actually does. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Fixes: 3e71d8a ("multipath-tools Makefiles: create config.mk") Suggested-by: Lidong Zhong <[email protected]> Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Josh Soref <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
Instead of flush_map() handling both user requested flushes and automatic flushes when the last path has been deleted, make flush_map_nopaths() handle the automatic flushes itself, since a later patch will change the behavior of flush_map(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
When the multipath command tries to delete a multipath device, it first disables queueing and then suspends the device to force the IOs to get flushed. Then it attempts to delete the device and any kpartx partitions. multipathd, on the other hand, simply tries to delete the device and kpartx partitions, without disabling queueing or suspending. If there are no paths but there is outstanding IO, multipathd will hang trying to delete the last kpartx device. This is because it must be the last opener of the multipath device (multipath won't try to delete the device if it has any openers besides the kpartx devices) and the kernel will not allow the last opener of a block device to close until all the outstanding IO is flushed. This hang can be avoided if multipathd calls dm_suspend_and_flush_map() like the multipath command does, instead of dm_flush_map(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Make cli_del_maps() call dm_suspend_and_flush_map() for the unknown multipath devices as well. After this change, all callers of cli_del_maps() set need_suspend, so simplify dm_flush_maps(). Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
cli_del_map() does a lot of unnecessary work to match the arguments of ev_remove_map(), of which it is the only caller. Then ev_remove_map() does more unnecessary work to verify the arguments passed in. remove ev_remove_map() and make cli_del_map() get the mpp like the rest of the client handlers do. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
A future patch will add an additional return code, so make this an enum instead of just using numbers. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
dm_remove_partmaps() is only used in devmapper.c, so make it static. It does need to be declared early, since remove_partmaps() and it call eachother. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
When multipathd interactive commands fail for certain reasons, like timing out or incorrect permissions, they do not reply with "fail\n". Currently, multipath and multipathc expect that a reply other than "fail\n" means success. Instead, prefix all failure replies with "fail\n", and adapt multipath and multipathc to return failure for any reply starting with "fail\n". Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
If removing a multipath device fails because the device is in use, return DM_FLUSH_BUSY from remove_functions, which causes cli_del_map() to return -EBUSY, which will now print extra information in default_reply(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Make cli_del_maps() return -EBUSY like cli_del_map() if it fails because a device is in use and it doesn't run into any other type of failures. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
None of the callers of find_mp_by_str() print any message if they fail because the map name is invalid. Print one in find_mp_by_str() to save the effort of adding it to all the callers. Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
multiple client handlers simply open coded find_mp_by_str(). Just use the function instead. Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Some of the client handlers checked for paths by both dev and devt, but not all. Also, many of the client handlers don't print anything if they failed to find a path. Make all the client handlers which work on path devices use a new function, find_path_by_str(), which will try both methods to find a path, and can print out an error message if none is found. (mwilck: find_path_by_str() tries to match device names by devt (major:minor) first, then by device name, whereas the open-coded lookup checked for the device name first. This is for consistency with other device lookup algorithms. It doesn't make a difference in practice, because no real devices have names in the major:minor format.) Signed-off-by: Benjamin Marzinski <[email protected]> Reviewed-by: Martin Wilck <[email protected]>
When the cli_handlers cannot find the requested map or path, they will now return -ENODEV, which prints extra information in default_reply(). Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
currently, most of the callers of _process_cmd() do not gracefully handle the case where multipathd returns "fail\n". dmmp_flush_mpath() does, but it does extra work to try to figure out after the fact why the flush command failed. Instead, handle fail replies in _process_cmd() using the appropriate DMMP error codes. Cc: Gris Ge <[email protected]> Reviewed-by: Martin Wilck <[email protected]> Signed-off-by: Benjamin Marzinski <[email protected]>
Run the expensive multi-arch workflows only if actual source code has changed. Signed-off-by: Martin Wilck <[email protected]>
Only check end-user visible files: - README - man pages - public header files - udev rules and systemd unit files Also don't check file names, and make sure that the spelling workflow is only rung pushes that alter any of the files above. Add custom patterns (patterns.txt) and dictionary words (expect.txt) to make the spell check pass. See https://github.com/check-spelling/check-spelling/wiki/Configuration-Examples Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
The reference ABI (being a workflow artifact) can expire, causing the download to fail. In this case, the abi workflow should not report success. Also, add the "workflow_dispatch" event for the abi workflow, so that we can run it manually if a previous run failed. Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Document "on" and "off", and mention that "yes"/"1" and "no"/"0" are still accepted as alias values. Suggested-by: Paul Donohue <[email protected]> Signed-off-by: Martin Wilck <[email protected]>
Pushed a minor change to the |
The Infinibox hwtable entry had fast_io_fail and dev_loss both set to 15, which has the effect that fast_io_fail is switched off (dev_loss timeout must be higher than fast_io_fail timeout). Remove the dev_loss value, which will cause the default of 600 to be used. Experience shows that dev_loss_tmo values below a few minutes are seldom helpful. Fixes: 55da608 ("libmultipath: update INFINIDAT builtin config") Signed-off-by: Martin Wilck <[email protected]> Cc: Arnon Yaari <[email protected]> Cc: Xose Vazquez Perez <[email protected]> Cc: Tom Abraham <[email protected]> Cc: Jiri Belka <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Pushed another patch after review from @bmarzins. Once [PATCH v3] 11-dm-mpath.rules: handle reloads during coldplug events gets reviewed, I can add the udev rule series, too. |
The CI run for "multiarch test for rolling distros / build-current (debian-sid, arm/v7)" has apparently failed because of an issue with downloading the test container. @cvaroqui please restart the workflow if you have time. |
Make sure we import all properties that are also imported in 13-dm-disk.rules. Keep importing ID_FS_TYPE for now to avoid breakage, even if 13-dm-disk.rules does not. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
We have to import the properties if either DM_NOSCAN or DM_DISABLE_OTHER_RULES_FLAG is set, because blkid will be skipped in both cases. Also, if DM_UDEV_PRIMARY_SOURCE_FLAG is not set, it makes no sense to try and import the properties. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
DM_UDEV_DISABLE_OTHER_RULES_FLAG is handled by 10-dm.rules, which imports it from db if necessary. There is no need to do this again here. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
If a map reload happens while udev is processing rules for a coldplug event, DM_SUSPENDED may be set if the respective test in 10-dm.rules happens while the device is suspened. This will cause the rules for all higher block device layers to be skipped. Record this situation in an udev property. The reload operation will trigger another "change" uevent later, which would normally be treated as a reload, and be ignored without rescanning the device. If a previous "coldplug while suspended" situation is detected, perform a full device rescan instead. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
The current rules overwrite DM_UDEV_DISABLE_OTHER_RULES_FLAG and save its value in DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD if MPATH_DEVICE_READY changes from 1 to 0, and restore DM_UDEV_DISABLE_OTHER_RULES_FLAG from DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD when MPATH_DEVICE_READY changes back from 0 to 1. This is wrong for multiple reasons. For "spurious" events, 10-dm.rules will read DM_UDEV_DISABLE_OTHER_RULES_FLAG from the udev db and obtain the value saved by 11-dm-mpath.rules rather than it's own saved value, which confuses the logic in 10-dm.rules. 10-dm.rules sets the flag if either DISK_RO==1 or DM_SUSPENDED==1, and never clears it (it can only be cleared by a new genuine libdm event that doesn't have DM_UDEV_DISABLE_OTHER_RULES_FLAG set, while the device is not suspended). lvm commands may set the flag, too, but AFAICS this is only done for certain types of logical volumes, not for multipath maps. If a previously suspended device is resumed, DM_UDEV_DISABLE_OTHER_RULES_FLAG would be cleared, and it would be wrong for 11-dm-mpath.rules to overwrite it with a previuosly saved value. Likewise, if a uevent arrives for a suspended map, and MPATH_DEVICE_READY happens to switch from 0 to 1, it would be wrong to clear DM_UDEV_DISABLE_OTHER_RULES_FLAG by setting it to a previously saved value. We need to set DM_UDEV_DISABLE_OTHER_RULES_FLAG=1 to prevent follow-up rules from attempting I/O on the device. But don't try to save and restore DM_UDEV_DISABLE_OTHER_RULES_FLAG between uevents. Rather, reset DM_UDEV_DISABLE_OTHER_RULES_FLAG to the value we got from 10-dm.rules in a late rule. I chose "99-z-" for this rule's prefix to make sure it runs after 99-systemd.rules. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
For all "spurious" events, which includes coldplug events, DM_DISABLE_OTHER_RULES_FLAG will be read from the udev DB in 10-dm.rules. Thus if a previous event saw the device in suspended state, the flag will be set even if the device has meanwhile resumed. Reset the flag if none of the conditions hold that would cause it to be set in a genuine uevent in 10-dm.rules. It would be cleaner to do this in 10-dm.rules directly, but it cannot be done easily, because the flag can also have an origin inside lvm itself; lvm sets it for various kinds of logical volumes. For generic (non-LVM) dm devices, the flag is only set in 10-dm.rules though, so doing this is safe for multipath. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
If paths become available while the device is suspended, don't activate. Another uevent will arrive when the device is resumed. Signed-off-by: Martin Wilck <[email protected]> Reviewed-by: Benjamin Marzinski <[email protected]>
Pushed the the patch series for 11-dm-mpath.rules now, as the series has been completely reviewed. |
Signed-off-by: Martin Wilck <[email protected]>
Pushed version bump to 0.9.8 |
This file sums up the content of the GitHub PRs since 0.9.0, giving a high-level overview of the changes in the past releases. I thought it might be useful for users and/or distribution maintainers. Signed-off-by: Martin Wilck <[email protected]>
I added a the file NEWS.md that gives an overview about the changes in our code over the last 2 years. |
@cvaroqui, I believe this should be good for merging now. |
Again, the workflow failures seem to have been caused by problems pulling the test containers. |
@cvaroqui, please also push a 0.9.8 tag. |
Hi Christophe,
here's a PR for 0.9.8. As usual, all patches have been reviewed, except the ones marked with (*) below, which are some GitHub action changes and 2 minor documentation fixes.
(Side note: we should start thinking about what the next version after 0.9.9 should be).
News
multipathd.socket
has been disabled by default because it has undesirable side effects (fixes Socket activation doesn't play well with udev rule #76, at least partially).restorequeueing
CLI command now only enables queueing ifdisablequeueing
had been sent before.-D_FILE_OFFSET_BITS=64
to fix issues with emulated 32-bit environments in the GitHub CI, so that we can now run our CI in arm/v7.Bug fixes
CI
Patch breakdown
@bmarzins (26):
multipathd: Make sure to disable queueing if recovery has failed.
multipathd: don't modify the multipath device on show commands
libmultipath: keep track of queueing state in features
multipathd: only restore queueing if it has been disabled first
multipathd: remove nopath flushing code from flush_map()
multipathd: make flush_map() delete maps like the multipath command
multipathd: disable queueing when removing unknown maps
multipathd: simplify cli_del_map()
libmultipath: make _dm_flush_map() return an enum
libmultipath: make dm_remove_partmaps() a static function
multipathd: always start failure replies with "fail\n"
multipathd: print extra default reply msg for busy devices
multipathd: handle busy devices in cli_del_maps()
libmultipath: print error when find_mp_by_str() fails.
multipathd: don't open code find_mp_by_str() in client handers
multipathd: make cli_handlers check for paths by devt and dev
multipathd: add cli_handler reply message for missing devs
libdmmp: handle failures in _process_cmd
multipath: get rid of unnecessary retries variable
multipath: Don't locally retry deletgated remove failures
multipath: if delegation times out mark as not delegated
multipathd: sync features on flush_map failure corner case
multipathd: fix null pointer dereference in uev_update_path
multipathd: fix auto-resize configuration
libmultipath: fix displaying auto_resize config setting
multipath-tools tests: add void parameter to functions
@brianatpurestorage (2):
multipathd: the local path change is not considered
libmultipath: stop PURE FlashArray from detecting priority
@jsoref (9):
spelling: anymore
spelling: cannot
spelling: case-insensitive
spelling: case-sensitive
spelling: configuration
spelling: correctly
spelling: numerically
spelling: preexisting
spelling: than
@mwilck (29):
libmultipath: avoid temporarily enabling queueing on reload
libmultipath: fix ANA prioritizer enablement logic
multipathd: init_unwinder: protect pthread_cond_wait() call with a loop
libmultipath: is_uevent_busy(): check servicing_uev under lock
libmultipath: tur: protect pthread_cond_timedwait with a loop
multipathd: make update_prio static, and rename refresh_all param
multipathd: don't activate socket activation by default
multipath: udev rules: use configured $(bindir) in udev rules
multipath-tools: Makefile.inc: set _FILE_OFFSET_BITS=64
GitHub workflows: update workflow events ()
GitHub workflows: update distros to run for ()
GitHub workflows: split multiarch workflow in rolling/stable ()
GitHub Workflows: run expensive workflows only on relevant changes ()
GitHub workflows: add spell checker ()
Spelling fixes found by check-spelling action ()
GitHub Workflows: abi: only run if C code has changed ()
GitHub workflows: abi: error if reference ABI can't be downloaded ()
GitHub workflows: run on PRs against "queue", too ()
multipath.conf.5: fix documentation for find_multipaths (fixes #75) ()
libmultipath: hwtable: fix fast_io_fail for Infinibox
11-dm-mpath.rules: fix list of imported properties
11-dm-mpath.rules: use import logic like 13-dm-disk.rules
11-dm-mpath.rules: don't import DM_UDEV_DISABLE_OTHER_RULES_FLAG
11-dm-mpath.rules: handle reloads during coldplug events
11-dm-mpath.rules: don't save DM_UDEV_DISABLE_OTHER_RULES_FLAG_OLD
11-dm-mpath.rules: clear DM_DISABLE_OTHER_RULES_FLAG for coldplug events
libmultipath: bump version to 0.9.8 ()
multipath-tools: added NEWS.md ()
@xosevp (3):
multipath-tools: update ml
multipath-tools: fix an assignment ambiguity
multipath-tools: remove extra hyphen from CFLAGS std option