-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock on I/O errors during device removal #17145
Conversation
spa_vdev_remove_thread() should not hold svr_lock while loading a metaslab. It may block ZIO threads, required to handle metaslab loading, at least in case of read errors causing recovery writes. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc.
Can we get one more reviewer on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach seems right to me.
The only possible thing I can thing might go wrong is if two threads raced through here such that one loaded a range tree and swapped it into svr_allocd_segs
, and then the other did. I suspect that won't happen or won't be a problem for other reasons though, based on normal ZFS patterns and my own intuition:
- there's only one thread?
- the locking dance can't fall out this way
- its only ever going to swap the exact same contents in, so it's an effective no-op
Because I don't know this code at all and have not thought it all the way through, and because I am quite sure you know what you're doing here, take this as "well, I did at least think about it a little bit and its almost certainly fine" :)
Thanks!
@robn Thanks. Yes, AFAIK there can be only one thread. Plus the code asserts |
FreeBSD kernel's WITNESS code detected lock ordering violation in spa_vdev_remove_cancel_sync(). It took svr_lock while holding ms_lock, which is opposite to other places. I was thinking to resolve it similar to openzfs#17145, but looking closer I don't think we even need svr_lock at that point, since we already asserted svr_allocd_segs is empty, and we don't need to add there segments we are going to call free_mapped_segment_cb for. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc.
spa_vdev_remove_thread()
should not holdsvr_lock
while loading metaslabs. It may block ZIO threads, required to handle metaslab loading, at least in case of read errors causing recovery writes.How Has This Been Tested?
I hope it to fix deadlocks of
removal_with_errors
ZTS test that for some reason I observed plenty in the CI lately.Types of changes
Checklist:
Signed-off-by
.