
Commit 7987d4d

behlendorf authored and tonyhutter committed
Update device removal documentation
Make a minor update to the 'zpool remove' man page to clarify both raidz and draid pools do not support removal, and change sector to ashift which is what we actually care about.

Update the big theory comment in vdev_removal.c to accurately reflect which types of vdevs can be removed. Furthermore, I've added some discussion for the casual reader to briefly explain the top-level vdev removal restrictions. This has been a common area of confusion and it's not intuitive where they come from without understanding the implementation details.

Signed-off-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Closes openzfs#17847
1 parent 1956417 commit 7987d4d

2 files changed: +60 -24 lines changed

man/man8/zpool-remove.8

Lines changed: 2 additions & 2 deletions
@@ -58,8 +58,8 @@ This command supports removing hot spare, cache, log, and both mirrored and
 non-redundant primary top-level vdevs, including dedup and special vdevs.
 .Pp
 Top-level vdevs can only be removed if the primary pool storage does not contain
-a top-level raidz vdev, all top-level vdevs have the same sector size, and the
-keys for all encrypted datasets are loaded.
+a top-level raidz or draid vdev, all top-level vdevs have the same ashift size,
+and the keys for all encrypted datasets are loaded.
 .Pp
 Removing a top-level vdev reduces the total amount of space in the storage pool.
 The specified device will be evacuated by copying all allocated space from it to
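The "same ashift size" wording in the man page change above can be illustrated with a little arithmetic. ashift is the base-2 logarithm of a vdev's minimum allocation size, so ashift=9 means 512-byte units and ashift=12 means 4096-byte units. The sketch below is only an illustration of that rounding, not code from OpenZFS; the helper names `sk_round_up_to_ashift` and `sk_padding_needed` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an OpenZFS function): round a size up to the
 * next multiple of the allocation unit implied by ashift (1 << ashift).
 */
static uint64_t
sk_round_up_to_ashift(uint64_t size, unsigned ashift)
{
	assert(ashift < 64);
	uint64_t unit = UINT64_C(1) << ashift;
	return ((size + unit - 1) & ~(unit - 1));
}

/*
 * Hypothetical helper: number of padding bytes a segment of the given
 * size would need if it were placed on a vdev with the given ashift.
 */
static uint64_t
sk_padding_needed(uint64_t size, unsigned ashift)
{
	return (sk_round_up_to_ashift(size, ashift) - size);
}
```

For example, a 1536-byte segment (three 512-byte units) fits an ashift=9 vdev exactly, but on an ashift=12 vdev it would have to grow to 4096 bytes, that is, 2560 bytes of padding. Requiring a single ashift across the top-level vdevs avoids this situation.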

module/zfs/vdev_removal.c

Lines changed: 58 additions & 22 deletions
@@ -51,34 +51,70 @@
 #include <sys/trace_zfs.h>
 
 /*
- * This file contains the necessary logic to remove vdevs from a
- * storage pool. Currently, the only devices that can be removed
- * are log, cache, and spare devices; and top level vdevs from a pool
- * w/o raidz or mirrors. (Note that members of a mirror can be removed
- * by the detach operation.)
+ * This file contains the necessary logic to remove vdevs from a storage
+ * pool. Note that members of a mirror can be removed by the detach
+ * operation. Currently, the only devices that can be removed are:
  *
- * Log vdevs are removed by evacuating them and then turning the vdev
- * into a hole vdev while holding spa config locks.
+ * 1) Traditional hot spare and cache vdevs. Note that draid distributed
+ *    spares are fixed at creation time and cannot be removed.
  *
- * Top level vdevs are removed and converted into an indirect vdev via
- * a multi-step process:
+ * 2) Log vdevs are removed by evacuating them and then turning the vdev
+ *    into a hole vdev while holding spa config locks.
  *
- *  - Disable allocations from this device (spa_vdev_remove_top).
+ * 3) Top-level singleton and mirror vdevs, including dedup and special
+ *    vdevs, are removed and converted into an indirect vdev via a
+ *    multi-step process:
  *
- *  - From a new thread (spa_vdev_remove_thread), copy data from
- *    the removing vdev to a different vdev. The copy happens in open
- *    context (spa_vdev_copy_impl) and issues a sync task
- *    (vdev_mapping_sync) so the sync thread can update the partial
- *    indirect mappings in core and on disk.
+ *    - Disable allocations from this device (spa_vdev_remove_top).
  *
- *  - If a free happens during a removal, it is freed from the
- *    removing vdev, and if it has already been copied, from the new
- *    location as well (free_from_removing_vdev).
+ *    - From a new thread (spa_vdev_remove_thread), copy data from the
+ *      removing vdev to a different vdev. The copy happens in open context
+ *      (spa_vdev_copy_impl) and issues a sync task (vdev_mapping_sync) so
+ *      the sync thread can update the partial indirect mappings in core
+ *      and on disk.
  *
- *  - After the removal is completed, the copy thread converts the vdev
- *    into an indirect vdev (vdev_remove_complete) before instructing
- *    the sync thread to destroy the space maps and finish the removal
- *    (spa_finish_removal).
+ *    - If a free happens during a removal, it is freed from the removing
+ *      vdev, and if it has already been copied, from the new location as
+ *      well (free_from_removing_vdev).
+ *
+ *    - After the removal is completed, the copy thread converts the vdev
+ *      into an indirect vdev (vdev_remove_complete) before instructing
+ *      the sync thread to destroy the space maps and finish the removal
+ *      (spa_finish_removal).
+ *
+ * The following constraints currently apply to primary device removal:
+ *
+ * - All vdevs must be online, healthy, and not be missing any data
+ *   according to the DTLs.
+ *
+ * - When removing a singleton or mirror vdev, regardless of whether it's
+ *   a special, dedup, or primary device, it must have the same ashift
+ *   as the devices in the normal allocation class. Furthermore, all
+ *   vdevs in the normal allocation class must have the same ashift to
+ *   ensure the new allocations never include additional padding.
+ *
+ * - The normal allocation class cannot contain any raidz or draid
+ *   top-level vdevs since segments are copied without regard for block
+ *   boundaries. This makes it impossible to calculate the required
+ *   parity columns when using these vdev types as the destination.
+ *
+ * - The encryption keys must be loaded so the ZIL logs can be reset
+ *   in order to prevent writing to the device being removed.
+ *
+ * N.B. ashift and raidz/draid constraints for primary top-level device
+ * removal could be slightly relaxed if it were possible to request that
+ * DVAs from a mirror or singleton in the specified allocation class be
+ * used (metaslab_alloc_dva).
+ *
+ * This flexibility would be particularly useful for raidz/draid pools which
+ * often include a mirrored special device. If a top-level singleton were
+ * mistakenly added it could then still be removed at the cost of some
+ * special device capacity. This may be a worthwhile tradeoff depending on
+ * the pool capacity and expense (cost, complexity, time) of creating a new
+ * pool and copying all of the data to correct the configuration.
+ *
+ * Furthermore, while not currently supported it should be possible to allow
+ * vdevs of any type to be removed as long as they've never been written to.
  */
 
 typedef struct vdev_copy_arg {
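
The removal constraints listed in the updated comment can be read as a simple predicate over the pool's top-level vdevs. The following is a minimal sketch under stated assumptions, not the checks OpenZFS actually performs: every type and function name here (`sk_vdev_t`, `sk_can_remove_top`, and so on) is hypothetical, the `healthy` flag stands in for "online, healthy, and not missing data according to the DTLs", and `all_keys_loaded` stands in for the encryption key requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a top-level vdev (not vdev_t). */
typedef enum { SK_SINGLETON, SK_MIRROR, SK_RAIDZ, SK_DRAID } sk_type_t;
typedef enum { SK_NORMAL, SK_SPECIAL, SK_DEDUP } sk_alloc_class_t;

typedef struct sk_vdev {
	sk_type_t type;
	sk_alloc_class_t alloc_class;
	unsigned ashift;
	bool healthy;	/* assumed: online, healthy, no missing data */
} sk_vdev_t;

typedef enum {
	SK_OK,
	SK_ERR_TYPE,		/* target is not a singleton or mirror */
	SK_ERR_UNHEALTHY,	/* some vdev is not healthy */
	SK_ERR_RAIDZ,		/* normal class contains raidz or draid */
	SK_ERR_ASHIFT,		/* ashift mismatch with the normal class */
	SK_ERR_KEYS		/* encryption keys are not all loaded */
} sk_result_t;

/* Apply the constraints from the comment to one removal candidate. */
static sk_result_t
sk_can_remove_top(const sk_vdev_t *tops, size_t ntops, size_t target,
    bool all_keys_loaded)
{
	assert(target < ntops);

	/* Only singleton and mirror top-level vdevs can be removed. */
	if (tops[target].type != SK_SINGLETON &&
	    tops[target].type != SK_MIRROR)
		return (SK_ERR_TYPE);

	/* All vdevs must be healthy. */
	for (size_t i = 0; i < ntops; i++) {
		if (!tops[i].healthy)
			return (SK_ERR_UNHEALTHY);
	}

	for (size_t i = 0; i < ntops; i++) {
		if (tops[i].alloc_class != SK_NORMAL)
			continue;
		/* The normal class is the copy destination: no parity. */
		if (tops[i].type == SK_RAIDZ || tops[i].type == SK_DRAID)
			return (SK_ERR_RAIDZ);
		/* Every normal-class vdev must match the target's ashift. */
		if (tops[i].ashift != tops[target].ashift)
			return (SK_ERR_ASHIFT);
	}

	if (!all_keys_loaded)
		return (SK_ERR_KEYS);

	return (SK_OK);
}
```

Comparing each normal-class vdev against the target's ashift covers both halves of the ashift rule at once: the target matches the normal class, and the normal class is uniform. The sketch does not model whether enough free space exists elsewhere in the pool to hold the evacuated data.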
