Add support for raid10 #1900
base: main
Conversation
The original PR has enough context, #1549. The change LGTM.
We should be able to pass anything that …
Force-pushed from 8c8ff63 to ec5369d
Added a documentation update. A question regarding the CRD though: the validation changes, but the versioning isn't changed. Would that actually need a version bump?
We only bump the version on breaking changes; this is purely additive, so we're fine to stay on v1alpha1.
This removes the wait block for raid resync for two reasons:

1) raid0 does not have redundancy and therefore no initial resync[1]
2) with raid10 the resync time for 4x 1.9TB disks takes from tens of minutes to multiple hours, depending on the sysctl params `dev.raid.speed_limit_min` and `dev.raid.speed_limit_max` and the speed of the disks. Initial resync for raid10 is not strictly needed[1]

Filesystem creation: by default `mkfs.xfs` attempts to TRIM the drive. This is also something that can take tens of minutes or hours, depending on the size of the drives. TRIM can be skipped, as instances are delivered with disks fully trimmed[2].

[1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#InstanceStoreTrimSupport
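As a concrete illustration of the commit message above, the array creation and filesystem steps might look like the following. This is a sketch only: the device names are hypothetical, the commands are destructive and require root on a machine with dedicated disks, and `--assume-clean` is one way (not necessarily this PR's exact implementation) to skip the initial resync.

```
# Sketch only: device names hypothetical; destructive; requires root.
# Create a 4-disk raid10 array; --assume-clean skips the initial resync,
# which is not strictly needed for raid10 (see [1] above).
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    --assume-clean /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1

# Create the filesystem; -K skips the TRIM/discard pass, which is safe here
# because instance-store disks are delivered fully trimmed (see [2] above).
mkfs.xfs -K /dev/md0
```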
Force-pushed from ec5369d to caf3014
Description of changes:
I would like to be able to migrate workloads away from a node gracefully in case of an instance storage drive failure. Raid10 would provide redundancy at the cost of usable disk space.
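The space trade-off can be made concrete with a quick calculation, using the 4x 1.9TB disk layout discussed in this PR (sizes are illustrative, not tied to a particular instance type):

```shell
# Usable capacity for 4 disks of 1900 GB each:
# raid0 stripes across all disks; raid10 mirrors pairs, halving capacity.
disks=4
size_gb=1900
raid0_gb=$((disks * size_gb))        # 7600 GB
raid10_gb=$((disks * size_gb / 2))   # 3800 GB
echo "raid0: ${raid0_gb} GB, raid10: ${raid10_gb} GB"
```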
Adding support for creating raid10 in addition to raid0. This also removes the wait block for raid resync for two reasons:

1) raid0 does not have redundancy and therefore no initial resync[1]
2) with raid10 the resync time for 4x 1.9TB disks takes from tens of minutes to multiple hours, depending on the sysctl params `dev.raid.speed_limit_min` and `dev.raid.speed_limit_max` and the speed of the disks. Initial resync for raid10 is not strictly needed[1]

Filesystem creation: by default `mkfs.xfs` attempts to TRIM the drive. This is also something that can take tens of minutes or hours, depending on the size of the drives. TRIM can be skipped, as instances are delivered with disks fully trimmed[2].

[1] https://raid.wiki.kernel.org/index.php/Initial_Array_Creation
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#InstanceStoreTrimSupport
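The sysctl parameters mentioned above bound the md resync rate (in KiB/s, per device); raising them lets a resync finish sooner at the cost of foreground I/O. A possible persistent fragment, with illustrative values rather than recommendations (the kernel defaults are much lower):

```
# /etc/sysctl.d/99-raid-resync.conf (illustrative values, KiB/s per device)
dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 5000000
```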
Testing Done

Tested on `m6id.metal`.

With kernel defaults:

With increased resync limits:
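The resync runs measured above can be observed through the kernel's md status interface; a sketch, assuming an array named `md0` exists on the machine:

```
# Overall array state, resync progress bar, and current speed
cat /proc/mdstat

# Per-array sync state, e.g. "resync" or "idle"
cat /sys/block/md0/md/sync_action
```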
Due to some... accidents, this replaces the old pull request, #1549. There are further discussions about the details there.