Pantry region replacement can only handle a single failed region #1593

Closed
leftwo opened this issue Jan 7, 2025 · 3 comments

leftwo commented Jan 7, 2025

When a disk contains a downstairs that needs repair, and that disk is not attached to a running propolis, we spin up an upstairs in the pantry and send that upstairs the VCR with the "fix" in it. We then rely on the upstairs to repair the new downstairs during the initial reconciliation that happens on activation at startup.

This process is fine when there is just one bad downstairs in a VCR, and once the pantry upstairs has completed activation, we can tell nexus that all is good and the disk is repaired.

However, we don't have a simple disk in this case. We have a VCR with a RW sub-volume, and a tree of read only parents.

Consider this disk:

root@oxz_switch1:~# omdb db disks info bef4889d-e590-458e-81ec-b997e18f3760
HOST_SERIAL DISK_NAME INSTANCE_NAME PROPOLIS_ZONE VOLUME_ID                            DISK_STATE
-           disk-20   -             -             c9fd9286-4528-4212-be60-070178b3e093 detached
HOST_SERIAL REGION                               ZONE                                              PHYSICAL_DISK
BRM42220062 f153ee6c-910d-4628-8350-ba9022bb5158 oxz_crucible_31942354-90cc-4f03-86c8-31b5c8ab33a7 591b6235-01c7-41bc-ba37-03d3c1b1e9dc
BRM42220030 452b7095-312e-4d1a-9285-4ef344d09df4 oxz_crucible_3b82acd4-61e0-421b-921d-44b042254c69 544ed702-38fe-4314-b7c5-8de94cc19304
BRM42220030 ccc72284-7f4e-4ae6-acd3-3edbef635fff oxz_crucible_ddb565c5-7cdc-406c-9abc-b396bb5effd3 e2f72669-bb24-4f3d-bc1a-cebc7f9d2aad

VCR from volume ID c9fd9286-4528-4212-be60-070178b3e093
ID                                   BS  SUB_VOLUMES READ_ONLY_PARENT
bef4889d-e590-458e-81ec-b997e18f3760 512 1           true

SUB VOLUME 0
    ID                                   BS  BPE    EC   GEN READ_ONLY
    bef4889d-e590-458e-81ec-b997e18f3760 512 131072 3200 5   false
    [fd00:1122:3344:102::e]:19007
    [fd00:1122:3344:102::8]:19006
    [fd00:1122:3344:103::7]:19003

READ ONLY PARENT:
    ID                                   BS  SUB_VOLUMES READ_ONLY_PARENT
    fef63546-9a7b-426b-ab2c-e72f6a3283ca 512 1           true

    SUB VOLUME 0
        ID                                   BS  BPE    EC   GEN READ_ONLY
        fd7b68c1-268f-4510-8d28-def8c6fddc95 512 131072 3200 2   true
        [fd00:1122:3344:101::b]:19005
        [fd00:1122:3344:102::d]:19005
        [fd00:1122:3344:121::24]:19008
        
    READ ONLY PARENT:
        ID                                   BS  SUB_VOLUMES READ_ONLY_PARENT
        67fe02af-af39-413e-9c60-d6d3af08d66d 512 1           true
        
        SUB VOLUME 0
            ID                                   BS  BPE    EC   GEN READ_ONLY
            39bbc2e1-f5eb-440d-9815-962fafe1fabf 512 131072 3200 2   true
            [fd00:1122:3344:121::28]:19005
            [fd00:1122:3344:101::a]:19012
            [fd00:1122:3344:102::d]:19003

For a multi-level VCR, the pantry (or propolis) spins up one upstairs instance for each sub-volume (of which there is currently only ever one) and one upstairs for each level of the read-only parent. For the pantry to consider a repair complete, all of these upstairs instances have to make it through activation. The VCR the pantry receives does not say which sub-volume or read-only parent level is the one we want to repair, so the pantry tries to activate all of them.

And here is the problem. The upstairs instance that needed to do a repair will do so and activate, but other parts of this VCR also need to be repaired: in the example above, any layer with a fd00:1122:3344:121: address has a downstairs on the expunged sled. Those upstairs instances will not be able to activate, as they cannot contact the expunged sled.
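To make the failure mode concrete, here is a minimal sketch that walks a simplified model of the VCR above and lists the layers that reference the expunged sled. The `Layer` and `Volume` types and the `blocked_layers` function are hypothetical stand-ins invented for illustration, not Crucible's actual VCR types; the IDs and addresses are copied from the omdb output above.

```rust
/// One region set (three downstairs targets) served by a single upstairs.
/// Hypothetical simplification, not Crucible's real VCR type.
struct Layer {
    id: &'static str,
    read_only: bool,
    targets: [&'static str; 3],
}

/// A volume: its sub-volumes plus an optional chain of read-only parents.
struct Volume {
    sub_volumes: Vec<Layer>,
    read_only_parent: Option<Box<Volume>>,
}

/// Collect every layer, at any depth, with a target that starts with `prefix`.
fn blocked_layers<'a>(vol: &'a Volume, prefix: &str) -> Vec<&'a Layer> {
    let mut out = Vec::new();
    let mut cur = Some(vol);
    while let Some(v) = cur {
        for layer in &v.sub_volumes {
            if layer.targets.iter().any(|t| t.starts_with(prefix)) {
                out.push(layer);
            }
        }
        // Descend one level into the read-only parent chain, if any.
        cur = v.read_only_parent.as_deref();
    }
    out
}

/// The disk from the omdb output above, reduced to IDs and targets.
fn example_volume() -> Volume {
    Volume {
        sub_volumes: vec![Layer {
            id: "bef4889d-e590-458e-81ec-b997e18f3760",
            read_only: false,
            targets: [
                "[fd00:1122:3344:102::e]:19007",
                "[fd00:1122:3344:102::8]:19006",
                "[fd00:1122:3344:103::7]:19003",
            ],
        }],
        read_only_parent: Some(Box::new(Volume {
            sub_volumes: vec![Layer {
                id: "fd7b68c1-268f-4510-8d28-def8c6fddc95",
                read_only: true,
                targets: [
                    "[fd00:1122:3344:101::b]:19005",
                    "[fd00:1122:3344:102::d]:19005",
                    "[fd00:1122:3344:121::24]:19008",
                ],
            }],
            read_only_parent: Some(Box::new(Volume {
                sub_volumes: vec![Layer {
                    id: "39bbc2e1-f5eb-440d-9815-962fafe1fabf",
                    read_only: true,
                    targets: [
                        "[fd00:1122:3344:121::28]:19005",
                        "[fd00:1122:3344:101::a]:19012",
                        "[fd00:1122:3344:102::d]:19003",
                    ],
                }],
                read_only_parent: None,
            })),
        })),
    }
}

fn main() {
    let vol = example_volume();
    for layer in blocked_layers(&vol, "[fd00:1122:3344:121:") {
        println!(
            "{} (read_only: {}) has a downstairs on the expunged sled",
            layer.id, layer.read_only
        );
    }
}
```

In this model the RW sub-volume is clean, while both read-only layers still point at the expunged sled, which is why their upstairs instances cannot finish activation.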

This also explains the dtrace output that found an upstairs instance with two WQ and one NEW:

oxz_crucible_pantry_47fa71bb 23599 2e3546f5 NEW  WQ  WQ

This would be an upstairs instance that was still trying to contact a downstairs on an expunged sled.

Originally posted by @leftwo in #1591

leftwo commented Jan 10, 2025

A possible fix for this could be allowing a read-only upstairs to activate with fewer than three downstairs present.
That would at least let a VCR that has both a read-only parent and a sub-volume needing repair make progress when attached to the pantry, if the repair is for the sub-volume.

It still does not solve what to do with a VCR that has multiple sub-volumes needing repair (which we don't currently have).
It would also require that we repair all sub-volumes everywhere first, before we try to repair read-only parents, as sub-volumes require all three downstairs present to reconcile and become active.
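The proposal and its limits can be sketched as a small activation policy. This is a hypothetical model, not Crucible code: `LayerStatus`, `can_activate`, and `volume_can_activate` are names invented here, and the minimum of one reachable downstairs for a read-only layer is an assumption (the comment above only says fewer than three).

```rust
/// How many of a layer's three downstairs can currently be reached.
/// Hypothetical type for illustration only.
struct LayerStatus {
    read_only: bool,
    reachable: usize,
}

/// Can the upstairs for this layer activate?
/// `relaxed_read_only` models the proposed change.
fn can_activate(layer: &LayerStatus, relaxed_read_only: bool) -> bool {
    if layer.read_only && relaxed_read_only {
        // Assumption: at least one downstairs is enough for a read-only layer.
        layer.reachable >= 1
    } else {
        // RW layers need all three downstairs to reconcile.
        layer.reachable == 3
    }
}

/// The pantry only reports the repair as done when every layer activates.
fn volume_can_activate(layers: &[LayerStatus], relaxed_read_only: bool) -> bool {
    layers.iter().all(|l| can_activate(l, relaxed_read_only))
}

fn main() {
    // Repaired RW sub-volume, two read-only parents each missing one downstairs.
    let layers = [
        LayerStatus { read_only: false, reachable: 3 },
        LayerStatus { read_only: true, reachable: 2 },
        LayerStatus { read_only: true, reachable: 2 },
    ];
    println!("strict:  {}", volume_can_activate(&layers, false)); // false
    println!("relaxed: {}", volume_can_activate(&layers, true)); // true
}
```

Under the same model, an RW layer with only two reachable downstairs still blocks the whole volume even with the relaxed rule, which is the ordering constraint described above.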

leftwo commented Jan 10, 2025

Issue for RO activation with fewer than three downstairs: #1599

@leftwo leftwo self-assigned this Jan 14, 2025
leftwo added a commit that referenced this issue Jan 24, 2025
Allow a read only upstairs to activate with only a single downstairs
present.

In upstairs/src/upstairs.rs, I've added a check when a downstairs
transitions to `WaitQuorum`.  If we are read-only, then we can skip
reconciliation and activate the upstairs.  If we are already active 
(and read only), then a new downstairs can go to active.

Added some tests and a bit of additional test framework to verify
an upstairs can activate with only a single downstairs ready.

This "fixes" the feature request in
#1599
and may help with #1593
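
The check described in the commit message can be pictured with a toy state machine. This is only a sketch under stated assumptions: apart from the `WaitQuorum` name, every type, field, and method here (`DsState`, `UpState`, `Upstairs`, `on_wait_quorum`) is hypothetical, the real code in upstairs/src/upstairs.rs has more states, and reconciliation is reduced to a comment.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DsState {
    New,
    WaitQuorum,
    Active,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpState {
    Initializing,
    Active,
}

/// Toy upstairs with three downstairs clients (hypothetical model).
struct Upstairs {
    read_only: bool,
    state: UpState,
    downstairs: [DsState; 3],
}

impl Upstairs {
    fn new(read_only: bool) -> Self {
        Upstairs {
            read_only,
            state: UpState::Initializing,
            downstairs: [DsState::New; 3],
        }
    }

    /// Called when downstairs `client` transitions to WaitQuorum.
    fn on_wait_quorum(&mut self, client: usize) {
        self.downstairs[client] = DsState::WaitQuorum;
        if self.read_only {
            // Read-only: skip reconciliation. This both activates an
            // initializing upstairs and lets a late downstairs join an
            // already active one.
            self.downstairs[client] = DsState::Active;
            self.state = UpState::Active;
        } else if self.downstairs.iter().all(|s| *s == DsState::WaitQuorum) {
            // Read-write: all three must be present; reconciliation
            // would run here before going active.
            self.downstairs = [DsState::Active; 3];
            self.state = UpState::Active;
        }
    }
}

fn main() {
    let mut ro = Upstairs::new(true);
    ro.on_wait_quorum(1);
    println!("read-only:  {:?} {:?}", ro.state, ro.downstairs);

    let mut rw = Upstairs::new(false);
    rw.on_wait_quorum(1);
    rw.on_wait_quorum(2);
    println!("read-write: {:?} {:?}", rw.state, rw.downstairs);
}
```

The read-write case in `main` ends up stuck with one downstairs in `New` and two in `WaitQuorum`, the same shape as the NEW WQ WQ line in the dtrace output earlier in this issue.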

leftwo commented Jan 28, 2025

With the #1608 fix in, and no support yet for
multiple sub-volumes, a RW pantry region replacement can now make progress even if
its read-only parents have missing downstairs, as those will all still activate.

For support of multiple sub-volumes, I'll create another issue, as we have not
written the code for it yet, so it does not make sense to keep this issue open for it.
