Document how to remove bad LINSTOR volume #369

Open: wants to merge 2 commits into base: master

Wescoeur
Member

Before submitting the pull request, you must agree with the following statements by checking both boxes with a 'x'.

  • "I accept that my contribution is placed under the CC BY-SA 2.0 license [1]."
  • "My contribution complies with the Developer Certificate of Origin [2]."

[1] https://creativecommons.org/licenses/by-sa/2.0/
[2] https://docs.xcp-ng.org/project/contributing/#developer-certificate-of-origin-dco

@Wescoeur Wescoeur requested review from Nambrok and thomas-dkmt July 14, 2025 14:46
Wescoeur added 2 commits July 17, 2025 11:45
Use SP_NAME to prevent misunderstanding between:
- group name (linstor_group)
- and storage pool name (xcp-sr-linstor_group)

Signed-off-by: Ronan Abhamon <[email protected]>
@Wescoeur Wescoeur force-pushed the xostor-how-to-remove-bad-volume branch from 74e4609 to 686dff3 Compare July 17, 2025 09:55
Comment on lines +792 to +793
For example, a LINSTOR resource containing a VHD whose header/footer has been overwritten and is unreadable. Here, we're talking about a situation where a VHD isn't just simply corrupted and where `vhd-util repair -n <PATH>` is useless.
For example we can have this output in `SMlog`:
Collaborator

Here we have two consecutive lines starting with 'For example', which seems confusing to me. Also, one of them is followed by a comma and the other isn't; we should be as consistent as possible.

Are they two different examples, or is the second line an example for the second sentence in the first line?

Jun 22 10:50:13 r620-s3 SM: [23871] FAILED in util.pread: (rc 22) stdout: 'error opening /dev/drbd/by-res/xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa/0: -22
```
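
To make the link between the log line above and the resource path concrete, here is a minimal sketch that pulls DRBD resource paths out of log text. The helper name `drbd_paths_from_log` is hypothetical and not part of any XCP-ng or LINSTOR tool, and the sketch assumes the path is always followed by a colon or a space, as in the sample line.

```shell
# Hypothetical helper (not an XCP-ng tool): print every DRBD resource path
# found in the log text read from stdin.
# Assumption: each path ends at the first ':' or space, as in the sample below.
drbd_paths_from_log() {
    grep -o '/dev/drbd/by-res/[^: ]*'
}

# Sample line copied from the SMlog excerpt above.
sample_line="Jun 22 10:50:13 r620-s3 SM: [23871] FAILED in util.pread: (rc 22) stdout: 'error opening /dev/drbd/by-res/xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa/0: -22"

printf '%s\n' "$sample_line" | drbd_paths_from_log
# prints /dev/drbd/by-res/xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa/0
```

On a host, the same function could be fed from the real log, for example with `grep 'FAILED in util.pread' <path to SMlog> | drbd_paths_from_log`. A matching path only shows which resource failed to open; it does not prove that the volume is unusable.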

The problem with this error is that it's generic and can occur in other situations. If you're not sure what you're doing, contact us on support or the forum. Alternatively you can confirm that a resource is indeed unusable, execute this command by connecting to a host where the volume is marked `InUse`:
Collaborator

It wouldn't hurt to have clickable links for support and the forum, so that readers don't feel stuck should they need help.

In this example, `<DRBD_PATH>` is `/dev/drbd/by-res/xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa/0`
Two cases here:
- If the volume is marked as `InUse` in the LINSTOR database, run this command on the host that is using it.
- Otherwise you can run this same command on the master that has the resource path on its filesystem (so a DRBD diskless or diskful).
Collaborator

Suggested change
- Otherwise you can run this same command on the master that has the resource path on its filesystem (so a DRBD diskless or diskful).
- Otherwise, you can run this same command on the master that has the resource path on its filesystem (so a DRBD diskless or diskful).

If the volume is unusable and prevents an SR PBD-plug command, any action, or cannot be deleted via xe or XO, you can follow the instructions below.

:::warning
Again, if you're unsure of the situation, the procedure below is risky. It's indeed very rare to need to read this section of the documentation. There is only one major case where we consider that's useful to run these commands: A program like `dd` was executed on a resource, which destroyed the VHD headers/footers. Another similar scenario is a deletion of the replicas followed by a recreation of the resources.
Collaborator

"It's indeed very rare to need to read this section of the documentation."

Is that sentence necessary? I feel like it can be deduced from the context

If the volume is unusable and prevents an SR PBD-plug command, any action, or cannot be deleted via xe or XO, you can follow the instructions below.

:::warning
Again, if you're unsure of the situation, the procedure below is risky. It's indeed very rare to need to read this section of the documentation. There is only one major case where we consider that's useful to run these commands: A program like `dd` was executed on a resource, which destroyed the VHD headers/footers. Another similar scenario is a deletion of the replicas followed by a recreation of the resources.
Collaborator

Suggested change
Again, if you're unsure of the situation, the procedure below is risky. It's indeed very rare to need to read this section of the documentation. There is only one major case where we consider that's useful to run these commands: A program like `dd` was executed on a resource, which destroyed the VHD headers/footers. Another similar scenario is a deletion of the replicas followed by a recreation of the resources.
Again, if you're unsure of the situation, the procedure below is risky. It's indeed very rare to need to read this section of the documentation. There is only one major case where we consider that's useful to run these commands: A program like `dd` was executed on a resource, which destroyed the VHD headers/footers. Another similar scenario is deleting the replicas then recreating the resources.

:::warning
Again, if you're unsure of the situation, the procedure below is risky. It's indeed very rare to need to read this section of the documentation. There is only one major case where we consider that's useful to run these commands: A program like `dd` was executed on a resource, which destroyed the VHD headers/footers. Another similar scenario is a deletion of the replicas followed by a recreation of the resources.

Additionally we assume that if you destroy a resource, you have a recent backup of the corresponding VM or VDI and you want to restore it if the data is important.
Collaborator

Suggested change
Additionally we assume that if you destroy a resource, you have a recent backup of the corresponding VM or VDI and you want to restore it if the data is important.
Additionally, we assume that if you destroy a resource, you have a recent backup of the corresponding VM or VDI and you want to restore it if the data is important.

╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

2. If you don't know what is the corresponding VDI UUID for the DRBD resource, you can deduce it via this command:
Collaborator

Suggested change
2. If you don't know what is the corresponding VDI UUID for the DRBD resource, you can deduce it via this command:
2. If you don't know what the corresponding VDI UUID for the DRBD resource is, you can deduce it via this command:

As a reminder, `<RES_UUID>` is the UUID used in the naming of DRBD resources after the prefix `xcp-volume-`. For example: `xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa`. And `<SP_NAME>` is the value obtained in the previous point.
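
As an illustration of that naming rule, the following sketch derives `<RES_UUID>` from a DRBD device path of the form `/dev/drbd/by-res/xcp-volume-<RES_UUID>/<volume number>`. The function name `res_uuid_from_path` is hypothetical, and the sketch assumes the resource name is always the second-to-last path component.

```shell
# Hypothetical helper (not an XCP-ng tool): derive <RES_UUID> from a DRBD
# device path such as /dev/drbd/by-res/xcp-volume-<RES_UUID>/0.
res_uuid_from_path() {
    drbd_path=$1
    # The resource name is the directory that holds the volume number.
    res_name=$(basename "$(dirname "$drbd_path")")
    case "$res_name" in
        # Strip the "xcp-volume-" prefix to keep only the UUID.
        xcp-volume-*) printf '%s\n' "${res_name#xcp-volume-}" ;;
        # Anything else is not an XCP-ng volume resource.
        *) return 1 ;;
    esac
}

res_uuid_from_path /dev/drbd/by-res/xcp-volume-83da35c4-dd18-47fb-9d2b-68bd5b92fcaa/0
# prints 83da35c4-dd18-47fb-9d2b-68bd5b92fcaa
```

The function returns a non-zero status for paths that do not carry the `xcp-volume-` prefix, so it can guard a script against acting on an unrelated device.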

:::tip
For more explanation between `RES_UUID` and `VDI_UUID` link you can read [this section](#map-linstor-resource-names-to-xapi-vdi-uuids):
Collaborator

Suggested change
For more explanation between `RES_UUID` and `VDI_UUID` link you can read [this section](#map-linstor-resource-names-to-xapi-vdi-uuids):
For more explanation between `RES_UUID` and `VDI_UUID` link, check out [this section](#map-linstor-resource-names-to-xapi-vdi-uuids):

linstor-kv-tool --dump-volumes -g xcp-sr-linstor_group_thin_device | grep volume-name | grep 83da35c4-dd18-47fb-9d2b-68bd5b92fcaa
"xcp/volume/6b9046a2-8ef9-47ef-baa9-a4c533ca848a/volume-name": "83da35c4-dd18-47fb-9d2b-68bd5b92fcaa",
```
Here the XAPI UUID of the VDI to delete is `6b9046a2-8ef9-47ef-baa9-a4c533ca848a`.
Collaborator

Suggested change
Here the XAPI UUID of the VDI to delete is `6b9046a2-8ef9-47ef-baa9-a4c533ca848a`.
Here, the XAPI UUID of the VDI to delete is `6b9046a2-8ef9-47ef-baa9-a4c533ca848a`.
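
For readers who want to script the lookup discussed in this thread, here is a minimal sketch that extracts the VDI UUID from `linstor-kv-tool --dump-volumes` output instead of reading it by eye. The function name `vdi_uuid_for_res` is hypothetical, and the sketch assumes the output keeps the `"xcp/volume/<VDI_UUID>/volume-name": "<RES_UUID>"` layout shown in the excerpt.

```shell
# Hypothetical helper (not an XCP-ng tool): read `linstor-kv-tool --dump-volumes`
# output on stdin and print the XAPI VDI UUID whose volume-name is <RES_UUID>.
# Assumption: entries look like "xcp/volume/<VDI_UUID>/volume-name": "<RES_UUID>",
vdi_uuid_for_res() {
    res_uuid=$1
    # Keep only the volume-name entry for this resource, then capture the
    # UUID that sits between "xcp/volume/" and "/volume-name".
    grep -F "/volume-name\": \"${res_uuid}\"" |
        sed -n 's|.*"xcp/volume/\([^/]*\)/volume-name".*|\1|p'
}

# Sample output line copied from the excerpt in this thread.
sample='"xcp/volume/6b9046a2-8ef9-47ef-baa9-a4c533ca848a/volume-name": "83da35c4-dd18-47fb-9d2b-68bd5b92fcaa",'

printf '%s\n' "$sample" | vdi_uuid_for_res 83da35c4-dd18-47fb-9d2b-68bd5b92fcaa
# prints 6b9046a2-8ef9-47ef-baa9-a4c533ca848a
```

On a host, the input would come from the real command, for example `linstor-kv-tool --dump-volumes -g <SP_NAME> | vdi_uuid_for_res <RES_UUID>`. An empty result means no volume-name entry matched, in which case nothing should be deleted.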

3 participants