
Volume failed to attach due to being deleted #1733

Open
veenadong opened this issue Sep 6, 2024 · 3 comments

@veenadong

veenadong commented Sep 6, 2024

Mayastor 2.5.1: pods are stuck in the Init state with a "volume is being deleted" error.

Republishing the volume resulted in a MultiAttach error.

Attaching logs from setup 1:
mayastor-2024-09-03--15-57-59-UTC_unable_publish.tar.gz
mayastor-2024-09-03--19-23-29-UTC_unable_publish.tar.gz

Logs from setup 2:
mayastor-2024-09-06--01-47-16-UTC.tar.gz
In this setup, volume af2940b4-61fa-480e-9d5d-38651e1d01ce is in this state.

@tiagolobocastro
Contributor

Please delete the io-engine pod for node glcr-hst01-sc3.cloud1.gl-hpe.net.
It seems something is holding the nexus lock, which prevents its deletion.

Then allow a few minutes and please take another bundle, thank you.
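
(A minimal sketch of that restart, assuming the io-engine pods run in the 'mayastor' namespace and carry an 'app=io-engine' label; both are assumptions, adjust to your install:)

    # Find the io-engine pod scheduled on the affected node
    kubectl -n mayastor get pods -l app=io-engine -o wide \
      --field-selector spec.nodeName=glcr-hst01-sc3.cloud1.gl-hpe.net

    # Delete it without --force and let the DaemonSet recreate it
    kubectl -n mayastor delete pod <io-engine-pod-name>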

@veenadong
Author

We restarted all the io-engine pods, and the setup managed to recover.

  1. How do we identify this problem? i.e. are there any keywords to look for in the logs?
  2. What are the safe steps to restart the io-engine? (To prevent/reduce corruption of the filesystem/disk.)

@tiagolobocastro
Copy link
Contributor

In this case it was this kind of log:

  2024-09-03T19:23:45.388139Z ERROR core::volume::service: error: gRPC request 'destroy_nexus' for 'Nexus' failed with 'status: DeadlineExceeded, message: "Failed to acquire access to object within given timeout", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 03 Sep 2024 19:23:44 GMT", "content-length": "0"} }'

There were no data-plane logs for that time period though, so I'm not sure what was actually holding up the resource...
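
For what it's worth, a quick way to check a bundle for this signature, assuming the archive has been extracted into a directory of plain-text logs:

    # Search the extracted bundle for the nexus-destroy timeout
    grep -R "Failed to acquire access to object" ./extracted-bundle/
    grep -R "destroy_nexus" ./extracted-bundle/ | grep "DeadlineExceeded"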

To safely restart the io-engine, make sure you don't delete with force but instead allow it to gracefully terminate.
In some cases you might want to drain the targets to another node using the plugin, ex: kubectl-mayastor drain node xxx
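
A minimal sketch of that procedure, with placeholder node/pod names (check the plugin's help output for the exact drain arguments):

    # Optionally move the volume targets to another node first, via the plugin
    kubectl-mayastor drain node <node-name>

    # Then restart the io-engine with an ordinary delete, never a forced one
    kubectl -n mayastor delete pod <io-engine-pod-name>
    # avoid: kubectl delete pod <pod> --force --grace-period=0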
