CASMTRIAGE-6370 update csm.storage.smartmon to use specific ceph image when redeploying node-exporter #220
Summary and Scope
Problem
During a storage node upgrade on Starlord, cephadm used the incorrect image when redeploying node-exporter. This is likely because, midway through a storage node upgrade, the cephadm version installed on the node is newer than the Ceph container version that is running. cephadm therefore looked for its default image at 'docker://quay.io/ceph/ceph:v17', which it was unable to find, causing the Ansible play to fail.
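For context, one way to observe the version skew described above is to compare the cephadm package installed on the node with the image the running daemons report. This is a hypothetical diagnostic sketch, not part of the play:

```yaml
# Hypothetical diagnostic, not part of the role: compare the cephadm package
# installed on the node with the image the running daemons were deployed from.
- name: Show the cephadm package version installed on the node
  ansible.builtin.command: rpm -q cephadm
  register: cephadm_pkg
  changed_when: false

- name: Show the container image the running Ceph daemons were deployed from
  # cephadm ls emits JSON; container_image_name is one of its fields.
  ansible.builtin.shell: cephadm ls | jq -r '.[0].container_image_name'
  register: running_image
  changed_when: false
```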
Solution/Change
We can explicitly specify the container image that cephadm should use, so it never falls back to looking up this default image.
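A minimal sketch of the kind of change described, assuming the play shells out to cephadm; the task names, the jq selection, and the daemon name are illustrative rather than the actual role contents:

```yaml
# Sketch only: pin cephadm to the image the cluster is already running so it
# never falls back to its version-derived default (quay.io/ceph/ceph:v17).
- name: Determine the Ceph image the running daemons were deployed from
  # Assumes at least one Ceph daemon is deployed on the host.
  ansible.builtin.shell: cephadm ls | jq -r '.[0].container_image_name'
  register: ceph_image
  changed_when: false

- name: Redeploy node-exporter using the pinned image
  # --image is a global cephadm flag; without it, cephadm shell resolves a
  # default image from its own version, which is the lookup that failed.
  ansible.builtin.command: >
    cephadm --image {{ ceph_image.stdout }} shell --
    ceph orch daemon redeploy node-exporter.{{ ansible_hostname }}
```

Because the pinned image is the one the cluster is already running, it is present locally and no registry lookup is needed.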
Notes
We were not able to prove why cephadm was looking for the Ceph image at 'docker://quay.io/ceph/ceph:v17'. I have not replicated the problem to confirm that this change is indeed the solution; however, this is our best guess at the cause.
Issues and Related PRs
Related Jira issue: CASMTRIAGE-6370.
Testing
Tested on:
Test description:
I ran this Ansible play on a storage node that contained the Ceph admin keyring, so the play ran all the way through. I also ran the play on storage nodes that did not have the Ceph admin keyring, so the play was skipped. Both tests behaved as expected. CFS logs are below for both cases; a sketch of the keyring guard these tests exercise follows the log excerpts.
Output when storage node had ceph admin keyring
Output when storage node did not have ceph admin keyring
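For reference, the skip behaviour exercised above would typically hinge on a guard like the following, reusing the ceph_image fact from the sketch earlier. The keyring path is the conventional location and the task names are illustrative, not the actual role contents:

```yaml
# Illustrative guard, assuming the conventional admin keyring location: run the
# redeploy only on nodes that hold the Ceph admin keyring, skip everywhere else.
- name: Check for the Ceph admin keyring
  ansible.builtin.stat:
    path: /etc/ceph/ceph.client.admin.keyring
  register: admin_keyring

- name: Redeploy node-exporter with the pinned image
  ansible.builtin.command: >
    cephadm --image {{ ceph_image.stdout }} shell --
    ceph orch daemon redeploy node-exporter.{{ ansible_hostname }}
  when: admin_keyring.stat.exists
```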
Risks and Mitigations
I did not get a chance to test this on metal to confirm it solves the problem; however, it will be tested during our internal CSM 1.4 to 1.5 upgrades.
Pull Request Checklist