IPClaims missing for Machines #1354
Labels: kind/bug, needs-triage
What steps did you take and what happened:
It sometimes happens that a Metal3Machine appears without IPClaim after a rolling upgrade of a MachineDeployment.
Update
I think we have pinpointed the issue!
Code path:
If the new Metal3Data is created before the old IPClaim is deleted, it can pick up the old IPClaim and render the secret from it before the deletion happens.
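The suspected code path can be sketched as a toy model in Go. All names and types here (`ipClaim`, `claimName`, `ensureClaim`) are illustrative stand-ins, not CAPM3's actual code: the point is only that a reconciler which looks up a claim by its computed name will adopt the old claim if that name still resolves.

```go
package main

import "fmt"

// ipClaim is a stand-in for an IPAM IPClaim object (illustrative only).
type ipClaim struct {
	name    string
	address string
}

// claimName builds the claim name from its owner and pool
// (hypothetical helper, mirroring the name patterns discussed below).
func claimName(owner, pool string) string {
	return fmt.Sprintf("%s-%s", owner, pool)
}

// ensureClaim is what a reconciler might do: reuse an existing claim with
// the expected name, or create a fresh one.
func ensureClaim(claims map[string]*ipClaim, owner, pool string) *ipClaim {
	name := claimName(owner, pool)
	if existing, ok := claims[name]; ok {
		return existing // adopts the OLD claim if the names collide
	}
	c := &ipClaim{name: name, address: "10.0.0.42"}
	claims[name] = c
	return c
}

func main() {
	claims := map[string]*ipClaim{}

	// Old Metal3Data's claim, not yet deleted during the rolling upgrade.
	old := ensureClaim(claims, "node-1", "pool-a")

	// The new Metal3Data reconciles BEFORE the old claim is deleted. If its
	// computed name matches, it silently takes over the old claim.
	adopted := ensureClaim(claims, "node-1", "pool-a")
	fmt.Println("new data reused old claim:", adopted == old)

	// The old claim is deleted afterwards, taking the address with it:
	// the new Metal3Data's secret now references a released IP.
	delete(claims, old.name)
	fmt.Println("claims left:", len(claims))
}
```

Under this model, the new Metal3Data ends up holding an address that the pool will later consider free, which matches the IP-clash risk described below.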
Reproduction criteria:
- `<bmh-name>-<ippool-name>`
- `<m3d-name>-<ippool-name>`

End of update
Second update
After trying to find out why/if the IPClaim was not deleted before/together with the Metal3Data, I have concluded that the IPClaim should always be deleted before the finalizers are removed from the Metal3Data. So why do we still see this issue?
I think the answer is the cache. Everything except BMHs, ConfigMaps and Secrets is read through the cache, so we could simply be getting a stale, cached IPClaim 🤦
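The stale-cache theory can be illustrated with a small toy model (plain Go maps, not the real controller-runtime client or informer cache): a read that goes through a cache which has not yet observed a deletion still "finds" the old IPClaim, while a direct read against the API server does not.

```go
package main

import "fmt"

// store models an object store keyed by name (name -> IP address).
// Two instances stand in for the API server and the informer cache
// (illustrative only, not controller-runtime).
type store struct {
	objects map[string]string
}

func (s *store) get(name string) (string, bool) {
	ip, ok := s.objects[name]
	return ip, ok
}

func main() {
	apiServer := &store{objects: map[string]string{"node-1-pool-a": "10.0.0.42"}}
	// The cache starts in sync with the API server.
	cache := &store{objects: map[string]string{"node-1-pool-a": "10.0.0.42"}}

	// The old IPClaim is deleted on the API server, but the cache has not
	// caught up yet (the watch event has not been processed).
	delete(apiServer.objects, "node-1-pool-a")

	// A cached read still returns the deleted claim...
	if ip, ok := cache.get("node-1-pool-a"); ok {
		fmt.Println("cached read: claim exists with IP", ip) // stale!
	}

	// ...while an uncached read would see that it is gone.
	if _, ok := apiServer.get("node-1-pool-a"); !ok {
		fmt.Println("direct read: claim is gone")
	}
}
```

If this is what happens, a reconciler that must not observe deleted claims would need an uncached read for IPClaims, the same way BMHs, ConfigMaps and Secrets are already excluded from the cache.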
End of second update
Unfortunately we have not been able to reproduce the issue yet.
Here is what we know so far:
- `templateReference`: the reference is unique per MachineDeployment and is kept the same when rolling to a new Metal3DataTemplate.

What did you expect to happen:
All Metal3Machines that are configured to use IPClaims should have the proper IPClaims created.
Otherwise there is a risk of IP clashes and all kinds of strange behavior.
Anything else you would like to add:
We are still investigating this issue and are not 100% sure that the bug is in CAPM3, but it is our best lead so far.
I'm creating this issue now to keep track of findings and link to potentially relevant known bugs.
Environment:
- Kubernetes version (use `kubectl version`): v1.28.3

Discovered or related issues:
- It is unclear whether `templateReference` is needed at all.

/kind bug