Return existing reservation if podRef matches #296
base: master
Conversation
cc1fa6a to 31583e5
Mostly looks good, and thanks for the detailed breakdown in the PR description. Just have one question... I'll also try to bring this up to others for further review.
if reserved[i.String()] != "" {
	if reserved[i.String()] != podRef {
		continue
	}
	logging.Debugf("Found existing reservation %v with matching podRef %s", i.String(), podRef)
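For context, here is a minimal sketch of how this check might sit inside a candidate-IP loop. It is an illustration only, under the assumption that reserved maps IP strings to podRefs; the function name and signature are made up, and log.Printf stands in for whereabouts' logging.Debugf:

package allocate // hypothetical placement, for illustration only

import (
	"fmt"
	"log"
	"net"
)

// pickIP walks candidate IPs and either reuses an existing reservation for the
// same podRef or returns the first free IP. Sketch only, not the real allocator.
func pickIP(candidates []net.IP, reserved map[string]string, podRef string) (net.IP, error) {
	for _, i := range candidates {
		if reserved[i.String()] != "" {
			// The IP is already reserved. If it belongs to a different pod,
			// skip it; this is the branch where the "continue" above fires.
			if reserved[i.String()] != podRef {
				continue
			}
			// Same podRef: hand the existing reservation back instead of
			// allocating a second IP for the same Pod.
			log.Printf("Found existing reservation %v with matching podRef %s", i.String(), podRef)
			return i, nil
		}
		// Unreserved IP: allocate it normally.
		return i, nil
	}
	return nil, fmt.Errorf("no IP available for %s", podRef)
}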
What is the case where the reserved map is nonempty but doesn't contain the podRef? In other words, when will the "continue" get hit in the code?
Thank you for the PR. The diff makes sense to me. Could you please add a unit test in allocate_test.go for the allocate.go changes, if you don't mind?
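For what it's worth, a unit test along these lines could exercise the new path. This is only a sketch: reuseForPodRef below is a stand-in that mirrors the diff's logic, not the actual function or signature in allocate.go:

package allocate // hypothetical: the real test would live in allocate_test.go

import "testing"

// reuseForPodRef mirrors the logic in the diff for test purposes only: given a
// map of IP string -> podRef, return the IP already reserved for podRef, if any.
func reuseForPodRef(reserved map[string]string, podRef string) (string, bool) {
	for ip, ref := range reserved {
		if ref == podRef {
			return ip, true
		}
	}
	return "", false
}

func TestReusesExistingReservationForSamePodRef(t *testing.T) {
	reserved := map[string]string{
		"10.0.0.2": "default/pod-a",
		"10.0.0.3": "default/pod-b",
	}

	// Same podRef: the stale reservation should be handed back, not duplicated.
	if ip, ok := reuseForPodRef(reserved, "default/pod-a"); !ok || ip != "10.0.0.2" {
		t.Fatalf("expected 10.0.0.2 for default/pod-a, got %q (found=%v)", ip, ok)
	}

	// Unknown podRef: no match, so the caller falls through to a fresh allocation.
	if ip, ok := reuseForPodRef(reserved, "default/pod-c"); ok {
		t.Fatalf("did not expect a reservation for default/pod-c, got %q", ip)
	}
}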
Until the fix is available, is there any workaround to recover from this situation?
@xagent003 hey o/ This slipped by me; could you rebase? Do you need help sorting out the unit tests?
If @xagent003 doesn't mind, I can open a new merge request with this fix, as we are running into this issue often while testing high-availability and node-crash scenarios. @maiqueb is that okay?
Please do. We'll review it. Please take into account @s1061123's request in #296 (review). Thanks for offering!
@dougbtv do you have any plan to merge this PR? We are also facing this issue with version 0.6.2. @xagent003 can you please confirm which whereabouts version you made these changes against, and whether it is compatible with Kubernetes 1.25?
@maiqueb we are facing an issue where, during a rolling upgrade of Kubernetes, whenever a pod shifts to a new worker node it gets stuck because whereabouts is not able to assign it an IP. We tried the latest version, 0.6.2, as well, but it didn't make any difference.
#291
As mentioned, the Pod has entered a state where its container has died or been killed. There was already an existing IP reservation for this Pod, but for whatever reason the DEL was missed. Whereabouts should be smart enough to work around this and tie a reservation to the podRef, NOT the container ID.
On restart / post-upgrade of our stack, kubelet detects no sandbox container for the Pod and tries to start it back up, sending an ADD to CNI/IPAM. In the case of a non-full IP pool, whereabouts will allocate a NEW IP for this Pod. Now the IP pool has two IPs allocated to the same podRef but different container IDs, so we leak IPs.
In the case of a full IP pool, the ADD fails because there are no more IPs in the range. However, as mentioned, there is already an IP reservation for the same podRef, so why can't it just be re-used?
The ip-reconciler does not help in this case, as the Pod is running and kubelet is stuck in a container retry loop. The graceful cleanup was missed because the container died out of band and/or kubelet was down.
The only way to get out of this state is by manually deleting the IP reservation in both the IPPool and the overlappingranges resources, or by this fix, which returns the existing reservation if the podRef matches.
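Conceptually, the fix boils down to checking the pool's reservation list for a matching podRef before minting a new IP. A hedged sketch of that guard follows; the IPReservation struct and assignIP function are simplified stand-ins, not whereabouts' actual types or API:

package allocate // hypothetical placement; simplified stand-ins for whereabouts types

import "net"

// IPReservation is a simplified stand-in for a pool entry: one IP tied to a
// sandbox container ID and a podRef (namespace/name).
type IPReservation struct {
	IP          net.IP
	ContainerID string
	PodRef      string
}

// assignIP reuses an existing reservation when the podRef already holds one
// (e.g. after a missed DEL), rebinding it to the new container ID, and only
// falls back to allocating a fresh IP otherwise.
func assignIP(reservations []IPReservation, containerID, podRef string,
	allocateNew func() (net.IP, error)) (net.IP, []IPReservation, error) {

	for idx, r := range reservations {
		if r.PodRef == podRef {
			// Reuse the stale reservation instead of leaking a second IP for
			// the same Pod; just tie it to the new sandbox container.
			reservations[idx].ContainerID = containerID
			return r.IP, reservations, nil
		}
	}

	ip, err := allocateNew()
	if err != nil {
		return nil, reservations, err
	}
	reservations = append(reservations, IPReservation{IP: ip, ContainerID: containerID, PodRef: podRef})
	return ip, reservations, nil
}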