-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix flaky TestDestroyPodInflight test #16166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #16166 +/- ##
==========================================
+ Coverage 80.06% 80.10% +0.03%
==========================================
Files 214 214
Lines 16943 16940 -3
==========================================
+ Hits 13565 13569 +4
+ Misses 3017 3013 -4
+ Partials 361 358 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4a86c90
to
a873d9c
Compare
/lgtm |
It's not obvious to me why we want to do this. The original intent of the test was to ensure graceful shutdown of the pod and that we don't drop any traffic. In theory a user could delete a Route or Configuration in any order and we shouldn't drop traffic in either scenario. Also testgrid shows that Looking at those logs it seems like the requests are going through the activator and deleting the configuration affecting the revision state in the activator. Maybe that's the root cause of the failure? |
This fix addresses the root cause of test flakiness by ensuring traffic doesn't route through the activator during the graceful shutdown test. The flakiness occurred when requests routed through the activator (particularly with Kourier): deleting the Configuration triggered Revision deletion, causing the activator's revision lookup to fail for in-flight requests with 'revision not found' errors. The fix sets minScale=1 and target-burst-capacity=0 to: 1. Keep the revision active (prevents scale-to-zero) 2. Route traffic directly to the pod (bypasses the activator) This ensures the test properly validates graceful shutdown behavior regardless of resource deletion order, which was the original intent.
a873d9c
to
f7fd0b8
Compare
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Fedosin The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The issue is that with Kourier, traffic goes through the activator, and when we delete the Configuration, the activator's revision lookup fails for in-flight requests. I've updated the test to set |
I guess it's not specific to kourier. https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_serving/16197/gateway-api-latest_serving_main/1981379020142415872 It seems like we need two flavours of this test
|
Proposed Changes
This fix eliminates test flakiness by preventing traffic from going through the activator during graceful shutdown tests.
Previously, when requests hit the activator (especially with Kourier), deleting the Configuration removed the Revision, causing in-flight requests to fail with “revision not found.”
By setting
minScale=1
andtarget-burst-capacity=0
, the revision stays active and traffic goes directly to the pod, ensuring the test correctly validates graceful shutdown behavior regardless of deletion order.