Skip to content

Conversation

@Fedosin
Copy link
Contributor

@Fedosin Fedosin commented Oct 17, 2025

Proposed Changes

This fix eliminates test flakiness by preventing traffic from going through the activator during graceful shutdown tests.

Previously, when requests hit the activator (especially with Kourier), deleting the Configuration removed the Revision, causing in-flight requests to fail with “revision not found.”

By setting minScale=1 and target-burst-capacity=0, the revision stays active and traffic goes directly to the pod, ensuring the test correctly validates graceful shutdown behavior regardless of deletion order.

@knative-prow knative-prow bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 17, 2025
@codecov
Copy link

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.10%. Comparing base (cf48bab) to head (f7fd0b8).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16166      +/-   ##
==========================================
+ Coverage   80.06%   80.10%   +0.03%     
==========================================
  Files         214      214              
  Lines       16943    16940       -3     
==========================================
+ Hits        13565    13569       +4     
+ Misses       3017     3013       -4     
+ Partials      361      358       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nader-ziada
Copy link
Member

/lgtm

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2025
@dprotaso
Copy link
Member

This fix deletes Route before Configuration to prevent reconciliation errors during teardown (

It's not obvious to me why we want to do this. The original intent of the test was to ensure graceful shutdown of the pod and that we don't drop any traffic. In theory a user could delete a Route or Configuration in any order and we shouldn't drop traffic in either scenario.

Also testgrid shows that TestDestroyPodInflight is only flakey with Kourier which is interesting - unsure why.

Looking at those logs it seems like the requests are going through the activator and deleting the configuration affecting the revision state in the activator. Maybe that's the root cause of the failure?

This fix addresses the root cause of test flakiness by ensuring traffic
doesn't route through the activator during the graceful shutdown test.

The flakiness occurred when requests routed through the activator
(particularly with Kourier): deleting the Configuration triggered
Revision deletion, causing the activator's revision lookup to fail for
in-flight requests with 'revision not found' errors.

The fix sets minScale=1 and target-burst-capacity=0 to:
1. Keep the revision active (prevents scale-to-zero)
2. Route traffic directly to the pod (bypasses the activator)

This ensures the test properly validates graceful shutdown behavior
regardless of resource deletion order, which was the original intent.
@knative-prow knative-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 21, 2025
@knative-prow
Copy link

knative-prow bot commented Oct 21, 2025

New changes are detected. LGTM label has been removed.

@knative-prow
Copy link

knative-prow bot commented Oct 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fedosin
Once this PR has been reviewed and has the lgtm label, please assign linkvt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Fedosin
Copy link
Contributor Author

Fedosin commented Oct 21, 2025

The issue is that with Kourier, traffic goes through the activator, and when we delete the Configuration, the activator's revision lookup fails for in-flight requests.

I've updated the test to set minScale=1 and target-burst-capacity=0, which ensures the revision stays active and traffic routes directly to it, bypassing the activator. This way, the test properly validates graceful shutdown regardless of deletion order, which was the original intent.

@dprotaso
Copy link
Member

I guess it's not specific to kourier. https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_serving/16197/gateway-api-latest_serving_main/1981379020142415872

It seems like we need two flavours of this test

  • one that goes through the activator
  • one that goes ingress => pod

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants