Conversation

@antoniovicente
Contributor

No description provided.

@antoniovicente antoniovicente requested a review from a team as a code owner November 22, 2025 03:00
@LPardue
Contributor

LPardue commented Nov 22, 2025

Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactoring to ensure the time metric is reported on stream teardown (whatever the cause)? I.e. report it on destruction if the time is Some, e.g.

                    if let Some(first) =
                        ctx.first_full_headers_flush_fail_time.take()
                    {
                        ctx.audit_stats.add_header_flush_duration(
                            Instant::now().duration_since(first),
                        );
                    }

If so, we'd want to avoid putting self.headers_pending_flush.fetch_sub(1, Ordering::SeqCst); in add_header_flush_duration().
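
For what it's worth, a minimal sketch of that split, keeping the counter update out of the duration recorder. The field and method names are taken from this thread, but the signatures are assumptions, not the crate's actual API:

    use std::sync::atomic::{AtomicU64, Ordering};
    use std::time::{Duration, Instant};

    #[derive(Default)]
    struct AuditStats {
        headers_pending_flush: AtomicU64,
        header_flush_duration_us: AtomicU64,
    }

    impl AuditStats {
        /// Records only the duration, so teardown paths can call it without
        /// touching the pending counter.
        fn add_header_flush_duration(&self, d: Duration) {
            self.header_flush_duration_us
                .fetch_add(d.as_micros() as u64, Ordering::SeqCst);
        }

        /// Called only when the HEADERS frame actually flushes.
        fn headers_flushed(&self) {
            self.headers_pending_flush.fetch_sub(1, Ordering::SeqCst);
        }
    }

    struct StreamCtx {
        first_full_headers_flush_fail_time: Option<Instant>,
        audit_stats: AuditStats,
    }

    impl Drop for StreamCtx {
        fn drop(&mut self) {
            // Teardown path from the snippet above: report the blocked time
            // regardless of why the stream is going away.
            if let Some(first) = self.first_full_headers_flush_fail_time.take() {
                self.audit_stats
                    .add_header_flush_duration(Instant::now().duration_since(first));
            }
        }
    }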

/// exited.
///
/// This count includes streams that were reset before headers were flushed.
headers_pending_flush: AtomicU64,
Contributor

I think this can only ever be a maximum of 1. While blocked on HEADERS, no other frame will be attempted, and if the frame succeeds, the count is decremented before any subsequent HEADERS is attempted. It might be easier to convert it to a bool and let apps do the counting across all streams based on that.
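
A sketch of the bool variant being suggested, with assumed method names (the getter matches the one used in the test further down; everything else is illustrative). The flag is set when a HEADERS frame fails to flush and cleared once it finally goes out, so at teardown it answers "were HEADERS still blocked?" and applications can do the per-stream counting themselves:

    use std::sync::atomic::{AtomicBool, Ordering};

    #[derive(Default)]
    struct AuditStats {
        headers_pending_flush: AtomicBool,
    }

    impl AuditStats {
        /// Set to true when a full HEADERS flush fails, cleared when it succeeds.
        fn set_headers_pending_flush(&self, pending: bool) {
            self.headers_pending_flush.store(pending, Ordering::SeqCst);
        }

        /// True if a HEADERS frame was still blocked the last time we tried.
        fn headers_pending_flush(&self) -> bool {
            self.headers_pending_flush.load(Ordering::SeqCst)
        }
    }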

Contributor Author

Changed to bool.

@antoniovicente
Contributor Author

> Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactoring to ensure the time metric is reported on stream teardown (whatever the cause)? […]

Reporting a duration on other teardown reasons seems fine to me, but I think we'd also want to indicate whether the flush succeeded or the stream was cancelled. We do not want to mix together the success and failure cases.

Ultimately we want to detect cases where the connection has data to send but cannot make progress, whether because of flow control, congestion control, or something else.
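
For illustration, one way to keep the two cases apart when reporting; none of these names are from this PR, they are just a sketch:

    use std::time::Duration;

    #[derive(Debug)]
    enum HeadersFlushOutcome {
        /// HEADERS eventually went out after being blocked for this long.
        FlushedAfterDelay(Duration),
        /// The stream was torn down while HEADERS were still blocked.
        AbortedWhileBlocked(Duration),
    }

    #[derive(Debug, Default)]
    struct HeadersFlushStats {
        flushed_after_delay: Vec<Duration>,
        aborted_while_blocked: Vec<Duration>,
    }

    impl HeadersFlushStats {
        /// Success and failure durations land in separate buckets so they can
        /// be aggregated independently.
        fn record(&mut self, outcome: HeadersFlushOutcome) {
            match outcome {
                HeadersFlushOutcome::FlushedAfterDelay(d) => {
                    self.flushed_after_delay.push(d)
                }
                HeadersFlushOutcome::AbortedWhileBlocked(d) => {
                    self.aborted_while_blocked.push(d)
                }
            }
        }
    }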

@LPardue
Contributor

LPardue commented Nov 22, 2025

Right. The duration is cumulative across all HEADERS frames, while being actively blocked can only happen on an individual HEADERS frame, for example 103 Early Hints, 200 OK, or trailers. The audit stats already capture whether the stream was stopped/reset/FINed; apps can use that information in combination with the new metric introduced in this PR to figure out the terminal status.
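
A consumer-side sketch of that combination; StreamEnd and the flag argument stand in for whatever the application already reads out of the audit stats, and only the flag corresponds to the metric added in this PR:

    /// How the stream ended, according to the existing audit stats.
    #[derive(Debug, Clone, Copy)]
    enum StreamEnd {
        Fin,
        Reset,
        Stopped,
    }

    /// Returns how the stream ended if HEADERS were still blocked at teardown,
    /// i.e. the case the new metric is meant to surface; None means the
    /// HEADERS were flushed.
    fn blocked_headers_at_end(
        end: StreamEnd,
        headers_pending_flush: bool,
    ) -> Option<StreamEnd> {
        headers_pending_flush.then_some(end)
    }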

@antoniovicente
Contributor Author

I think there's a difference between the case where the headers were sent after some delay and the case where they were never sent at all, even if the client's action is to reset the stream before FIN is sent.

@LPardue
Contributor

LPardue commented Nov 22, 2025

I agree; that's why I think it would be better to switch the metric proposed here to a bool that indicates whether HEADERS were blocked or not at the time the stream is terminated.

@antoniovicente antoniovicente force-pushed the antonio/pending_headers branch 2 times, most recently from 4bbe4a7 to 07104cf Compare November 22, 2025 22:03
@antoniovicente
Contributor Author

> Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactoring to ensure the time metric is reported on stream teardown (whatever the cause)? […]

Attempted to record blocked time in cases where the stream terminates. I'll look into some more testing on Monday.

@antoniovicente antoniovicente force-pushed the antonio/pending_headers branch from b875595 to 84430ca Compare December 12, 2025 20:29
@antoniovicente antoniovicente force-pushed the antonio/pending_headers branch from 7aef8a6 to 34db457 Compare December 15, 2025 21:46
@antoniovicente antoniovicente enabled auto-merge (squash) December 15, 2025 21:47

let audit_stats = audit_stats.read().unwrap().clone().unwrap();
assert!(audit_stats.headers_pending_flush());
// Verify that headers_flush_duration is updated on connection drop.
Contributor Author

The assertion on the next line is flaky according to CI; it failed in the macOS and Windows builds.

Contributor Author

Adding a sleep doesn't seem to mitigate the flakiness. I need to debug further.

.set_recvd_stream_fin(StreamClosureKind::Implicit);

// Update stats if there were pending header sends on this stream.
stream.full_headers_flush_aborted();
Contributor Author

The fix for the flaky test could be to move the contents of this function to IoWorker::close_connection and have the test wait until metrics.connections_in_memory is 0 before making assertions.
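
A sketch of that wait, assuming a tokio test; connections_in_memory is the gauge mentioned above, passed in as a closure here because the real metrics handle isn't shown in this thread:

    use std::time::Duration;

    /// Polls the gauge until it reports 0, failing the test after 5 seconds.
    async fn wait_for_zero(mut gauge: impl FnMut() -> u64) {
        tokio::time::timeout(Duration::from_secs(5), async {
            while gauge() > 0 {
                tokio::time::sleep(Duration::from_millis(10)).await;
            }
        })
        .await
        .expect("connections_in_memory did not reach 0 in time");
    }

    // In the test, before the flaky duration assertion (hypothetical handle):
    //     wait_for_zero(|| metrics.connections_in_memory()).await;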

The request in the original test never got a response, so after 2 seconds the client would close the connection due to the configured timeout.
@antoniovicente antoniovicente force-pushed the antonio/pending_headers branch from 121207b to cee4429 Compare December 17, 2025 01:18