Add headers pending flush to audit stats #2257
base: master
Conversation
Looks broadly good. It got me wondering if, while we're touching this area, maybe we should do a little more refactoring to ensure the time metric is reported on stream teardown (whatever the cause), i.e. on destruction, if the time is Some. If so, we'd want to avoid putting …
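To illustrate that shape, here is a minimal sketch of reporting on destruction, assuming a hypothetical per-stream state with an optional start timestamp and a shared stats handle; all names below are placeholders, not the crate's actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Stand-in for the real audit-stats type; names are illustrative only.
#[derive(Default)]
struct AuditStats {
    headers_flush_duration_us: AtomicU64,
}

impl AuditStats {
    fn record_headers_flush_duration(&self, d: Duration) {
        self.headers_flush_duration_us
            .store(d.as_micros() as u64, Ordering::Relaxed);
    }
}

/// Hypothetical per-stream state holding the flush timer and a stats handle.
struct StreamState {
    headers_flush_started: Option<Instant>,
    audit_stats: Arc<AuditStats>,
}

impl Drop for StreamState {
    fn drop(&mut self) {
        // Report on any teardown path: if the timer is still Some, the
        // HEADERS flush never completed before the stream went away.
        if let Some(started) = self.headers_flush_started.take() {
            self.audit_stats
                .record_headers_flush_duration(started.elapsed());
        }
    }
}
```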
tokio-quiche/src/http3/stats.rs
Outdated
/// exited.
///
/// This count includes streams that were reset before headers were flushed.
headers_pending_flush: AtomicU64,
I think this can only ever be a maximum of 1? While blocked on HEADERS, no other frame will be attempted, and if the frame succeeds, the count is decremented before any subsequent HEADERS is attempted. It might be easier to convert it to a bool and let apps do the counting across all streams based on that.
Changed to bool.
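For reference, a sketch of what the bool form could look like on the shared stats struct; the field and method names here are placeholders rather than the PR's final API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Placeholder stats struct: the flag is set while a HEADERS frame is
/// blocked and cleared once the flush succeeds, so at stream teardown it
/// answers "were headers still pending flush?".
#[derive(Default)]
struct AuditStats {
    headers_pending_flush: AtomicBool,
}

impl AuditStats {
    fn set_headers_pending_flush(&self, pending: bool) {
        self.headers_pending_flush.store(pending, Ordering::SeqCst);
    }

    fn headers_pending_flush(&self) -> bool {
        self.headers_pending_flush.load(Ordering::SeqCst)
    }
}
```

An application can then count blocked streams across a connection by summing this flag at each stream's teardown.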
Reporting a duration on other teardown reasons seems fine to me, but I think we'd also like to indicate whether the flush succeeded or the stream was cancelled. We do not want to mix together the success and failure cases. Ultimately we want to detect cases where the connection has data to send but cannot make progress. The reasons why we may not be able to make progress include flow control, congestion control, or other causes.
Right. The duration is cumulative across all HEADERS, while being actively blocked can only happen on an individual HEADERS frame (for example, 103 Early Hints, 200 OK, or trailers). The audit stats already capture whether the stream was stopped/reset/fin, so apps can use that information in combination with the new metric introduced in this PR to figure out the terminal status.
I think there's a difference between the case where the headers were sent after some delay and the case where the headers were never sent, even if the client's action is to reset the stream before FIN is sent.
I agree; that's why I think it would be better to switch the metric proposed here to a bool that indicates whether or not HEADERS were blocked at the time the stream is terminated.
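As a rough example of how an application might combine the stream-closure information with such a flag at teardown; the inputs below are placeholders for whatever the audit stats actually expose.

```rust
/// Placeholder classification; real code would read the stop/reset/fin
/// details and the pending-flush flag from the audit stats.
fn classify(headers_pending_flush: bool, stream_was_reset: bool) -> &'static str {
    match (headers_pending_flush, stream_was_reset) {
        // HEADERS never made it out before the peer tore the stream down.
        (true, true) => "blocked, then cancelled by the peer",
        // HEADERS never made it out before local teardown.
        (true, false) => "blocked at teardown",
        // HEADERS were flushed; any recorded duration is a delay, not a failure.
        (false, _) => "flushed (possibly after a delay)",
    }
}
```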
Force-pushed from 4bbe4a7 to 07104cf
Attempted to record blocked time in cases where the stream terminates. I'll look into some more testing on Monday.
Force-pushed from b875595 to 84430ca
Force-pushed from 7aef8a6 to 34db457
let audit_stats = audit_stats.read().unwrap().clone().unwrap();
assert!(audit_stats.headers_pending_flush());
// Verify that headers_flush_duration is updated on connection drop.
The assertion on the next line is flaky according to CI; it failed in the macOS and Windows builds.
Adding a sleep doesn't seem to mitigate the flakiness. I need to debug further.
.set_recvd_stream_fin(StreamClosureKind::Implicit);

// Update stats if there were pending header sends on this stream.
stream.full_headers_flush_aborted();
The fix for the flaky test could be to move the contents of this function into IoWorker::close_connection and to wait until metrics.connections_in_memory is 0 before making assertions in the test.
The request in the original test never got a response; after 2 seconds, the client would close the connection due to the configured timeout.
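A rough sketch of that shape of fix, assuming the test can poll the connections-in-memory gauge through a closure; the helper name, deadline, and polling interval are illustrative.

```rust
use std::time::{Duration, Instant};

/// Illustrative test helper: poll a gauge until it reaches zero or a
/// deadline passes, so assertions only run after the connection has been
/// fully torn down and its audit stats finalized.
fn wait_for_no_connections_in_memory(read_gauge: impl Fn() -> u64) {
    let deadline = Instant::now() + Duration::from_secs(5);
    while read_gauge() > 0 {
        assert!(Instant::now() < deadline, "connection never left memory");
        std::thread::sleep(Duration::from_millis(10));
    }
}
```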
Force-pushed from 121207b to cee4429
No description provided.