Atomically send GOAWAY during HTTP/2 abort#65465
Atomically send GOAWAY during HTTP/2 abort#65465adityamandaleeka wants to merge 1 commit intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a race condition in HTTP/2 connection abort handling where a GOAWAY frame could be lost. The race occurred when ProcessRequestsAsync cleanup called _frameWriter.Complete() between the WriteGoAwayAsync and Abort() calls, causing the _completed flag to be set to true and the GOAWAY frame to be discarded.
Changes:
- Introduced
Http2FrameWriter.AbortWithGoAway()method to atomically write GOAWAY frame and abort connection under a single lock acquisition - Modified
Http2Connection.Abort()to use the new atomic method when closing the connection for the first time - Added detailed documentation explaining the race condition and the fix
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Servers/Kestrel/Core/src/Internal/Http2/Http2FrameWriter.cs | Added new AbortWithGoAway() method that atomically writes GOAWAY frame and performs connection abort under single lock to prevent race condition |
| src/Servers/Kestrel/Core/src/Internal/Http2/Http2Connection.cs | Updated Abort() method to use new atomic AbortWithGoAway() when TryClose() succeeds, and regular Abort() otherwise |
|
ha, this ran into the flakniess fixed in #65456 |
1a4d821 to
a3aa414
Compare
|
Looks like this PR hasn't been active for some time and the codebase could have been changed in the meantime. |
Fix a race in
Http2Connection.AbortwhereProcessRequestsAsynccleanup could call_frameWriter.Complete()betweenWriteGoAwayAsyncandAbort(), causing GOAWAY to be dropped.Add
Http2FrameWriter.AbortWithGoAway()and use it fromHttp2Connection.Abort()to perform GOAWAY write/flush and abort under one lock acquisition.I'm intentionally not adding a regression test for this because the amount of timing nonsense you have to do to simulate it defeats the purpose of the test. I noticed this happening in CI though so hopefully we'll see less flakiness here.