Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 10 additions & 10 deletions close.go
Original file line number Diff line number Diff line change
Expand Up @@ -99,11 +99,7 @@ func CloseStatus(err error) StatusCode {
func (c *Conn) Close(code StatusCode, reason string) (err error) {
defer errd.Wrap(&err, "failed to close WebSocket")

if c.casClosing() {
err = c.waitGoroutines()
if err != nil {
return err
}
if c.userClosed.Swap(true) && c.isClosed() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [CRF-3] No regression test for the race condition this PR fixes. The bug: when the internal read loop (read.go:361) or write path (write.go:365) wins casClosing() before the user calls Close, the old code returned a wrapped net.ErrClosed. No existing test exercises this interleaving.

The existing TestConcurrentClosePing exercises concurrent ping+close but not any close+close path. TestConnClosePropagation exercises close propagation but not the return value contract for the caller that loses the casClosing race. (Mafuuu)

I verified this by reverting the PR changes and running all close-related tests 10 times with the race detector on the old code. All passed. (Pariston)

A test that reproduces this: set up a connection with CloseRead, trigger the peer to send a close frame so the local read loop wins casClosing, then call Close from user code and assert nil return. (Bisky P1, Hisoka P2, Mafu-san P2, Mafuuu P2, Meruem P2, Chopper P2, Pariston P3, Leorio P3, Takumi P3)

🤖

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-9] The userClosed.Swap(true) && c.isClosed() check is not atomic (the connection could finish closing between the swap and the channel probe), but the gap is benign. A second user call that falls through instead of returning net.ErrClosed immediately reaches casClosing() -> waitGoroutines(), the defer suppresses net.ErrClosed, and the caller gets nil. The system converges to the correct state on both sides of the gap. (Takumi)

🤖

return net.ErrClosed
}
defer func() {
Expand All @@ -112,6 +108,10 @@ func (c *Conn) Close(code StatusCode, reason string) (err error) {
}
}()

if c.casClosing() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-5] The PR description claims concurrent user calls "return instantly without blocking." In practice, a concurrent caller reaches casClosing() -> true, enters waitGoroutines(), and blocks on c.closeReadDone and c.closed with a 15-second timer. The blocking is correct behavior (caller gets confirmation that close completed), but the description overstates it. (Pariston, Mafuuu, Takumi)

🤖

return c.waitGoroutines()
}

err = c.closeHandshake(code, reason)

err2 := c.close()
Expand All @@ -132,11 +132,7 @@ func (c *Conn) Close(code StatusCode, reason string) (err error) {
func (c *Conn) CloseNow() (err error) {
defer errd.Wrap(&err, "failed to immediately close WebSocket")

if c.casClosing() {
err = c.waitGoroutines()
if err != nil {
return err
}
if c.userClosed.Swap(true) && c.isClosed() {
return net.ErrClosed
}
defer func() {
Expand All @@ -145,6 +141,10 @@ func (c *Conn) CloseNow() (err error) {
}
}()

if c.casClosing() {
return c.waitGoroutines()
}

err = c.close()

err2 := c.waitGoroutines()
Expand Down
7 changes: 4 additions & 3 deletions conn.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,9 +77,10 @@ type Conn struct {
closeReadCtx context.Context
closeReadDone chan struct{}

closing atomic.Bool
closeMu sync.Mutex // Protects following.
closed chan struct{}
userClosed atomic.Bool // Set by Close/CloseNow on first user call.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-6] userClosed reads as completion alongside closed, but it tracks initiation. The struct now has three close fields in near-identical tenses (userClosed, closing, closed) where the first means "called" and the last means "done." Also, CloseRead's goroutine (read.go:82) calls the public c.Close(), setting userClosed from internal library code. closeRequested would disambiguate stage and scope without reading the body. (Gon P3, Hisoka Note, Chopper Note)

🤖

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-7] Comment "on first user call" restates what the name and type already show. The user is in the name, first call is derivable from atomic.Bool set once. Trim to // Set by Close/CloseNow. or // Tracks whether the user has called Close/CloseNow. (Gon, Leorio)

🤖

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-8] Close state is now tracked across three independent boolean-ish variables (userClosed, closing, closed channel) forming an implicit state machine with ~5 reachable states and undocumented transitions. The current approach works correctly and is proportionate for a bugfix, but the next person who modifies close coordination must understand all three in combination. (Meruem)

🤖

closing atomic.Bool
closeMu sync.Mutex // Protects following.
closed chan struct{}

pingCounter atomic.Int64
activePingsMu sync.Mutex
Expand Down
Loading