Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cancel netconn contexts after close handshake #494

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ethanndickson
Copy link
Member

This fixes a failed to get reader: failed to read frame header: EOF error we saw in coder/coder when upgrading to a version of the library where these contexts were cancelled during netConn.Close (>= v1.8.8).

This problem appears to happen when netConn.c.Close(...) is called after context cancellation causes nc.c.close() to be called (via Conn.timeoutLoop).

I really would've liked to fully understand why this problem was occurring on coder/coder, but following our usage of the library in the few testcases that this happens is incredibly difficult (as they're effectively fully end-to-end tests), and I was unable to produce an MRE for this bug despite my best efforts.

@mafredri
Copy link
Member

Thanks for the PR @ethanndickson!

I wanted to see if we're potentially causing a regression with this change and it looks like if read or write is open during Close, this change will block those.

Just some fairly simple tests:

	t.Run("netConn2", func(t *testing.T) {
		tt, c1, c2 := newConnTest(t, nil, nil)

		n1 := websocket.NetConn(tt.ctx, c1, websocket.MessageBinary)
		n2 := websocket.NetConn(tt.ctx, c2, websocket.MessageBinary)

		done := make(chan struct{})
		go func() {
			defer close(done)
			_, err := n1.Read(make([]byte, 1))
			assert.Equal(t, "read error", err, io.EOF)
		}()

		time.Sleep(time.Millisecond * 100)
		t.Log("closing n1")
		err := n1.Close()
		t.Log("closed n1")
		assert.Success(t, err)
		err = n2.Close()
		assert.Success(t, err)
		<-done
	})

	t.Run("netConn3", func(t *testing.T) {
		tt, c1, c2 := newConnTest(t, nil, nil)

		n1 := websocket.NetConn(tt.ctx, c1, websocket.MessageBinary)
		n2 := websocket.NetConn(tt.ctx, c2, websocket.MessageBinary)

		done := make(chan struct{})
		go func() {
			defer close(done)
			_, err := n1.Write(bytes.Repeat([]byte("hello"), 32*1024))
			assert.Equal(t, "write error", err, io.EOF)
		}()

		time.Sleep(time.Millisecond * 100)
		t.Log("closing n1")
		err := n1.Close()
		t.Log("closed n1")
		assert.Success(t, err)
		err = n2.Close()
		assert.Success(t, err)
		<-done
	})

Before:

❯ go test . -run=TestConn/netConn'[2-3]'
--- FAIL: TestConn (0.00s)
    --- FAIL: TestConn/netConn2 (0.10s)
        conn_test.go:199: closing n1
        conn_test.go:195: unexpected read error: expected &fmt.wrapError{msg:"failed to get reader: context canceled", err:(*errors.errorString)(0xc0000e62d0)} but got &errors.errorString{s:"EOF"}
        conn_test.go:201: closed n1
        conn_test.go:204: failed to close WebSocket: failed to write control frame opClose: failed to write frame: failed to flush: io: read/write on closed pipe
    --- FAIL: TestConn/netConn3 (0.10s)
        conn_test.go:222: closing n1
        conn_test.go:218: unexpected write error: expected &fmt.wrapError{msg:"failed to write msg: failed to write frame: context canceled", err:(*fmt.wrapError)(0xc0002a4060)} but got &errors.errorString{s:"EOF"}
        conn_test.go:224: closed n1
        conn_test.go:227: failed to close WebSocket: failed to write control frame opClose: failed to write frame: failed to flush: io: read/write on closed pipe
FAIL
FAIL	github.com/coder/websocket	0.107s
FAIL

After:

❯ go test . -run=TestConn/netConn'[2-3]'
--- FAIL: TestConn (0.00s)
    --- FAIL: TestConn/netConn3 (5.11s)
        conn_test.go:222: closing n1
        conn_test.go:218: unexpected write error: expected &fmt.wrapError{msg:"failed to write msg: failed to write frame: use of closed network connection", err:(*fmt.wrapError)(0xc0003b40a0)} but got &errors.errorString{s:"EOF"}
        conn_test.go:224: closed n1
        conn_test.go:225: failed to close WebSocket: failed to write control frame opClose: failed to acquire lock: context deadline exceeded
    --- FAIL: TestConn/netConn2 (5.11s)
        conn_test.go:199: closing n1
        conn_test.go:195: unexpected read error: expected &fmt.wrapError{msg:"failed to get reader: use of closed network connection", err:poll.errNetClosing{}} but got &errors.errorString{s:"EOF"}
        conn_test.go:201: closed n1
        conn_test.go:204: failed to close WebSocket: failed to write control frame opClose: failed to write frame: failed to flush: io: read/write on closed pipe
FAIL
FAIL	github.com/coder/websocket	5.111s
FAIL

Both tests then take 5s (i.e. hitting a context timeout). I don't have any great ideas for an easy fix off the top of my head 😕.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants