peer: Synchronize net.Conn with a mutex #3478

jrick · 2025-02-27T21:44:15Z

The access guarded by an atomic int32 was incorrect. For example, access to the p.conn could be performed as long as p.connected was non-zero, but p.connected would be incremented before p.conn was ever assigned by AssociateConnection. While here, also add missing mutex locking protecting timeConnected and na.

davecgh

I don't have any issue with this being made harder to misuse, particularly because it's exported code, but I don't see any issue with how it is used currently in dcrd.

It is only ever initialized via inboundPeerConnected and outboundPeerConnected.

For inboundPeerConnected, it's given a net.Conn from the listener and it then calls peer.NewInboundPeer which only creates the struct, followed by the call to sp.AssociateConnection which sets the connected atomic and then assigns the conn.

Similarly, for outboundPeerConnected, it's given a *connmgr.ConnReq and net.Conn from the connection manager and it then calls peer.NewOutboundPeer which, similar to peer.NewInboundPeer, only creates the struct and does some other misc setup not involving the connection, followed by the call to sp.AssociateConnection which, the same as before, sets the connected atomic and then assigns the conn.

You are correct that there could theoretically be a race between setting the atomic and then assigning the conn if there were multiple goroutines racing on them, but no other goroutines aside from the one launched inside AssociateConnection itself have access to the peer until after AssociateConnection returns in either case, so there would be no race.

It is only after the initial fields are set in AssociateConnection that it launches the internal peer.start() in a goroutine and returns, at which point the newly created and initialized peer is then further used in another goroutine running sp.Run() and finally returned to other code for use.

It's for the same reason the stats mutex is not really needed for the timeConnected and na fields. No other goroutines are racing on them at the point they are initialized.
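The argument above rests on a guarantee of the Go memory model: a `go` statement is synchronized before the start of the goroutine it launches, so plain writes made before the launch are visible inside that goroutine. A minimal sketch of that pattern follows. The `peer` type, its fields, the `associate` method, and the address string are all hypothetical placeholders, not the dcrd API.

```go
package main

import (
	"fmt"
	"time"
)

// peer is a hypothetical, stripped-down stand-in for the real type.
type peer struct {
	timeConnected time.Time
	addr          string
}

// associate initializes fields with plain writes and only then starts a
// goroutine that reads them.
func (p *peer) associate(addr string, done chan<- string) {
	// No other goroutine can reach p yet, so no locking is needed here.
	p.timeConnected = time.Now()
	p.addr = addr

	// The go statement is synchronized before the goroutine starts, so
	// the writes above are visible inside it.
	go func() {
		done <- p.addr
	}()
}

func main() {
	done := make(chan string, 1)
	p := &peer{}
	p.associate("192.0.2.1:1234", done) // placeholder address
	fmt.Println(<-done)                 // 192.0.2.1:1234
}
```

This only holds while the caller keeps the peer private until the association returns, which is the usage described for dcrd. An exported API cannot enforce that on other callers.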

jrick · 2025-03-04T14:07:20Z

That's right, it's because these are exported methods. I didn't see a race with how dcrd is using them.

Speaking of, I found it odd that since both inboundPeerConnected and outboundPeerConnected already have the connection available, why wouldn't peer.New{Inbound,Outbound}Peer require passing this to the constructor? It's only test code (banmanager) that ever creates (outbound) peers without an underlying connection.

jrick · 2025-03-04T14:14:39Z

And my initial concern that got me looking into this code is that peer.Disconnect, which is allowed to be called multiple times, would only actually close the underlying connection if it had already been associated. But if this order was called:

peer.Disconnect()
peer.AssociateConnection(conn)
peer.Disconnect()

then neither the AssociateConnection call, nor the final disconnect would close the underlying connection.

davecgh · 2025-03-04T21:42:09Z

> Speaking of, I found it odd that since both inboundPeerConnected and outboundPeerConnected already have the connection available, why wouldn't peer.New{Inbound,Outbound}Peer require passing this to the constructor? It's only test code (banmanager) that ever creates (outbound) peers without an underlying connection.

Aside from requiring a new major module version since changing it would be a breaking API change, it's mostly just an artifact of much older code.

Originally, all of this code lived in the main server code as opposed to a separate package (along with all of the connection management and listening code too) and so it wasn't really written with the goal of being a separate package. The person who split it out pretty closely followed how things were already implemented as opposed to a more clean room package design, so there are certainly various aspects that could be improved.

Back at that time, the initial outbound connection was made and assigned by NewOutboundPeer itself and the version negotiation was handled in the main read and write code paths as opposed to the current cleaner separated method that was introduced by commit f3d759d (which itself could be further improved).

On the other hand, NewInboundPeer required a connection to be associated since the peer package doesn't handle listening and, due to a combination of factors that no longer apply (for example, the aforementioned version negotiation happening in the main read and write code paths), there were other things at the time that had to be done by the caller in between the creation of the peer instance and associating the connection with it. Hence, AssociateConnection was born.

Rather obviously, having NewOutboundPeer attempt to establish connections didn't play nicely with overall connection management, so that logic was pulled out and since AssociateConnection already existed, it was used instead of changing the API.

davecgh · 2025-03-04T21:58:43Z

> And my initial concern that got me looking into this code is that peer.Disconnect, which is allowed to be called multiple times, would only actually close the underlying connection if it had already been associated. But if this order was called:
> ...
> then neither the AssociateConnection call, nor the final disconnect would close the underlying connection.

Right. That doesn't actually happen with dcrd's usage, but since it's exported code, that sequence of events is indeed possible if a caller were to try to disconnect in between calling the constructor and associating the connection for some reason.

This PR accomplishes making it more robust against misuse without having to change the API and require a new major module version.

jrick force-pushed the conn_mutex branch from 00f2824 to 33c9cff on February 28, 2025 00:16

jrick mentioned this pull request Feb 28, 2025

peer: Do not send inventory before version #3479

Open

davecgh reviewed Mar 4, 2025

davecgh added this to the 2.1.0 milestone Mar 4, 2025

davecgh approved these changes Mar 4, 2025

davecgh merged commit 77c9cee into decred:master Mar 4, 2025
2 checks passed

jrick deleted the conn_mutex branch March 4, 2025 22:00
