
Commit efbc4f7

karalabe authored and minh-bq committed
eth: make transaction propagation paths in the network deterministic (#29034)
commit ethereum/go-ethereum@0b1438c. Geth currently propagates new transactions by sending them in full to sqrt(peers) and announcing their hashes to the rest of the peers. The exceptions are peers already known to have the transactions (neither is done for them) and large/blob transactions (which are always announced). For this PR's scope we don't care about the special cases, only the normal new transactions.

The rationale behind the broadcast/announce split is that broadcasting to everyone in full would be very wasteful, as everyone would in essence receive the same transactions from all their peers. Announcing to everyone, on the other hand, would minimize traffic but maximise distribution latency, since everyone would need to explicitly ask for an announced transaction. Depending on the timeouts clients use, this could lead to multi-second delays in a single transaction's propagation. Broadcasting to a few peers and announcing to everyone else ensures that the transaction ripples through well connected peers very fast and any degenerate part of the network is covered by announcements. The ideal ratio of the split between broadcast and announce is the topic of a different discussion.

The interesting tidbit for this PR is that the split between broadcast and announce is currently done at random in Geth. We calculate that a new transaction needs to be sent in full to N peers out of M, and just pick them at random. This randomness is very much desired, as it ensures that the network load caused by transactions is evenly distributed across all connections. As long as transactions arrive at a steady rate from different accounts, this mechanism works well: it doesn't matter who sends what, we randomly pass it across the network and everyone receives it one way or another.

A problem arises, however, when there is a burst of transactions from the same account (whether K transactions sent in one go, or individually in very quick succession). The problem is that the decision of whom to send to in full and whom to announce to by hash is made randomly and independently for each transaction. With K transactions arriving simultaneously from the same account, they are broadcast to independently chosen random subsets of our peer set, so with near certainty every peer receives a nonce-gapped sequence, the gaps being announced only. This is a double issue: nodes only forward executable transactions, so whenever a peer encounters a nonce gap, propagation is choked from that point onward. And even though the gaps are announced, those arrive delayed (whether filled in by someone else or retrieved explicitly), by which time the gapped transactions might already have been dropped.

The issue is even worse for K transactions arriving individually in quick succession (say 50ms apart). The exact same problem arises, but we can't even try to group transactions by account, because we don't know what we've broadcast before or what future transactions will arrive; tracking broadcast targets across time is non-trivial complexity.

Geth's current solution to this problem is the transaction pool. In the "legacy" pool we track two sets of transactions: the pending set, containing all the executable transactions (no nonce gaps), and the queued set, containing a mixed bag of everything that's missing a nonce. As time passes and gaps are filled in, we move queued transactions into the pending set.
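To make the failure mode concrete, here is a toy Go simulation (not part of this commit; the peer, transaction and trial counts are made up) of the old behaviour: each of K transactions from one account is independently broadcast in full to a random sqrt(N) subset of N peers, and we count how many peers end up nonce-gapped, i.e. receive a later transaction in full after only being announced an earlier one.

package main

import (
	"fmt"
	"math"
	"math/rand"
)

func main() {
	const peers, txs, trials = 25, 5, 10000
	direct := int(math.Sqrt(float64(peers))) // 5 of the 25 peers get each tx in full

	gapped := 0
	for t := 0; t < trials; t++ {
		// full[i][p] records whether peer p received tx i in full.
		full := make([][]bool, txs)
		for i := range full {
			full[i] = make([]bool, peers)
			for _, p := range rand.Perm(peers)[:direct] {
				full[i][p] = true
			}
		}
		// A peer is "gapped" if it got some tx in full after missing an earlier one.
		for p := 0; p < peers; p++ {
			missed := false
			for i := 0; i < txs; i++ {
				if !full[i][p] {
					missed = true
				} else if missed {
					gapped++
					break
				}
			}
		}
	}
	fmt.Printf("average peers with a nonce gap per burst: %.1f of %d\n",
		float64(gapped)/float64(trials), peers)
}

With these numbers a peer receives any given transaction in full with probability only 1/sqrt(25) = 0.2, so nearly every peer sees at least one gap during the burst.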
Whilst workable in theory, in practice this constant shuffling makes the pool extremely brittle and easy to attack. The only way to simplify the pool, make it more robust and possibly give it a larger capacity is to somehow get rid of this two-set complexity. For that to happen, we need to fix transaction propagation to get rid of nonce gaps altogether. Whilst it might be "unfeasible" to make propagation 100% accurate and thus completely remove the pool's complexity, if we could make propagation almost perfect, we could probably also very aggressively simplify the txpool to only track a minimal subset of gaps for "flukes".

Can we fix transaction propagation though, at least making it "approximately correct"? This PR is an attempt at saying yes to that question. What we would like to achieve is to keep the current performance of transaction propagation (wrt bandwidth and latency), but avoid the nonce-gap-generation issue. The only way to do that is to ensure that if a tx is broadcast in full to a peer, all subsequent txs from the same account are also broadcast in full to that peer; and if the tx is announced instead, all subsequent transactions are announced too. The naive solution of tracking what we sent to whom is a can of worms nobody wants to open (especially if we would like this mechanism to work across a longer time frame).

The solution this PR proposes is to "define" a "semi-stable" transaction broadcast/announce topology, where every node "knows" to whom it should broadcast and to whom it should announce, without having a complete view of the network or the transaction pool. It's ok if this "topology" is not completely stable, but it should be stable "enough" to capture semi-instantaneous bursts and keep them on the same propagation path wrt broadcasts/announcements.

Instead of picking sqrt(peers) at random to broadcast to, or tracking to whom we've broadcast before, the PR proposes to hash our own ID with a peer's ID and with the tx sender, and use that checksum to select the sqrt(peers) to broadcast to. The elegance of this algorithm is that as long as I have a relatively stable number of peers, the same peers will be selected over and over again for broadcasting, independent of which other peers are connected, and with exactly zero state tracking. If enough peers join/leave to change the sqrt(peers) value, the topology will change, but apart from some startup wonkiness, the connections and pathways will be stable most of the time.

The immediate upside is that nonce gaps should almost completely disappear; the more other clients also choose to implement this (or any other stable topology, it doesn't have to be the same one), the better the stability. With nonce gaps very much minimised, we would be able to drastically simplify the txpool's gapped-tx handling, since that would be the exception rather than the general rule. Also important to highlight: this change is essentially free from all perspectives: computationally zero, complexity-wise zero, and zero effort to add to Geth or any other client.
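As a standalone sketch of the selection rule (illustrative only; the function and variable names here are hypothetical and not the ones used in the diff below), note that the decision for a given peer depends solely on the triple (own node ID, peer ID, sender), so it stays the same for every transaction from that account:

package main

import (
	"fmt"
	"math"
	"math/big"

	"golang.org/x/crypto/sha3"
)

// shouldBroadcast reports whether a transaction should be sent in full to a
// peer, i.e. whether keccak(self || peer || sender) mod direct^2 < direct,
// where direct = sqrt(peer count). Hypothetical helper, not the geth API.
func shouldBroadcast(self, peer, sender []byte, peerCount int) bool {
	direct := int64(math.Sqrt(float64(peerCount)))
	if direct == 0 {
		direct = 1
	}
	total := new(big.Int).Mul(big.NewInt(direct), big.NewInt(direct))

	hasher := sha3.NewLegacyKeccak256()
	hasher.Write(self)
	hasher.Write(peer)
	hasher.Write(sender)

	sum := new(big.Int).SetBytes(hasher.Sum(nil))
	return new(big.Int).Mod(sum, total).Cmp(big.NewInt(direct)) < 0
}

func main() {
	self := []byte("local-enode-id")
	sender := []byte("example-sender-address")
	// The same peers are picked for every transaction from this sender,
	// regardless of which other peers happen to be connected.
	for _, peer := range []string{"peer-a", "peer-b", "peer-c", "peer-d"} {
		fmt.Println(peer, "broadcast:", shouldBroadcast(self, []byte(peer), sender, 16))
	}
}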
1 parent c1d16b8 commit efbc4f7

File tree: 2 files changed, +47 −11 lines
  eth/backend.go
  eth/handler.go

eth/backend.go

Lines changed: 1 addition & 0 deletions
@@ -278,6 +278,7 @@ func New(stack *node.Node, config *ethconfig.Config) (*Ethereum, error) {
 	}
 	}
 	if eth.handler, err = newHandler(&handlerConfig{
+		NodeID:     eth.p2pServer.Self().ID(),
 		Database:   chainDb,
 		Chain:      eth.blockchain,
 		TxPool:     eth.txPool,

eth/handler.go

Lines changed: 46 additions & 11 deletions
@@ -30,6 +30,7 @@ import (
 	"github.com/ethereum/go-ethereum/core/txpool"
 	"github.com/ethereum/go-ethereum/core/types"
 	"github.com/ethereum/go-ethereum/core/vote"
+	"github.com/ethereum/go-ethereum/crypto"
 	"github.com/ethereum/go-ethereum/eth/downloader"
 	"github.com/ethereum/go-ethereum/eth/fetcher"
 	"github.com/ethereum/go-ethereum/eth/protocols/eth"
@@ -40,8 +41,10 @@ import (
 	"github.com/ethereum/go-ethereum/log"
 	"github.com/ethereum/go-ethereum/metrics"
 	"github.com/ethereum/go-ethereum/p2p"
+	"github.com/ethereum/go-ethereum/p2p/enode"
 	"github.com/ethereum/go-ethereum/params"
 	"github.com/ethereum/go-ethereum/trie"
+	"golang.org/x/crypto/sha3"
 )

 const (
@@ -85,6 +88,7 @@ type txPool interface {
 // handlerConfig is the collection of initialization parameters to create a full
 // node network handler.
 type handlerConfig struct {
+	NodeID     enode.ID         // P2P node ID used for tx propagation topology
 	Database   ethdb.Database   // Database for direct sync insertions
 	Chain      *core.BlockChain // Blockchain to serve data from
 	TxPool     txPool           // Transaction pool to propagate from
@@ -99,6 +103,7 @@ type handlerConfig struct {
 }

 type handler struct {
+	nodeID     enode.ID
 	networkID  uint64
 	forkFilter forkid.Filter // Fork ID filter, constant across the lifetime of the node

@@ -149,6 +154,7 @@ func newHandler(config *handlerConfig) (*handler, error) {
 		config.EventMux = new(event.TypeMux) // Nicety initialization for tests
 	}
 	h := &handler{
+		nodeID:     config.NodeID,
 		networkID:  config.Network,
 		forkFilter: forkid.NewFilter(config.Chain),
 		eventMux:   config.EventMux,
@@ -587,25 +593,54 @@ func (h *handler) BroadcastTransactions(txs types.Transactions) {

 	)
 	// Broadcast transactions to a batch of peers not knowing about it
-	for _, tx := range txs {
-		peers := h.peers.peersWithoutTransaction(tx.Hash())
+	direct := big.NewInt(int64(math.Sqrt(float64(h.peers.len())))) // Approximate number of peers to broadcast to
+	if direct.BitLen() == 0 {
+		direct = big.NewInt(1)
+	}
+	total := new(big.Int).Exp(direct, big.NewInt(2), nil) // Stabilise total peer count a bit based on sqrt peers

-		var numDirect int
+	var (
+		signer = types.LatestSignerForChainID(h.chain.Config().ChainID) // Don't care about chain status, we just need *a* sender
+		hasher = sha3.NewLegacyKeccak256().(crypto.KeccakState)
+		hash   = make([]byte, 32)
+	)
+	for _, tx := range txs {
+		var maybeDirect bool
 		switch {
 		case tx.Type() == types.BlobTxType:
 			blobTxs++
 		case tx.Size() > txMaxBroadcastSize:
 			largeTxs++
 		default:
-			numDirect = int(math.Sqrt(float64(len(peers))))
+			maybeDirect = true
 		}
-		// Send the tx unconditionally to a subset of our peers
-		for _, peer := range peers[:numDirect] {
-			txset[peer] = append(txset[peer], tx.Hash())
-		}
-		// For the remaining peers, send announcement only
-		for _, peer := range peers[numDirect:] {
-			annos[peer] = append(annos[peer], tx.Hash())
+		// Send the transaction (if it's small enough) directly to a subset of
+		// the peers that have not received it yet, ensuring that the flow of
+		// transactions is grouped by account to (try and) avoid nonce gaps.
+		//
+		// To do this, we hash the local enode ID together with a peer's enode
+		// ID and the transaction sender, and broadcast if
+		// `sha(self, peer, sender) mod peers < sqrt(peers)`.
+		for _, peer := range h.peers.peersWithoutTransaction(tx.Hash()) {
+			var broadcast bool
+			if maybeDirect {
+				hasher.Reset()
+				hasher.Write(h.nodeID.Bytes())
+				hasher.Write(peer.Node().ID().Bytes())
+
+				from, _ := types.Sender(signer, tx) // Ignore error, we only use the addr as a propagation target splitter
+				hasher.Write(from.Bytes())
+
+				hasher.Read(hash)
+				if new(big.Int).Mod(new(big.Int).SetBytes(hash), total).Cmp(direct) < 0 {
+					broadcast = true
+				}
+			}
+			if broadcast {
+				txset[peer] = append(txset[peer], tx.Hash())
+			} else {
+				annos[peer] = append(annos[peer], tx.Hash())
+			}
 		}
 	}
 	for peer, hashes := range txset {
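As a rough sanity check on the new selection arithmetic (the numbers here are illustrative, not from the commit): with 64 peers, direct = sqrt(64) = 8 and total = direct² = 64, so a given peer passes `sha(self, peer, sender) mod total < direct` with probability about 8/64 = 1/8. Across 64 peers that is roughly 8 full broadcasts per transaction on average, matching the old sqrt(peers) fan-out, except that the chosen subset is now a deterministic function of (self, peer, sender), so all transactions from a given sender follow the same broadcast paths instead of a fresh random subset being drawn for each one.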
