Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address translation doesn't work correctly with non-reused ephemeral ports #5820

Open
dmitry-markin opened this issue Jan 17, 2025 · 3 comments

Comments

@dmitry-markin
Copy link
Contributor

dmitry-markin commented Jan 17, 2025

Summary

In libp2p-0.54.1 when handle_established_outbound_connection() of Identify protocol is called it seems port_use argument is not set to PortUse::New at least in some cases when ephemeral port was actually used.

This is what comment in ConnectedPoint::Dialer says:

/// The port use is implemented on a best-effort basis. It is not guaranteed
/// that [`PortUse::Reuse`] actually reused a port. A good example is the case
/// where there is no listener available to reuse a port from.

This leads to Identify mistakenly thinking that we reused the port, not taking this branch in emit_new_external_addr_candidate_event() and emitting such ephemeral address without translation in ToSwarm::NewExternalAddrCandidate(observed.clone()).

This issue was originally reported in paritytech/polkadot-sdk#7207. It leads to ephemeral addresses being discovered as external in polkadot. For the context, polkadot does not use AutoNAT protocol and eagerly confirms all external address candidates provided by Identify behavior.

To obtain the logs below additional logging was added to Identify: dmitry-markin@4205675

Expected behavior

Only listen addresses reported by Identify for connections with actually reused port are emitted without address translation.

Actual behavior

Ephemeral addresses with not reused ports are emitted without address translation.

Relevant log output

>>> skipping translation for port-reuse address "/ip4/79.142.83.14/tcp/47936"
🔍 Discovered new external address for our node: /ip4/79.142.83.14/tcp/47936/p2p/12D3KooW9tGYAQmDAVUxsHMK9Xur3bkyNNiy7Mmv8uUh7UCzRFw2
>>> skipping translation for port-reuse address "/ip4/79.142.83.14/tcp/40506"
🔍 Discovered new external address for our node: /ip4/79.142.83.14/tcp/40506/p2p/12D3KooW9tGYAQmDAVUxsHMK9Xur3bkyNNiy7Mmv8uUh7UCzRFw2
>>> skipping translation for port-reuse address "/ip4/79.142.83.14/tcp/45810"
🔍 Discovered new external address for our node: /ip4/79.142.83.14/tcp/45810/p2p/12D3KooW9tGYAQmDAVUxsHMK9Xur3bkyNNiy7Mmv8uUh7UCzRFw2
>>> skipping translation for port-reuse address "/ip4/79.142.83.14/tcp/36244"
🔍 Discovered new external address for our node: /ip4/79.142.83.14/tcp/36244/p2p/12D3KooW9tGYAQmDAVUxsHMK9Xur3bkyNNiy7Mmv8uUh7UCzRFw2
...
(and so on)

Possible Solution

  1. Update Transport::Dial to yield both Transport::Output and PortUse.
  2. Update all transports' dial() to return the actual PortUse of the outbound socket created.
  3. Update pool::task::PendingConnectionEvent::ConnectionEstablished to include PortUse; update pool::PoolEvent to yield correct PortUse in ConnectedPoint.
  4. Update all affected types. This includes transport composition types like AndThenFuture and the such.

Version

0.54.1

Would you like to work on fixing this bug?

Maybe

@dmitry-markin
Copy link
Contributor Author

dmitry-markin commented Jan 20, 2025

Additional context for why the port is not reused — this happens due to a secondary connection being established to the same host:port:

TRACE tokio-runtime-worker libp2p_tcp: Binding dial socket to listen socket address address=/ip4/64.62.224.10/tcp/30333/p2p/12D3KooWEjk6QXrZJ26fLpaajisJGHiz6WiQsR8k7mkM9GmWKnRZ
DEBUG tokio-runtime-worker libp2p_tcp: Failed to connect using existing socket because we already have a connection, re-dialing with new port connect_addr=64.62.224.10:30333 bind_addr=0.0.0.0:12345

This happens due to concurrent dial and some nodes providing multiple addresses (e.g. DNS + IP) that resolve to the same IP:port.

@dmitry-markin
Copy link
Contributor Author

This issue is also triggered without a secondary connection (concurrent dial disabled) to the same IP:port when listen address is not specified for some transport.

For example, if the node listens on a TCP address(e.g. /ip4/79.142.83.14/tcp/30333), but not on a WS address (e.g. /ip4/79.142.83.14/tcp/30334/ws), the TCP port reasonably won't be reused for WS remote addresses and ephemeral ports will be used instead.

@elenaf9
Copy link
Contributor

elenaf9 commented Feb 5, 2025

Thank you for the detailed bug report and for elaborating a possible solution! Sorry for the late reply.

Possible Solution

  1. Update Transport::Dial to yield both Transport::Output and PortUse.
  2. Update all transports' dial() to return the actual PortUse of the outbound socket created.
  3. Update pool::task::PendingConnectionEvent::ConnectionEstablished to include PortUse; update pool::PoolEvent to yield correct PortUse in ConnectedPoint.
  4. Update all affected types. This includes transport composition types like AndThenFuture and the such.

Yes that would solve it I think. However, it would require us to touch libp2p-core, all transport protocols, and the swarm. As you already noted in paritytech/polkadot-sdk#7207 (comment), it's quite a bit of effort.

I was wondering if we could instead solve the issue in the identify protocol.
From a practical perspective, we need to know if a used port was ephemeral or re-used because we need to know if one of our listening ports was reused (which then suggest that address + port of that listener is externally reachable), right?
We already track all current listening addresses in identify. Can't we just check in case of PortUse::Reuse that the port on that connection actually matches one of our current listening ports?

I am not super familiar with the port-reuse logic in different transports, so apologies if I am missing something very obvious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants