portfwd: create separate gRPC streams for each UDP client #3724

Open
wants to merge 1 commit into
base: master

@stek29 stek29 commented Jul 14, 2025

The UDP port forwarder previously used a single gRPC stream for all clients, which could cause responses from the guest to be sent to the wrong client on the host.

This happened because the stream was created once, before gvisor-tap-vsock's UDPProxy had demultiplexed the client connections. UDPProxy demultiplexes clients internally, based on the source address of incoming datagrams, and expects its dialer function to return a new net.Conn for each new client it detects. With a single stream created up front, every client ended up on the same stream.

This commit moves the gRPC stream creation into the UDPProxy dialer function. This ensures a new, dedicated stream is created for each new client, fixing the incorrect response routing.
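
To illustrate the difference, here is a minimal, self-contained sketch. It is not the Lima or gvisor-tap-vsock code: `stream`, `demux`, `sharedDialer`, and `perClientDialer` are hypothetical stand-ins that only model a per-source-address connection table calling a dialer once per new client.

```go
package main

import (
	"fmt"
	"sync"
)

// stream stands in for one gRPC tunnel stream (hypothetical type).
type stream struct {
	id int
}

// demux models a connection table keyed by client source address: the
// first datagram from a new address triggers dial, later ones reuse it.
type demux struct {
	mu    sync.Mutex
	conns map[string]*stream
	dial  func() (*stream, error)
}

func newDemux(dial func() (*stream, error)) *demux {
	return &demux{conns: make(map[string]*stream), dial: dial}
}

// route returns the stream that serves the given client address.
func (d *demux) route(clientAddr string) (*stream, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if s, ok := d.conns[clientAddr]; ok {
		return s, nil
	}
	s, err := d.dial()
	if err != nil {
		return nil, err
	}
	d.conns[clientAddr] = s
	return s, nil
}

// sharedDialer models the old behaviour: one stream is created up front
// and the same stream is handed to every client.
func sharedDialer() func() (*stream, error) {
	s := &stream{id: 1}
	return func() (*stream, error) { return s, nil }
}

// perClientDialer models the fix: every call opens a fresh stream.
// The counter is only touched while demux holds its mutex.
func perClientDialer() func() (*stream, error) {
	next := 0
	return func() (*stream, error) {
		next++
		return &stream{id: next}, nil
	}
}

// sharesStream reports whether two distinct clients end up on one stream.
func sharesStream(dial func() (*stream, error)) bool {
	d := newDemux(dial)
	a, _ := d.route("127.0.0.1:40001")
	b, _ := d.route("127.0.0.1:40002")
	return a == b
}

func main() {
	fmt.Println("shared dialer, clients share a stream:", sharesStream(sharedDialer()))         // true
	fmt.Println("per-client dialer, clients share a stream:", sharesStream(perClientDialer())) // false
}
```

With the shared dialer, replies arriving on the single stream cannot be attributed to one client; with the per-client dialer, each table entry owns its own stream.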

stek29 commented Jul 14, 2025

The easiest way to reproduce the issue is to run a long-running UDP server behind port forwarding, for example a DNS server:

  1. Start a Lima VM with UDP port forwarding configured. Here is the config I used: https://gist.github.com/stek29/dcafea785fbde81537db84e8c19ed47f
  2. In the Lima VM, start a DNS server, for example:
    docker run --rm -p 5053:53/udp -p 5053:53/tcp coredns/coredns:latest

  3. Query the server from the host:
    dig @127.0.0.1 -p 5053 example.com
    dig +tcp @127.0.0.1 -p 5053 example.com

  4. Notice that over UDP only the first query succeeds, while TCP works fine.
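
For a scripted variant of step 3, something like the following could be used. This is a rough sketch, not part of the PR: `buildQuery` and `queryOnce` are hypothetical helpers, and the default address `127.0.0.1:5053` is taken from the steps above. Each query is sent from a fresh UDP socket, so every query comes from a different source port and should look like a new client to the forwarder.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"os"
	"strings"
	"time"
)

// buildQuery encodes a minimal DNS query for the A record of name.
func buildQuery(id uint16, name string) []byte {
	msg := make([]byte, 12, 64)                 // 12-byte DNS header
	binary.BigEndian.PutUint16(msg[0:], id)     // transaction ID
	binary.BigEndian.PutUint16(msg[2:], 0x0100) // flags: recursion desired
	binary.BigEndian.PutUint16(msg[4:], 1)      // QDCOUNT = 1
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0)          // root label terminates the name
	msg = append(msg, 0, 1, 0, 1) // QTYPE = A, QCLASS = IN
	return msg
}

// queryOnce sends one query from a new UDP socket and waits for a reply
// that carries the same transaction ID.
func queryOnce(server string, id uint16, name string, timeout time.Duration) error {
	conn, err := net.Dial("udp", server)
	if err != nil {
		return err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	if _, err := conn.Write(buildQuery(id, name)); err != nil {
		return err
	}
	buf := make([]byte, 512)
	n, err := conn.Read(buf)
	if err != nil {
		return err
	}
	if n < 2 || binary.BigEndian.Uint16(buf) != id {
		return fmt.Errorf("unexpected reply (%d bytes)", n)
	}
	return nil
}

func main() {
	server := "127.0.0.1:5053" // forwarded port from the steps above
	if len(os.Args) > 1 {
		server = os.Args[1]
	}
	for i := 1; i <= 3; i++ {
		err := queryOnce(server, uint16(i), "example.com", 2*time.Second)
		fmt.Printf("query %d: err=%v\n", i, err)
	}
}
```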

Signed-off-by: Viktor Oreshkin <[email protected]>

stek29 commented Jul 14, 2025

The cleanup is handled by gvisor-tap-vsock:

https://github.com/containers/gvisor-tap-vsock/blob/23f6f8364426a40294dc97da55865b30d4419819/pkg/services/forwarder/udp_proxy.go#L69-L101

It calls Close on the GrpcClientRW when no packets arrive for UDPConnTrackTimeout (90 * time.Second), which in turn calls g.stream.CloseSend() on the stream.

However, that timeout relies on SetReadDeadline, which GrpcClientRW does not implement:
https://github.com/stek29/lima/blob/a04f2447db7127203b64c6f92e608817d6d6f7ef/pkg/portfwd/client.go#L122-L132

So the streams and sockets stay open indefinitely, until the socket in the VM is closed; only then are they collected and closed.

stek29 commented Jul 14, 2025

There are also no tests for UDP forwarding whatsoever, and that feels like it's too big to fix in this PR for me
