Improve performance of UDPMux: Add BatchIO and multiple ports options #608
base: master
Conversation
Force-pushed from 3338e2a to 9330afe.
LGTM! Great work @cnderrauber !!!
Force-pushed from 9330afe to efe2551.
Really cool to see this all come together. Nicely done!
Just a few suggestions on the comments.
Force-pushed from d253938 to 0ad5b09.
Hi @cnderrauber,
great to see progress in this direction :)
A few points:
- This PR breaks the API of UDPMux, so we need to bump the major version of the module. That's something I am open to, but I think we should batch this with other API-breaking changes (like #571).
- Can't you simply use AllConnsGetter to get the count of connections?
- I do not see the batched connection interface being used anywhere yet. Do you have, or are you planning to open, a dedicated PR for this?
- It would be good if we could also add these new features to the standard UDPMux. Or is there a reason for adding this only to the MultiUDPMux?
Best regards,
Steffen
I think it can't be used. The
I don't get the point of this question; the batched connection interface is in the pion/transport repo. What do you mean by a dedicated PR?
The UDPMux can have the batch feature simply by passing a PacketConn to its constructor.
Sorry, I don't see where the batched connection interface is used in pion/ice. I assume we need another PR to actually use it? Or did I miss something?
Alright. Thanks for the explanation.
Force-pushed from 0ad5b09 to f7b9660.
Improve performance of UDPMux by BatchIO and load balance on ports
Force-pushed from f7b9660 to 280be41.
Merge master branch
Fix panic lint
Force-pushed from 9ca1867 to 6416274.
```go
// Skip IPv6 test on i386
const ptrSize = 32 << (^uintptr(0) >> 63)
```
Maybe we should check against runtime.GOARCH here?
I think they are identical, and if one day there is a new GOARCH, a test that checks the arch name would break.
Co-authored-by: Steffen Vogel <[email protected]>
Hi @cnderrauber,

(This is a reply to #608 (comment), but I think it's a more general concern which deserves its own comment.)

I am sorry to raise objections about this PR, but I am afraid that it introduces even more complexity into the UDPMux logic. Given the current state, we already have three different types for UDP muxing, and we have not yet achieved some of the initial goals, such as allowing single-port muxing for relayed candidates.

I was under the impression that the PR attempts to introduce a batched connection interface to

I am not really happy with hiding the packet batching/queuing inside the mux. A cleaner implementation would expose a batched connection interface to the user of pion/ice (in the form of

Because the implementation lives inside the UDP muxing logic, the port balancing and queuing are also limited to configurations which use the mux. What about configurations without a UDPMux? Or relayed candidates?
That's true, we already have three different types of UDPMux, but they are all based on the UDPMuxDefault, so they will all benefit from the change. And I can't see why this would block other goals like

Port balancing and batching are two standalone options to improve the mux throughput; users can choose to enable either of them, or both. A mux that listens on a single port without batching will have poor performance and can't be used in a large-scale production environment. So port balancing and batching should be limited to the mux.
Don't be sorry! I appreciate you bringing up these points. It's worth discussing the tradeoffs being made. Let me share a bit more color on the intent.

The goal of the PR is to improve the performance of Pion when writing out to clients. In benchmarking, we discovered the bottleneck to be the syscalls made during the

Benchmarks have shown that it improved write performance by over 100% compared to using standard pion/ice connections without batching. Also, by batching packets to multiple clients together, it helps to keep jitter under control, versus queuing sufficient packets before calling

On the point of why it uses multiple ports: we discovered additional bottlenecks when writing out over a single UDP port. Locks within the write path limit the number of cores we can effectively utilize. By allowing multiple ports to be used, we were able to get around some of these bottlenecks.

Now, this is an option that isn't enabled by default. And for those who don't care about squeezing these bits of performance out of the system, the standard UDPMux is still available. I hear you on the complexity it introduces in the code, though I hope what's in this PR is a reasonable set of tradeoffs that doesn't make the interface any more difficult to use.

We are opening the PR here because we thought the work could benefit everyone using Pion. But we'd also respect your feedback if you feel these changes should be kept in our own repo instead.
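The write-batching idea described above can be sketched in stdlib-only Go. This is a hypothetical illustration (the type and field names are made up; the PR itself relies on pion/transport's batched connection): packets are queued and flushed together, so one batched operation stands in for what would otherwise be one syscall per packet (on Linux, a flush would map to a single sendmmsg).

```go
package main

import "fmt"

// batchWriter is an illustrative sketch only, not the PR's implementation.
type batchWriter struct {
	queue    [][]byte
	batchCap int
	flushes  int // batched writes issued (stand-in for syscall count)
	written  int // packets written in total
}

// Write queues a packet and flushes once the batch is full.
func (w *batchWriter) Write(p []byte) {
	w.queue = append(w.queue, p)
	if len(w.queue) >= w.batchCap {
		w.Flush()
	}
}

// Flush writes the whole queue in one batched operation.
func (w *batchWriter) Flush() {
	if len(w.queue) == 0 {
		return
	}
	w.flushes++
	w.written += len(w.queue)
	w.queue = w.queue[:0]
}

func main() {
	w := &batchWriter{batchCap: 8}
	for i := 0; i < 100; i++ {
		w.Write([]byte("pkt"))
	}
	w.Flush() // drain the tail
	fmt.Printf("packets=%d batched writes=%d\n", w.written, w.flushes)
}
```

With a batch capacity of 8, the 100 writes above leave in 13 batched operations instead of 100 individual ones, which is the syscall amortization the PR is after.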
Hi @davidzhao,

Thanks for the detailed description of the motivation behind this PR. I fully agree with you on these points. Batching can really help to increase throughput, and I would also love to profit from this feature in my own app. There is also a very good article by Tailscale where they describe how they accelerated the WireGuard Go implementation to over 10 Gbps of userspace throughput by using batching: https://tailscale.com/blog/more-throughput/

I am only arguing about the API which we use to realize it. I would love to have a batching API also for

Have you thought about abandoning the use of muxes? Using dedicated connections per ICE agent could help you with the load balancing, as you are dealing with a separate socket per connection.

Regarding the port balancing: have you considered using
Apologies for the intrusion, but port balancing on the same address/port through
We are also load balancing on a single port using

Having batching as well would be nice, but I echo the earlier comments regarding the API. It would be nice if the batching weren't coupled so tightly to the "multi udp mux" variant, but rather something that played nicely with the default UDP mux too.
@kcaffrey @streamer45 any interest in adding port balancing to the standard UDPMux? It should be straightforward to integrate with batching at that point.

It seems that the primary objection to the PR is the implication that batching will only be available to UDPMuxMulti and nothing else. That's not the case here. Most of the work has been put into

Am I misunderstanding the concerns?
Hello! Just checking on the progress of this PR. We are building a stack (consisting of an SFU) based on Pion and have encountered similar performance bottlenecks on the writeTo pathway; we were actually considering forking and implementing batching ourselves before stumbling on this PR. We would really appreciate knowing about the progress here, as we indeed expect huge performance gains from this. Meanwhile we will port this code and use it as the UDP muxer. And thanks for implementing this awesome PR!
@AshishKumar4 the PR was ready to merge months ago, but several folks took issue with certain design decisions. We could merge as is, but in trying to be good citizens, we are holding off until @stv0g or others give the 👍. Until then, I'm afraid this PR is stuck here.
@davidzhao Apologies for the late reply. Yeah, that should be doable and likely more performant than having to go through the
For what it's worth, as I am a mere bystander here, I am supportive of these changes; I was purely adding a data point above. Overall I think it's inherently challenging to achieve high performance without introducing some complexity somewhere, and leaving it all for the higher layers to deal with (e.g. the app side) is not always feasible. So I find giving the option to enable/disable these sorts of enhancements at this level to be a very reasonable compromise.
@cnderrauber @boks1971 do you guys still wanna merge this? It's been years, and I think the design issues can be addressed in later refactors or versions.
I have no major objections to merging as is. My earlier note was primarily intended to provide information on how we are using the API currently (that is, passing in a socket we create ourselves so that we can perform load balancing over a single port).

It looks like it would be possible to get batching (even without this PR) for any user who is passing a packet conn to UDPMux (rather than having one of the factory methods create and listen on the socket). I think it will be good to maintain this for any future improvements. For example, the batching strategy used by the implementation in pion/transport is not quite what we would want, as it adds latency and jitter, so it is important to us that we can continue plugging in our own implementation of
@kcaffrey Hello, good points, thank you. I'm also debating whether this should be added to
It makes sense to me to make it easy to enable batching from

The current implementation in pion/transport batches on a fixed interval, which on average adds half the batch interval to the latency. Even worse (for many use cases), it will increase the apparent jitter by an amount proportional to the batch interval (assuming packet write attempts arrive randomly within the interval), which will in turn cause client-side jitter buffers to grow to compensate.

Back-pressure-triggered batching, where writes are queued but attempted immediately, and only batched when there are multiple queued writes available when starting a new syscall, would be a much safer default in my opinion. CPU savings will be lower or even negligible (especially if throughput is not bottlenecked), but overall throughput would be increased without adding any additional latency or jitter.

If we were to have any on-by-default batching, I think it should be the opportunistic, "only batch when blocked" style described above. There are certainly applications where fixed-interval batching is a better tradeoff (where end-to-end latency isn't that important but overall CPU usage is), but I imagine that would not be true for the majority of use cases.
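The back-pressure style of batching described above can be sketched with a channel: block only for the first packet, then opportunistically drain whatever is already queued, without waiting for more. This is an illustrative sketch with hypothetical names, not the pion/transport implementation:

```go
package main

import "fmt"

// drainAvailable takes the write that is due now, then grabs whatever
// else is already queued, up to max. If the queue is empty it returns
// immediately, so an unloaded writer adds no latency; batches only form
// when the writer has fallen behind (i.e. under back-pressure).
func drainAvailable(ch chan []byte, max int) [][]byte {
	batch := [][]byte{<-ch} // block only for the first packet
	for len(batch) < max {
		select {
		case p := <-ch:
			batch = append(batch, p)
		default:
			return batch // queue empty: send what we have right away
		}
	}
	return batch
}

func main() {
	ch := make(chan []byte, 16)
	// Simulate a writer that fell behind: several packets queued up.
	for i := 0; i < 5; i++ {
		ch <- []byte{byte(i)}
	}
	batch := drainAvailable(ch, 8)
	fmt.Println("batched", len(batch), "packets in one write")
}
```

In a real writer loop, each returned batch would go out in a single batched syscall; packets arriving while that call is in flight simply form the next batch.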
Yes, it would be great to merge this and use the settings engine to choose the behaviour.
It would be good to have this in Pion; it is useful in high-throughput cases, and other Pion users could benefit from it.
Performance improvements:
- Add a BatchIO option to NewMultiUDPMuxPort(s), so the UDPMux can use batched writes to improve write throughput.
- Add multiple-port support to MultiUDPMuxDefault; this option listens on multiple ports on a single interface for traffic load balancing.