Improve performance of UDPMux: Add BatchIO and multiple ports options #608
base: master
Conversation
Force-pushed from 3338e2a to 9330afe.
LGTM! Great work @cnderrauber !!!
Force-pushed from 9330afe to efe2551.
Really cool to see this all come together. Nicely done!
Just a few suggestions on the comments.
Force-pushed from d253938 to 0ad5b09.
Hi @cnderrauber,
great to see progress in this direction :)
A few points:
- This PR breaks the API of UDPMux, so we need to bump the major version of the module. That's something I am open to, but I think we should batch this with other API-breaking changes (like #571).
- Can't you simply use AllConnsGetter to get the count of connections?
- I do not see the batched connection interface being used anywhere yet. Do you have, or are you planning to open, a dedicated PR for this?
- It would be good if we could also add these new features to the standard UDPMux. Or is there a reason for adding this only to the MultiUDPMux?
Best regards,
Steffen
I think it can't be used. The
I don't get the point of this question; the batched connection interface is in the pion/transport repo. What do you mean by a dedicated PR?
The UDPMux can have the batch feature simply by passing a PacketConn to its constructor.
Sorry, I don't see where the batched connection interface is used in pion/ice. I assume we need another PR to actually use it? Or did I miss something?
Alright. Thanks for the explanation.
Force-pushed from 0ad5b09 to f7b9660.
Improve performance of UDPMux by BatchIO and load balance on ports
Force-pushed from f7b9660 to 280be41.
Merge master branch
Fix panic lint
Force-pushed from 9ca1867 to 6416274.
```go
// Skip IPv6 test on i386
const ptrSize = 32 << (^uintptr(0) >> 63)
```
Maybe we should check against runtime.GOARCH here?
I think they are identical, and if one day there is a new GOARCH, a test that checks the arch name would break.
Co-authored-by: Steffen Vogel <[email protected]>
Hi @cnderrauber,

(This is a reply to #608 (comment), but I think it's a more general concern which deserves its own comment.)

I am sorry to raise objections about this PR, but I am afraid that it introduces even more complexity into the UDPMux logic. Given the current state, we already have three different types for UDP muxing, and we have not yet achieved some of the initial goals, such as allowing single-port muxing for relayed candidates.

I was under the impression that the PR attempts to introduce a batched connection interface to

I am not really happy with hiding the packet batching/queuing inside the mux. A cleaner implementation would expose a batched connection interface to the user of pion/ice (in the form of

Because the implementation lives inside the UDP muxing logic, the port balancing and queuing are also limited to configurations which use the mux. What about configurations without a UDPMux? Or relayed candidates?
That's true, we already have three different types of UDPMux, but they are all based on the UDPMuxDefault, so they will all benefit from the change. And I can't see why this would block other goals like

Port balancing and batching are two standalone options to improve the mux throughput; users can choose to enable either of them, or both. A mux that listens on a single port without batching will have poor performance and can't be used in a large-scale production environment. So port balancing and batching should be limited to the mux.
Don't be sorry! I appreciate you bringing up these points. It's worth discussing the tradeoffs being made. Let me share a bit more color on the intent.

The goal of the PR is to improve the performance of Pion when writing out to clients. In benchmarking, we discovered the bottleneck to be the syscalls made during the

Benchmarks have shown that it improved write performance by over 100% compared to using standard pion/ice connections without batching. Also, by batching packets to multiple clients together, it helps to keep jitter under control, versus queuing sufficient packets before calling

On the point of why it uses multiple ports: we discovered additional bottlenecks when writing out over a single UDP port. Locks within the write path limit the number of cores we can effectively utilize. By allowing multiple ports to be used, we were able to get around some of these bottlenecks.

Now, this is an option that isn't enabled by default. And for those who don't care about squeezing these bits of performance out of the system, the standard UDPMux is still available. I hear you on the complexity it introduces in the code, though I hope what's in this PR is a reasonable set of tradeoffs that doesn't make the interface any more difficult to use.

We are opening the PR here because we thought the work could benefit everyone using Pion. But we'd also respect your feedback if you feel these changes should be kept in our own repo instead.
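The write-batching idea described above can be sketched in stdlib-only Go. This is a hypothetical illustration (the type and field names are made up; the PR itself relies on pion/transport's batched connection): packets are queued and flushed together, so one batched operation stands in for what would otherwise be one syscall per packet (on Linux, a flush would map to a single sendmmsg).

```go
package main

import "fmt"

// batchWriter is an illustrative sketch only, not the PR's implementation.
type batchWriter struct {
	queue    [][]byte
	batchCap int
	flushes  int // batched writes issued (stand-in for syscall count)
	written  int // packets written in total
}

// Write queues a packet and flushes once the batch is full.
func (w *batchWriter) Write(p []byte) {
	w.queue = append(w.queue, p)
	if len(w.queue) >= w.batchCap {
		w.Flush()
	}
}

// Flush writes the whole queue in one batched operation.
func (w *batchWriter) Flush() {
	if len(w.queue) == 0 {
		return
	}
	w.flushes++
	w.written += len(w.queue)
	w.queue = w.queue[:0]
}

func main() {
	w := &batchWriter{batchCap: 8}
	for i := 0; i < 100; i++ {
		w.Write([]byte("pkt"))
	}
	w.Flush() // drain the tail
	fmt.Printf("packets=%d batched writes=%d\n", w.written, w.flushes)
}
```

With a batch capacity of 8, the 100 writes above leave in 13 batched operations instead of 100 individual ones, which is the syscall amortization the PR is after.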
Hi @davidzhao,

Thanks for the detailed description of the motivation behind this PR. I fully agree with you on these points. Batching can really help to increase throughput, and I would also love to profit from this feature in my own app. There is also a very good article by Tailscale where they describe how they accelerated the WireGuard Go implementation to over 10 Gbps of userspace throughput by using batching: https://tailscale.com/blog/more-throughput/

I am only arguing about the API which we use to realize it. I would love to have a batching API also for

Have you thought about abandoning the use of muxes? Using dedicated connections per ICE agent could help you with the load balancing, as you are dealing with a separate socket per connection.

Regarding the port balancing: have you considered using
Apologies for the intrusion, but port balancing on the same address/port through
We are also load balancing on a single port using

Having batching as well would be nice, but I echo the earlier comments regarding the API. It would be nice if the batching weren't coupled so tightly to the "multi udp mux" variant, but rather something that played nicely with the default UDP mux too.
@kcaffrey @streamer45 any interest in adding port balancing to the standard UDPMux? It should be straightforward to integrate with batching at that point.

It seems that the primary objection to the PR is the implication that batching will only be available to UDPMuxMulti and nothing else. That's not the case here. Most of the work has been put into

Am I misunderstanding the concerns?
Hello! Just checking on the progress of this PR. We are building a stack (consisting of an SFU) based on Pion and have encountered similar performance bottlenecks on the writeTo pathway; we were actually considering forking and implementing batching ourselves before stumbling on this PR. We would really appreciate knowing about the progress here, as we indeed expect huge performance gains from this. Meanwhile we will port this code and use it as the UDP muxer. And thanks for implementing this awesome PR!
@AshishKumar4 the PR was ready to merge months ago, but several folks took issue with certain design decisions. We could merge as is, but in trying to be good citizens, we are holding off until @stv0g or others give the 👍. Until then, I'm afraid this PR is stuck here.
@davidzhao Apologies for the late reply. Yeah, that should be doable and likely more performant than having to go through the
For what it's worth, as I am a mere bystander here, I am supportive of these changes; I was purely adding a data point above. Overall I think it's inherently challenging to achieve high performance without introducing some complexity somewhere, and leaving it all for the higher layers to deal with (e.g. the app side) is not always feasible. So I find giving the option to enable/disable these sorts of enhancements at this level to be a very reasonable compromise.
@cnderrauber @boks1971 do you guys still wanna merge this? It's been years, and I think the design issues can be addressed in later refactors or versions.
I have no major objections to merging as is. My earlier note was primarily intended to provide information on how we are using the API currently (that is, passing in a socket we create ourselves so that we can perform load balancing over a single port).

It looks like it would be possible to get batching (even without this PR) for any user who is passing a packet conn to UDPMux (rather than having one of the factory methods create and listen on the socket). I think it will be good to maintain this for any future improvements. For example, the batching strategy used by the implementation in pion/transport is not quite what we would want, as it adds latency and jitter, so it is important to us that we can continue plugging in our own implementation of
@kcaffrey Hello, good points, thank you. I'm also debating whether this should be added to
It makes sense to me to make it easy to enable batching from

The current implementation in pion/transport batches on a fixed interval, which on average adds half the batch interval to the latency. Even worse (for many use cases), it will increase the apparent jitter by an amount proportional to the batch interval (assuming packet write attempts arrive randomly within the interval), which will in turn cause client-side jitter buffers to grow to compensate.

Back-pressure-triggered batching, where writes are queued but attempted immediately, and only batched when there are multiple queued writes available when starting a new syscall, would be a much safer default in my opinion. CPU savings will be lower or even negligible (especially if throughput is not bottlenecked), but overall throughput would be increased without adding any additional latency or jitter.

If we were to have any on-by-default batching, I think it should be the opportunistic, "only batch when blocked" style described above. There are certainly applications where fixed-interval batching is a better tradeoff (where end-to-end latency isn't that important but overall CPU usage is), but I imagine that would not be true for the majority of use cases.
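The back-pressure style of batching described above can be sketched with a channel: block only for the first packet, then opportunistically drain whatever is already queued, without waiting for more. This is an illustrative sketch with hypothetical names, not the pion/transport implementation:

```go
package main

import "fmt"

// drainAvailable takes the write that is due now, then grabs whatever
// else is already queued, up to max. If the queue is empty it returns
// immediately, so an unloaded writer adds no latency; batches only form
// when the writer has fallen behind (i.e. under back-pressure).
func drainAvailable(ch chan []byte, max int) [][]byte {
	batch := [][]byte{<-ch} // block only for the first packet
	for len(batch) < max {
		select {
		case p := <-ch:
			batch = append(batch, p)
		default:
			return batch // queue empty: send what we have right away
		}
	}
	return batch
}

func main() {
	ch := make(chan []byte, 16)
	// Simulate a writer that fell behind: several packets queued up.
	for i := 0; i < 5; i++ {
		ch <- []byte{byte(i)}
	}
	batch := drainAvailable(ch, 8)
	fmt.Println("batched", len(batch), "packets in one write")
}
```

In a real writer loop, each returned batch would go out in a single batched syscall; packets arriving while that call is in flight simply form the next batch.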
Yes, it would be great to merge this and use the settings engine to choose the behaviour.
It would be good to have this in Pion; it is useful in high-throughput cases, and other Pion users could benefit from it.
Performance improvements:
- Add a BatchIO option to NewMultiUDPMuxPort(s), so the UDPMux can use batched writes to improve write throughput.
- Add multiple-port support to MultiUDPMuxDefault; this option listens on multiple ports on a single interface for traffic load balancing.