Unable to send 10 MB messages through the network #543
Comments
Okay, the issue is related to congestion control. We need to use `block` instead of `drop`, or the large message will never be sent to the other side. That means we should set the reliability to `reliable` and the history to `keep_all`.
However, in the navigation2 scenario, the map server uses keep_last with depth 1.
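The drop-vs-block distinction above can be sketched with a toy bounded transmit queue. This is a simplified Python model, not the actual rmw_zenoh/Zenoh code; `publish` and `drain` are made-up helper names:

```python
import queue
import threading

def publish(q, fragments, policy, timeout=5.0):
    """Toy congestion control: 'block' waits for queue space,
    'drop' gives up immediately when the queue is full."""
    sent = 0
    for frag in fragments:
        try:
            q.put(frag, block=(policy == "block"), timeout=timeout)
            sent += 1
        except queue.Full:
            pass  # fragment dropped under congestion
    return sent

def drain(q, n):
    """Slow consumer that removes n items from the queue."""
    for _ in range(n):
        q.get()

# With no consumer and a full queue, 'drop' loses most of the message.
q1 = queue.Queue(maxsize=4)
dropped_sent = publish(q1, range(10), "drop")   # only 4 fragments fit

# With 'block', the publisher waits for the consumer and delivers everything.
q2 = queue.Queue(maxsize=4)
t = threading.Thread(target=drain, args=(q2, 10))
t.start()
blocked_sent = publish(q2, range(10), "block")  # all 10 fragments delivered
t.join()

print(dropped_sent, blocked_sent)
```

Under `drop`, a message larger than the queue can never be fully transmitted, which matches the 10 MB symptom described in this issue.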
I had the same issue. I think all reliable topics should be using the BLOCK congestion control.
My understanding is that the current issue is with a "sporadic" publication (I guess the map is published only once) over a congested WiFi. Another idea we're working on is to allow the router to change/overwrite the QoS per topic when routing outside the robot.
As explained in the Fast DDS documentation on sending large data, it is recommended to fine-tune other QoS settings and parameters based on the specific use case for transmitting large data. For instance, real-time video streaming has different requirements compared to sending an HD map as a one-time transfer.
I'm not certain rmw_zenoh follows the same behavior. If I understand the doc you linked, Fast-DDS "Reliable + KeepLast(1)" would still ensure the latest sample in the buffer is received by the subscribers, unless it is replaced by a new sample. This sounds like the correct behavior. rmw_zenoh, on the other hand, will stop trying to send the sample if it cannot fit in the queue before the drop timeout, even if the topic is set to Reliable + KeepLast(1). I understand the risks of a "big" topic blocking a reliable topic, but I'd argue it's more the "big" topic's fault for not using a dedicated queue and transport. Still, if you prefer to keep Reliable + KeepLast topics dropping, I'd like to consider making at least transient-local topics blocking. I hope that makes sense.
That would be nice. For now, though, I use TCP+UDP between my peers (within the same host), but for the routers (talking over Wi-Fi) I use `TCP?prio=0-5`, `TCP?prio=6-7`, and UDP, so a lossy Wi-Fi wouldn't congest all the reliable topics. Thank you for the quick replies!
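For reference, a priority-split router setup like the one described above might look roughly like this in a Zenoh JSON5 config. This is a sketch: the addresses, ports, and the exact `?prio=` endpoint syntax are assumptions based on the comment, so check the Zenoh configuration docs before using it:

```json5
{
  listen: {
    endpoints: [
      "tcp/0.0.0.0:7447?prio=0-5",  // dedicated link for high-priority reliable traffic
      "tcp/0.0.0.0:7448?prio=6-7",  // dedicated link for low-priority reliable traffic
      "udp/0.0.0.0:7449",           // best-effort traffic
    ],
  },
}
```

Separating links this way means congestion on one priority range cannot stall traffic on the others.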
Actually, what would make the most sense (although not in Zenoh's current architecture) would be for the subscriber to be able to specify the desired reliability: your robot could work with a reliable/blocking point cloud while RViz would be happy with a best-effort one. This semantic already exists in ROS 1 and 2.
Hi @Hugal31!
That's correct. I meant that users could lose data if they keep updating the queue without setting the history QoS to KEEP_ALL. We're discussing how to ensure reliability when sending a single large piece of data at most once.
In fact, that was our previous design, but we decided to make them configurable on the publisher side. To be clear, the current issue is how we properly map ROS 2 QoS.
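The KEEP_LAST/KEEP_ALL history distinction under discussion can be sketched with toy history caches. This is an illustrative Python model, not the rmw implementation; `keep_last` and `keep_all` are made-up helpers:

```python
from collections import deque

# Toy DDS-style history caches: KEEP_LAST(depth) evicts the oldest
# samples once depth is exceeded; KEEP_ALL retains everything.

def keep_last(samples, depth):
    cache = deque(maxlen=depth)  # bounded: old entries fall off the front
    cache.extend(samples)
    return list(cache)

def keep_all(samples):
    return list(samples)         # unbounded: nothing is evicted

print(keep_last([1, 2, 3], depth=1))  # only the newest sample survives
print(keep_all([1, 2, 3]))            # every sample survives
```

This is why publishing repeatedly with KEEP_LAST(1) can silently lose earlier samples, while a one-shot publication like the map only ever has one sample to keep.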
While running a ping-pong test across a network, large messages can't be received.
With a 1 MB message, data is sometimes missed; with a 10 MB message, nothing is ever received.
We test with the package https://github.com/ZettaScaleLabs/ros2-simple-performance, with ping and pong running across the network:
```shell
# on one host
ros2 run simple_performance ping --ros-args -p warmup:=1.0 -p size:=10000000 -p samples:=10 -p rate:=1
# on the other host
ros2 run simple_performance pong
```
We can't receive data on the pong side, but it works when the size is smaller.
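One simplified way to see why 1 MB sometimes gets through while 10 MB never does: if a large payload is fragmented and losing any single fragment loses the whole message, the delivery probability decays exponentially with size. The MTU and loss rate below are illustrative assumptions, not measurements from this setup:

```python
# Back-of-envelope: probability that every fragment of a large message
# arrives, assuming independent per-fragment loss (illustrative numbers).

def delivery_probability(payload_bytes, mtu=1400, loss_rate=0.01):
    fragments = -(-payload_bytes // mtu)   # ceiling division
    return (1 - loss_rate) ** fragments

# At 1% loss, a 1 MB message (~715 fragments) already arrives intact
# well under 1% of the time; a 10 MB message essentially never does.
print(delivery_probability(1_000_000))
print(delivery_probability(10_000_000))
```

This is only a model of a best-effort fragmented transport; with blocking congestion control and retransmission the picture changes, which is the point of this issue.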
Here are some investigations on the pong side:
- `SubscriptionData` isn't triggered.
- With `RUST_LOG=z=trace`, Zenoh didn't show the message payload.
- `z_ping` and `z_pong` can work.

The issue is originally reported here.