net: ip: Fix the warning in the data path #93282
Conversation
Instead of warning for every packet, warn only once and let the user debug the underlying cause. Partially fixes zephyrproject-rtos#49845.
Signed-off-by: Chaitanya Tata <[email protected]>
@@ -262,8 +262,8 @@ static bool net_if_tx(struct net_if *iface, struct net_pkt *pkt)
 	status = net_if_l2(iface)->send(iface, pkt);
 	net_if_tx_unlock(iface);
 	if (status < 0) {
-		NET_WARN("iface %d pkt %p send failure status %d",
-			 net_if_get_by_iface(iface), pkt, status);
+		NET_WARN_ONCE("iface %d pkt %p send failure status %d",
From the linked issue I can see how warning on every packet is a usability problem (as mentioned in the LOG_WRN_ONCE PR), but I also don't think that only ever outputting a single warning is great either.
The two annoyances I see are:
- you have no idea whether it's just a transient failure or whether all packets are failing
- if you only attach to the logs after the first occurrence, you have no idea there is a problem at all
Couldn't the initial issue be resolved at the zperf level by handling packet send errors?
> The two annoyances I see are:
> - you have no idea whether it's just a transient failure or whether all packets are failing
> - if you only attach to the logs after the first occurrence, you have no idea there is a problem at all
I understand that a single print might not help, but do we really want to debug data path issues using prints? IMHO, we should be using statistics to convey the seriousness of the issue. If that is still not acceptable, then I propose we pull in another Linux feature, printk_ratelimited (which I am still not keen on, in favour of printk_once); this way at least we don't bombard the log and the user can control the rate. WDYT?
I can only speak for myself, but if a deployed device is not getting data through to the cloud, I'm much more likely to be looking at serial logs than to sit there polling a stats object (somehow?) and checking to see if an error counter is going up. Even if it is going up, it doesn't really provide any reasoning as to why it's going up.
A rate-limited output would be fine from my perspective, but is obviously more work.
> likely to be looking at serial logs than to sit there polling a stats object (somehow?) and checking to see if an error counter is going up.

Well, I almost always use those shell commands to look at drops :). Traffic running async + shell to keep dumping stats is my go-to debug for data path issues, rather than looking at a flood of prints.
> A rate limited output would be fine from my perspective, but is obviously more work.

Yes, it's a proper feature that needs to be implemented.
> Put more succinctly, shouldn't the problem at the driver layer that causes the failures and the application layer that continuously keeps trying to send be fixed, rather than making the log output less useful?

Absolutely, the entire pipeline is responsible, as you say (and IIRC we had the same discussion about the lack of a stop/start data path in Zephyr), but the specific problem this PR addresses is that bombarding with prints (zperf pumping at 50M) doesn't help; you lose any control over the shell and cannot even type.
Can we do something like only printing a warning if at least 1 second has passed since the last warning?
Yeah, the rate-limiting discussion is in the comment above.