feat(system_monitor): check UDP network errors #9538

Open
wants to merge 7 commits into
base: main

iwatake2222
Contributor

@iwatake2222 iwatake2222 commented Dec 2, 2024

Description

Why this PR is needed

  • UDP packet drops make Autoware unstable, so they are an important indicator of whether Autoware is functioning correctly.
  • UDP packet drops have several causes, but the primary one is drops in the send/receive buffers. This is particularly true for the loopback network.

What this PR changes

  • Monitor UDP recv buffer errors and UDP send buffer errors

Related links

None

How was this PR tested?

  • The total UDP receive buffer errors and total UDP send buffer errors are monitored; both values are read from /proc/net/snmp
  • When the number of errors within the check duration reaches the threshold (>= 1), the status becomes a warning
  • The CPU usage of system_monitor does not increase (measured on a 16-core ECU, so 1600% is the maximum)
    • Without this PR: 13.6% / 1600%
    • With this PR: 12.6% / 1600%
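
For reviewers who want to check the counters by hand, here is a minimal Python sketch of reading the two values from /proc/net/snmp. It is not the code in this PR (system_monitor is not written in Python); the function names parse_udp_counters and read_udp_buf_errors are made up for illustration. It relies only on the Linux layout of that file, where a "Udp:" header line with field names is followed by a "Udp:" line with the values, and on the RcvbufErrors and SndbufErrors fields.

```python
from typing import Dict, Tuple


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Map each Udp field name to its value, given the text of /proc/net/snmp."""
    # "UdpLite:" lines do not match because the prefix includes the colon.
    udp_lines = [
        line.split() for line in snmp_text.splitlines() if line.startswith("Udp:")
    ]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value line pair found")
    header, values = udp_lines[0], udp_lines[1]
    # Skip the leading "Udp:" token on both lines and pair names with values.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def read_udp_buf_errors(path: str = "/proc/net/snmp") -> Tuple[int, int]:
    """Return (RcvbufErrors, SndbufErrors) as cumulative totals since boot."""
    with open(path) as f:
        counters = parse_udp_counters(f.read())
    return counters["RcvbufErrors"], counters["SndbufErrors"]
```

Calling read_udp_buf_errors() twice, a second apart, and subtracting the results gives the errors per second while the reproduction scripts are running.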


How to cause UDP buf errors

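Run the following two scripts. The first is a sender with a tiny send buffer; the second is a receiver with a tiny receive buffer, listening on the port the sender targets.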

import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# To cause send buffer errors, keep the following setsockopt line enabled
# and add a delay on the loopback device:
# $> sudo tc qdisc add dev lo root netem delay 100ms
# $> sudo tc qdisc del dev lo root    # remove the delay afterwards
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4)

target_ip = "127.0.0.1"
target_port = 12345

while True:
    sock.sendto(b"a" * 1024, (target_ip, target_port))
    time.sleep(0.00001)

# Receiver script (separate file)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1)
sock.bind(("0.0.0.0", 12345))

while True:
    try:
        data, addr = sock.recvfrom(1024)
    except Exception as e:
        print(f"Error: {e}")

Notes for reviewers

None.

Interface changes

Topic changes

Additions and removals

None

Modifications

A new entry named net_monitor: UDP Buf Errors is added to the /diagnostics topic

ROS Parameter Changes

Additions and removals

system/system_monitor/config/net_monitor.param.yaml

| Version | Parameter Name                | Type | Default Value | Description |
| ------- | ----------------------------- | ---- | ------------- | ----------- |
| New     | udp_buf_errors_check_duration | int  | 1             | UDP buf errors check duration |
| New     | udp_buf_errors_check_count    | int  | 1             | Generates a warning when the count of UDP buf errors during udp_buf_errors_check_duration reaches the specified value or higher |
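
One way to read how the two parameters interact is sketched below in Python. This is an illustration under stated assumptions, not the implementation in this PR: it assumes the cumulative counter is sampled once per period and that udp_buf_errors_check_duration counts those sampling periods. The class name UdpBufErrorWindow and the "OK" / "WARN" strings are hypothetical.

```python
from collections import deque
from typing import Optional


class UdpBufErrorWindow:
    """Sliding window over per-sample error increases (hypothetical helper)."""

    def __init__(self, check_duration: int = 1, check_count: int = 1) -> None:
        # Assumption: check_duration is the number of most recent samples kept.
        self._deltas = deque(maxlen=check_duration)
        self._check_count = check_count
        self._last_total: Optional[int] = None

    def update(self, total_errors: int) -> str:
        """Feed the latest cumulative error total and return "OK" or "WARN"."""
        if self._last_total is None:
            delta = 0  # first sample: no previous total to compare against
        else:
            # Clamp at zero in case the counter was reset.
            delta = max(0, total_errors - self._last_total)
        self._last_total = total_errors
        self._deltas.append(delta)
        # Warn when the errors seen inside the window reach the threshold.
        return "WARN" if sum(self._deltas) >= self._check_count else "OK"
```

With the defaults (1 and 1), a single new buffer error in one sample yields a warning, and the status returns to OK on the next sample if the counter stops increasing.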

Modifications

None

Effects on system behavior

None.

@github-actions github-actions bot added type:documentation Creating or refining documentation. (auto-assigned) component:system System design and integration. (auto-assigned) labels Dec 2, 2024

github-actions bot commented Dec 2, 2024

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@TetsuKawa TetsuKawa added tag:run-build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) tag:require-cuda-build-and-test labels Dec 2, 2024
@iwatake2222 iwatake2222 marked this pull request as ready for review December 3, 2024 11:33
@iwatake2222
Contributor Author

@ito-san @TetsuKawa
Could you review this PR?
Also, please re-add the run-build-and-test-differential label to rerun the workflows (some workflows are stuck for some reason...)

Signed-off-by: takeshi.iwanari <[email protected]>