Packet Drops Monitoring
- Introduction
- Monitoring Software Originated Drops
- Monitoring Hardware Originated Drops
- Further Resources
Introduction
The Linux kernel's data path can be quite complex and may involve interactions between multiple different modules. Consequently, packets traversing this data path can be dropped for a variety of reasons. Being able to monitor these packet drops and understand where and why they occurred is invaluable when trying to root cause a problem.
When the data path is offloaded using one of the ASICs from the Spectrum family, packets are not dropped by the kernel, but rather by the ASIC itself.
This page explains how to monitor packet drops that occur in either the software or hardware data path.
The following table lists the `devlink-trap` functionality added in each kernel version:

Kernel Version | `devlink-trap` Functionality
---|---
5.4 | Layer 2 drops
5.5 | Layer 3 drops and exceptions
5.6 | Tunnel drops and exceptions
Note: `devlink-trap` support is available in `iproute2` version 5.4 and onwards. Please refer to these sections for instructions on how to compile the kernel and `iproute2` from source.
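To check which `iproute2` version is currently installed (the exact output format varies by distribution), run:
$ ip -V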
Monitoring Software Originated Drops
Packets (also called socket buffers) are dropped by the Linux kernel by invoking the `kfree_skb()` function. This is in contrast to packets that are freed as part of normal operation by the `consume_skb()` function.
The `drop_monitor` kernel module can be used to trace the `kfree_skb()` function and send netlink notifications to user space about dropped packets. The module is available in kernels compiled with `NET_DROP_MONITOR` enabled.
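To check whether the running kernel was built with this option (assuming your distribution installs the kernel configuration under /boot), run:
$ grep NET_DROP_MONITOR /boot/config-$(uname -r)
CONFIG_NET_DROP_MONITOR=y
The option is tristate, so `=m` (built as a module) is also possible.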
The DropWatch user space utility can be used to interact with the `drop_monitor` kernel module over netlink. Historically, the kernel would only send periodic notifications (also called alerts) to user space about dropped packets. These notifications include the call site from which `kfree_skb()` was invoked and the number of invocations in the last interval.
In kernel 5.4, another mode of operation was added, in which the kernel sends the dropped packets themselves to user space along with relevant metadata.
The two modes of operation are described below.
Note: The DropWatch changes that add the second mode of operation have yet to be included in an official release. Therefore, for the time being, it is recommended to install DropWatch directly from source.
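A typical build from source looks as follows (a sketch; it assumes autotools and the libnl-3, libpcap, readline, and binutils development packages are already installed):
$ git clone https://github.com/nhorman/dropwatch.git
$ cd dropwatch
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install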
To monitor packet drops in the default summary alert mode, in which only drop locations and counts are reported, run:
$ dropwatch -l kas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
1 drops at ip6_mc_input+235 (0xffffffff8286d295) [software]
1 drops at br_stp_rcv+ff (0xffffffff828c2d4f) [software]
To monitor the dropped packets themselves, switch DropWatch to the packet alert mode and start monitoring:
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ip6_mc_input+0x235/0x2a0 (0xffffffff8286d295)
origin: software
input port ifindex: 4
timestamp: Sun Aug 25 18:55:04 2019 299272815 nsec
protocol: 0x86dd
length: 167
original length: 167
drop at: ip6_mc_input+0x235/0x2a0 (0xffffffff8286d295)
origin: software
input port ifindex: 4
timestamp: Sun Aug 25 18:55:05 2019 501599210 nsec
protocol: 0x86dd
length: 114
original length: 114
It is possible to have the kernel truncate the dropped packets to a specified length before sending them to user space. For example, if only the first 96 bytes are of interest, run:
dropwatch> set trunc 96
Setting truncation length to 96
Truncation length successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: br_stp_rcv+0xff/0x2eb (0xffffffff828c2d4f)
origin: software
input port ifindex: 4
timestamp: Sun Aug 25 18:55:55 2019 956117078 nsec
protocol: 0x4
length: 96
original length: 119
In order to avoid expensive operations in the context in which packets are dropped, the `drop_monitor` kernel module clones the dropped packets and queues them on a per-CPU list that is later processed in process context. By default, the length of this queue is bounded at 1,000 packets in order to avoid exhausting the system's memory. To monitor the number of packets that were dropped due to this limit, run:
dropwatch> stats
Getting statistics
Software statistics:
Tail dropped: 0
Hardware statistics:
Tail dropped: 0
To change the limit, run:
dropwatch> set queue 100
Setting queue length to 100
Queue length successfully set
The current configuration can be queried from the kernel by the following command:
dropwatch> show
Getting existing configuration
Alert mode: Packet
Truncation length: 96
Queue length: 100
Each dropped packet is encapsulated in a netlink packet that also encodes various metadata about the dropped packet, such as drop location and timestamp. It is possible to dissect these netlink packets using Wireshark or its terminal equivalent, `tshark`.
Packets can be captured using the `nlmon` netdev while DropWatch is running:
$ ip link add name nlmon0 type nlmon
$ ip link set dev nlmon0 up
$ dumpcap -i nlmon0 -p -P -w netlink-generic-net_dm.pcap
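When the capture is no longer needed, the monitoring netdev can be removed:
$ ip link set dev nlmon0 down
$ ip link del dev nlmon0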
The packets can later be imported into Wireshark. Alternatively, it is possible to display and filter the currently dropped packets using `tshark`. For example, to filter dropped IPv6 packets with UDP destination port 547, run:
$ tshark -i nlmon0 -Y 'eth.type==0x86dd && udp.dstport==547' -O net_dm
It is also possible to filter on specific fields in the encapsulating netlink packet. For example, to filter dropped packets received from a particular netdev, run:
$ tshark -i nlmon0 -Y 'net_dm.port.netdev_index==5' -O net_dm
To filter packets that were dropped in the IPv6 stack, run:
$ tshark -i nlmon0 -Y 'net_dm.symbol contains "ip6"' -O net_dm
To list the fields exposed by the drop monitor dissector, run:
$ tshark -G fields | grep net_dm
P Linux net_dm (network drop monitor) protocol net_dm
...
F Timestamp net_dm.timestamp FT_ABSOLUTE_TIME net_dm 0x0
F Protocol net_dm.proto FT_UINT16 net_dm BASE_HEX 0x0
F Truncation length net_dm.trunc_len FT_UINT32 net_dm BASE_DEC 0x0
F Original length net_dm.orig_len FT_UINT32 net_dm BASE_DEC 0x0
F Queue length net_dm.queue_len FT_UINT32 net_dm BASE_DEC 0x0
F Attribute type net_dm.stats.attr_type FT_UINT16 net_dm BASE_DEC 0x3fff
F Packet origin net_dm.origin FT_UINT16 net_dm BASE_DEC 0x0
F Hardware trap group name net_dm.hw_trap_group_name FT_STRINGZ net_dm 0x0
F Hardware trap name net_dm.hw_trap_name FT_STRINGZ net_dm 0x0
F Hardware trap count net_dm.hw_trap_count FT_UINT32 net_dm BASE_DEC 0x0
...
Note: To understand if your Wireshark version includes the dissector, check the output of `tshark -G protocols | grep net_dm`. If the dissector is included, the output should be: `Linux net_dm (network drop monitor) protocol net_dm net_dm`. To install Wireshark from source, please refer to the Wireshark documentation.
Note: In order for Wireshark to correctly invoke the drop monitor dissector, it must first learn the mapping between the protocol's name (i.e., `NET_DM`) and its dynamically allocated ID. As explained here, this mapping is discovered by parsing a particular netlink packet that can be triggered by invoking `genl-ctrl-list`. Alternatively, DropWatch can be invoked after starting the packet capture, as it resolves this mapping during its initialization.
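For example, assuming the libnl command line utilities are installed, the mapping can be made visible to an already running capture by triggering the resolution packet:
$ genl-ctrl-list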
Monitoring Hardware Originated Drops
When the data path is offloaded, packets are both forwarded and dropped by the ASIC. This means that the kernel has no visibility into these packet drops, which makes it difficult to debug various problems.
Using `devlink-trap` it is possible to instruct the ASIC to pass (trap) dropped packets to the CPU. To list the various drop reasons that can be reported, run:
$ devlink trap show
pci/0000:01:00.0:
name source_mac_is_multicast type drop generic true action drop group l2_drops
name vlan_tag_mismatch type drop generic true action drop group l2_drops
name ingress_vlan_filter type drop generic true action drop group l2_drops
name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
name port_list_is_empty type drop generic true action drop group l2_drops
name port_loopback_filter type drop generic true action drop group l2_drops
Please refer to the kernel documentation for an explanation of the various drop reasons. By default, dropped packets are not trapped and therefore their action is reported as `drop`. To instruct the device to trap packets that are dropped due to the ingress VLAN filter, change the trap's action to `trap`:
$ devlink trap set pci/0000:01:00.0 trap ingress_vlan_filter action trap
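Once debugging is complete, trapping can be disabled again by restoring the default action:
$ devlink trap set pci/0000:01:00.0 trap ingress_vlan_filter action drop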
When dropped packets are trapped, they are not injected into the kernel's receive path, but instead passed to devlink, which performs packets and bytes accounting. These statistics can be queried from the kernel using the following command:
$ devlink -s trap show pci/0000:01:00.0 trap ingress_vlan_filter
pci/0000:01:00.0:
name ingress_vlan_filter type drop generic true action trap group l2_drops
stats:
rx:
bytes 541536 packets 5641
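Statistics can also be viewed at the trap group level rather than per trap (a sketch, using the l2_drops group shown above; output will vary by device):
$ devlink -s trap group show pci/0000:01:00.0 group l2_drops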
In turn, devlink passes the dropped packet to the `drop_monitor` kernel module, which will report the drop to user space in case monitoring of hardware drops is enabled. This is explained in the next section.
By default, `drop_monitor` only monitors software drops. In case it is desired to have `drop_monitor` monitor both software and hardware drops, run:
dropwatch> set sw true
setting software drops monitoring to 1
dropwatch> set hw true
setting hardware drops monitoring to 1
In case only hardware originated drops are of interest, run:
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch> set sw false
setting software drops monitoring to 0
The rest of the usage is identical to what was already described for software originated drops. However, unlike software originated drops, the drop reason of hardware originated drops is reported as a string:
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ingress_vlan_filter (l2_drops)
origin: hardware
input port ifindex: 9
input port name: swp3
timestamp: Mon Aug 26 20:15:33 2019 461798287 nsec
protocol: 0x8100
length: 110
original length: 110
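Since the hardware metadata is also exposed to the Wireshark dissector (see the field list above), trapped packets can likewise be filtered by trap name. For example (a sketch using the `net_dm.hw_trap_name` field listed earlier):
$ tshark -i nlmon0 -Y 'net_dm.hw_trap_name == "ingress_vlan_filter"' -O net_dm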
Prometheus is a popular time series database used for event monitoring and alerting. Its main component is the Prometheus server, which periodically scrapes and stores time series data. The data is scraped from various exporters that expose their metrics over HTTP.
Two different Prometheus exporters are described below.
Using `devlink-exporter` it is possible to export packets and bytes statistics about each trap to Prometheus.
To run the exporter on the switch you wish to monitor:
$ ./devlink-exporter.py -l 0.0.0.0:9417
Alternatively, use a `systemd` service unit file:
# /etc/systemd/system/devlink-exporter.service
[Unit]
Description=devlink exporter
Documentation=man:devlink(8)
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/devlink-exporter.py -l 0.0.0.0:9417
[Install]
WantedBy=multi-user.target
To start the service, run:
$ systemctl start devlink-exporter
To have the service start automatically on boot, run:
$ systemctl enable devlink-exporter.service
When scraped by the Prometheus server, the exporter will query the statistics from the kernel and pass them over HTTP to the server.
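On the Prometheus server side, a minimal scrape configuration might look like the following (a sketch; the job name and target host are assumptions, while port 9417 matches the listen address used above):
scrape_configs:
  - job_name: 'devlink-exporter'
    static_configs:
      - targets: ['switch1:9417']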
Grafana can then be used to visualize the information.
While `devlink-exporter` can export packets and bytes statistics, sometimes more fine-grained statistics are required. For example, when per-{flow, trap} statistics are needed, it is possible to use `ebpf_exporter` as described here.
The exporter will install an eBPF program in the kernel which will efficiently provide per-{flow, trap} statistics to user space via an eBPF map. Please refer to this section for more details.
As with `devlink-exporter`, it is possible to use Grafana to visualize the information.
Further Resources
- man dropwatch
- man dumpcap
- man wireshark
- man tshark
- man wireshark-filter
- man devlink-trap