Exits for no reason by itself when listening on lo or any. #127

Open
xtaran opened this issue Dec 13, 2023 · 4 comments

Comments

xtaran commented Dec 13, 2023

Hi,

While trying out sniffglue (version 0.15.0 on Debian GNU/Linux Unstable, package version 0.15.0-7), I noticed that when using either the interface lo (local loopback device) or the virtual interface any (i.e. sniffing on all interfaces), it outputs a bunch of packets and then exits for no obvious reason. The exit code is 0, so it is probably not a crash. This is reproducible, but it happens after a seemingly random number of packets (so far I've counted 28, 49, 106 and 116 using sniffglue lo | wc -l).

So far, when I used it on any, it also only showed packets from the lo interface before it exited. But that might have been just chance.

Owner

kpcyrd commented Dec 13, 2023

I'm having trouble reproducing this; the following works fine for me:

doas podman run -it --rm --net=host --privileged debian:sid sh -c 'apt update && apt dist-upgrade -y && apt install -y sniffglue && sniffglue lo -vv'

If sniffglue exits, this usually means all worker threads have terminated, for example because the handle to the network device got closed, but possibly also because they got killed by seccomp (though this should usually give you at least some output).

To verify this, check your system logs, try strace -f sniffglue lo and check for threads terminating due to signals, or try running sniffglue --insecure-disable-seccomp lo to check whether this happens independently of seccomp.

If it is related to seccomp, you need to run strace -f sniffglue lo 2> strace.log and search for = ? from the bottom up to identify any syscalls that have been interrupted.
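The bottom-up search can also be scripted. The following is only a minimal sketch, not part of sniffglue: the function name `unfinished_syscalls` and the embedded sample log are made up for illustration. It lists every line whose result is `= ?`, newest first. Note that `exit` and `exit_group` always show up this way, so only other syscalls are interesting here.

```rust
/// Return the strace lines whose result is `= ?`, newest first.
/// strace prints `= ?` when a syscall never returned to the caller,
/// for example because the thread exited or was killed during the call.
fn unfinished_syscalls(log: &str) -> Vec<&str> {
    log.lines()
        .rev() // walk the log from the bottom up
        .map(str::trim_end)
        .filter(|line| line.ends_with("= ?"))
        .collect()
}

fn main() {
    // Illustrative sample only; in practice, read strace.log from disk.
    let sample = "\
[pid 2] write(1, \"...\", 122) = 122
[pid 3] exit(0)                   = ?
close(3)                                = 0
exit_group(0)                           = ?
";
    for line in unfinished_syscalls(sample) {
        println!("{line}");
    }
}
```

To run it against a real trace, replace the sample with the contents of strace.log, e.g. via `std::fs::read_to_string`.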

xtaran changed the title from "Exits for now reason by itself when listening on lo or any." to "Exits for no reason by itself when listening on lo or any." on Dec 13, 2023
Author

xtaran commented Dec 13, 2023

Thanks for trying to reproduce and for these suggestions!

I think the strace found something:

[…]
[pid 24096] write(1, "\33[33m00:00:00:00:00:00 -> 00:00:"..., 14500:00:00:00:00:00 -> 00:00:00:00:00:00, [udp   ] 127.0.0.1:56349        -> 127.0.0.1:53           [dns] req, (AAAA, "reform.n[…]")
 <unfinished ...>
[pid 24102] <... rt_sigprocmask resumed>NULL, 8) = 0
[pid 24096] <... write resumed>)        = 145
[pid 24102] madvise(0x7f19da3fd000, 2076672, MADV_DONTNEED <unfinished ...>
[pid 24096] write(1, "\33[33m00:00:00:00:00:00 -> 00:00:"..., 12200:00:00:00:00:00 -> 00:00:00:00:00:00, [udp   ] 127.0.0.1:53           -> 127.0.0.1:56349        [dns] resp, []
 <unfinished ...>
[pid 24102] <... madvise resumed>)      = 0
[pid 24096] <... write resumed>)        = 122
[pid 24102] exit(0 <unfinished ...>
[pid 24096] setsockopt(4, SOL_PACKET, PACKET_RX_RING, {tp_block_size=0, tp_block_nr=0, tp_frame_size=0, tp_frame_nr=0}, 16 <unfinished ...>
[pid 24102] <... exit resumed>)         = ?
[pid 24096] <... setsockopt resumed>)   = -1 EBUSY (Device or resource busy)
[pid 24102] +++ exited with 0 +++
munmap(0x7f19db05a000, 4194304)         = 0
munmap(0x7f19db8bf000, 266240)          = 0
close(3)                                = 0
close(4)                                = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x7f19db900000, 12288)           = 0
exit_group(0)                           = ?
+++ exited with 0 +++

I suspect that this EBUSY might be what triggered sniffglue to exit.

Owner

kpcyrd commented Dec 13, 2023

I haven't had time to debug this in depth yet, but according to man setsockopt, Linux does not document the EBUSY error code for this syscall:

ERRORS
       The setsockopt() function shall fail if:

       EBADF  The socket argument is not a valid file descriptor.

       EDOM   The  send and receive timeout values are too big to fit into the timeout fields in the socket struc‐
              ture.

       EINVAL The specified option is invalid at the specified socket level or the socket has been shut down.

       EISCONN
              The socket is already connected, and a specified option cannot be set while the socket is connected.

       ENOPROTOOPT
              The option is not supported by the protocol.

       ENOTSOCK
              The socket argument does not refer to a socket.

       The setsockopt() function may fail if:

       ENOMEM There was insufficient memory available for the operation to complete.

       ENOBUFS
              Insufficient resources are available in the system to complete the call.

       The following sections are informative.

If you feel like debugging this, you'd need to figure out which undocumented error-case you're reaching in the Linux kernel.

Owner

kpcyrd commented Sep 6, 2024

I managed to reproduce this too by building from source on Arch Linux (through a dhcp packet, but also udp):

[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 3276579] futex(0x8f0d9ad79f0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 3276572] <... futex resumed>)      = 0
[pid 3276579] sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192},  <unfinished ...>
[pid 3276572] setsockopt(3, SOL_PACKET, PACKET_RX_RING, {tp_block_size=0, tp_block_nr=0, tp_frame_size=0, tp_frame_nr=0}, 16 <unfinished ...>
[pid 3276579] <... sigaltstack resumed>NULL) = 0
[pid 3276572] <... setsockopt resumed>) = -1 EBUSY (Device or resource busy)
[pid 3276579] munmap(0x670be96e6000, 12288 <unfinished ...>
[pid 3276572] munmap(0x670be8d90000, 4194304 <unfinished ...>
[pid 3276579] <... munmap resumed>)     = 0
[pid 3276579] rt_sigprocmask(SIG_BLOCK, ~[RT_1], NULL, 8) = 0
[pid 3276579] madvise(0x670be3400000, 2076672, MADV_DONTNEED) = 0
[pid 3276579] exit(0)                   = ?
[pid 3276579] +++ exited with 0 +++
<... munmap resumed>)                   = 0
munmap(0x670be9190000, 266240)          = 0
close(4)                                = 0
close(3)                                = 0
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x670be96fb000, 12288)           = 0
exit_group(0)                           = ?
+++ exited with 0 +++

I suspected this happens because two threads are trying to read a packet at the same time, but that shouldn't be possible because of this mutex:

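loop {
    let packet = {
        // The guard is dropped at the end of this block, so the lock is
        // held only for the duration of next_pkt().
        let mut cap = cap.lock().unwrap();
        cap.next_pkt()
    };
    // ...
}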

next_pkt is implemented through pcap_sys::pcap_next_ex(self.handle, header.as_mut_ptr(), packet.as_mut_ptr()), so this is a fairly direct call into libpcap.

Curious.
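To illustrate why the mutex should rule out concurrent reads, here is a self-contained sketch of the same locking pattern. It is not sniffglue's actual code: `MockCapture` and `run_workers` are hypothetical stand-ins, and packets are just numbers. Each worker takes the lock only for the call to `next_pkt()`, so every packet is handed to exactly one thread. In this sketch, once the source is exhausted all workers return and the process ends normally.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the capture handle; packets are plain ids.
struct MockCapture {
    packets: VecDeque<usize>,
}

impl MockCapture {
    /// Returns the next packet id, or None once the source is exhausted.
    fn next_pkt(&mut self) -> Option<usize> {
        self.packets.pop_front()
    }
}

/// Spawn `num_threads` workers sharing one capture handle and return
/// the sorted ids of all packets they handled.
fn run_workers(num_threads: usize, num_packets: usize) -> Vec<usize> {
    let cap = Arc::new(Mutex::new(MockCapture {
        packets: (0..num_packets).collect(),
    }));

    let handles: Vec<thread::JoinHandle<Vec<usize>>> = (0..num_threads)
        .map(|_| {
            let cap = Arc::clone(&cap);
            thread::spawn(move || {
                let mut seen = Vec::new();
                loop {
                    // Lock only while fetching; the guard is dropped
                    // at the end of this inner block.
                    let packet = {
                        let mut cap = cap.lock().unwrap();
                        cap.next_pkt()
                    };
                    match packet {
                        Some(id) => seen.push(id),
                        None => break, // source exhausted: worker ends
                    }
                }
                seen
            })
        })
        .collect();

    // Wait for every worker, then merge what they saw.
    let mut all: Vec<usize> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort_unstable();
    all
}

fn main() {
    let seen = run_workers(4, 1000);
    println!("workers handled {} packets", seen.len());
}
```

If two threads could ever receive the same packet, the merged list would contain duplicates; with the lock in place it contains each id exactly once.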
