@lotabout lotabout commented Nov 2, 2025

This PR adds a new --utf8 option to enable UTF-8 character support.

When UTF-8 support is enabled, tcpdump will detect and display UTF-8 characters in the payload as-is when using the -A option.
Note that in -X mode, if a multi-byte character spans two lines, it appears on the first line and padding spaces are emitted on the next line.

Tests

  • Verified with several random PCAP files dumped using -A and -x that the MD5 checksums of the output remain identical when --utf8 is not used.
  • Manually tested using a UTF-8 sample PCAP on macOS.

utf8.pcap.zip

@lotabout lotabout changed the title fix #1190 support to display packet content as UTF-8 [Draft] fix #1190 support to display packet content as UTF-8 Nov 2, 2025
@lotabout lotabout force-pushed the feat/support-utf8 branch 3 times, most recently from 8934b1e to 2ecc6b2 Compare November 2, 2025 16:12
@lotabout lotabout changed the title [Draft] fix #1190 support to display packet content as UTF-8 fix #1190 support to display packet content as UTF-8 Nov 2, 2025
@gvanem
Contributor

gvanem commented Nov 4, 2025

I tried this on Windows using MSVC and clang-cl. But there is no wcwidth() here:

print-ascii.c(118,11): error: call to undeclared function 'wcwidth'; ISO C99 and later do not support implicit function declarations
      [-Wimplicit-function-declaration]
  118 |                 int w = wcwidth(wc);
      |                         ^

@infrastation
Member

How did it build in Appveyor then?

@gvanem
Contributor

gvanem commented Nov 4, 2025

> How did it build in Appveyor then?

Because HAVE_WCHAR_T was not detected and used, I presume. No diff for CMakeLists.txt AFAICS.

@lotabout
Author

lotabout commented Nov 4, 2025

@gvanem Could you please give it another try on Windows? I’ve added Markus Kuhn’s wcwidth() implementation as a replacement on Windows.
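
For readers unfamiliar with what a wcwidth() replacement has to do, here is a heavily simplified sketch in C. It is not Markus Kuhn's code and not what this PR ships: `approx_wcwidth` and `struct cp_range` are hypothetical names, and the tables cover only a handful of zero-width and East Asian wide ranges, so results for code points outside them default to one column.

```c
#include <stddef.h>

/* Hypothetical inclusive code point range. */
struct cp_range {
    unsigned long first, last;
};

static int in_ranges(unsigned long cp, const struct cp_range *r, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (cp >= r[i].first && cp <= r[i].last)
            return 1;
    return 0;
}

/* Approximate number of terminal columns for a Unicode code point:
 * -1 for control characters, 0 for NUL and zero-width characters,
 * 2 for the wide ranges listed below, 1 for everything else. */
static int approx_wcwidth(unsigned long cp)
{
    /* Deliberately incomplete tables, for illustration only. */
    static const struct cp_range zero_width[] = {
        { 0x0300, 0x036F },     /* combining diacritical marks */
        { 0x200B, 0x200F },     /* zero-width space, joiners, marks */
        { 0xFE00, 0xFE0F },     /* variation selectors */
    };
    static const struct cp_range wide[] = {
        { 0x1100, 0x115F },     /* Hangul Jamo initial consonants */
        { 0x2E80, 0x303E },     /* CJK radicals and punctuation */
        { 0x3041, 0xA4CF },     /* Hiragana through Yi */
        { 0xAC00, 0xD7A3 },     /* Hangul syllables */
        { 0xF900, 0xFAFF },     /* CJK compatibility ideographs */
        { 0xFE30, 0xFE6F },     /* CJK compatibility forms */
        { 0xFF00, 0xFF60 },     /* fullwidth forms */
        { 0xFFE0, 0xFFE6 },     /* fullwidth signs */
        { 0x20000, 0x2FFFD },   /* supplementary ideographic plane */
        { 0x30000, 0x3FFFD },   /* tertiary ideographic plane */
    };

    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;
    if (in_ranges(cp, zero_width, sizeof zero_width / sizeof zero_width[0]))
        return 0;
    if (in_ranges(cp, wide, sizeof wide / sizeof wide[0]))
        return 2;
    return 1;
}
```

A printer can use the returned width to decide how many padding columns a character needs, and can fall back to printing `.` when the result is -1.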

@gvanem
Contributor

gvanem commented Nov 4, 2025

@lotabout Tried it, but I see a lot of junk with windump.exe --utf8 -Ar utf8.pcap, like:
image

I assume your utf8.pcap file is based on this or this which displays better in my TCC shell:

image

@lotabout
Author

lotabout commented Nov 5, 2025

@gvanem Please try again with the latest code. It turns out that the locale must be set correctly for mbrtowc to work properly.

(left: tcpdump, right: cat. Compiled on Windows 11 with VS 2022, Shell: PowerShell with $OutputEncoding = [System.Text.Encoding]::UTF8 set)
image

image
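
The locale dependency mentioned in this comment can be sketched as follows. This is an illustration under assumptions, not the change made in this PR: `decodes_utf8`, `select_utf8_ctype` and `decode_one` are hypothetical helpers, and the locale names tried are guesses that may not exist on a given system (the sketch simply moves on when setlocale() fails). Because mbrtowc decodes according to LC_CTYPE, the sketch probes each candidate locale with a known two-byte sequence before trusting it.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Probe: does the current LC_CTYPE decode U+00E9 from its 2 UTF-8 bytes? */
static int decodes_utf8(void)
{
    mbstate_t st;
    wchar_t wc = 0;

    memset(&st, 0, sizeof st);
    return mbrtowc(&wc, "\xC3\xA9", 2, &st) == 2;
}

/* Try a few candidate locale names (assumptions; availability varies).
 * Returns 1 if a UTF-8 capable LC_CTYPE was selected, 0 otherwise. */
static int select_utf8_ctype(void)
{
    static const char *const names[] = {
        "",             /* whatever the environment requests */
        "C.UTF-8",
        "en_US.UTF-8",
        ".UTF8",        /* name accepted by recent Windows C runtimes */
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL && decodes_utf8())
            return 1;
    setlocale(LC_CTYPE, "C");   /* fall back to the default locale */
    return 0;
}

/* Decode one character from p. Returns the number of bytes consumed,
 * or 0 if the bytes are invalid or incomplete in the current locale. */
static size_t decode_one(const unsigned char *p, size_t avail, wchar_t *wc)
{
    mbstate_t st;
    size_t n;

    memset(&st, 0, sizeof st);
    n = mbrtowc(wc, (const char *)p, avail, &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return 0;
    return n == 0 ? 1 : n;      /* mbrtowc reports a NUL byte as 0 */
}
```

A caller would invoke `select_utf8_ctype()` once at startup and treat a return value of 0 as a signal to keep the ASCII-only output path.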

@lotabout lotabout closed this Nov 5, 2025
@infrastation
Member

tcpdump CI is failing because of my recent changes in libpcap. Please wait until this is fixed.

@lotabout lotabout reopened this Nov 5, 2025
@lotabout
Author

lotabout commented Nov 5, 2025

> tcpdump CI is failing because of my recent changes in libpcap. Please wait until this is fixed.

@infrastation Got it, Thanks~

@gvanem
Contributor

gvanem commented Nov 5, 2025

Works very well now!
