[Bug] ps recognizes UTF8 as UTF16 #4697

Open
Semnodime opened this issue Nov 3, 2024 · 4 comments · May be fixed by #4874
Labels
bug Something isn't working RzUtil test-required
Comments

@Semnodime

Semnodime commented Nov 3, 2024

Work environment

rizin 0.8.0 @ linux-x86-64
commit: 73d85d2

Expected behavior

Detect and display the string (hex f0 9f 9f aa f0 9f 9f aa 00, decoded 🟪🟪) as UTF-8.

Actual behavior

The string is detected as UTF-16BE. (Even if it actually were UTF-16, it is parsed incorrectly as well, but that's a separate bug.)
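
The two readings of the payload can be compared outside rizin. The sketch below is not rizin code; it only uses Python's built-in codecs to decode the eight payload bytes from the report (without the trailing NUL) both ways:

```python
# The eight payload bytes from the report, without the trailing NUL.
raw = bytes.fromhex("f09f9faaf09f9faa")

# Read as UTF-8: two 4-byte sequences, each encoding U+1F7EA.
as_utf8 = raw.decode("utf-8")

# Read as UTF-16BE: four 16-bit units, U+F09F U+9FAA repeated twice.
as_utf16be = raw.decode("utf-16-be")

print([f"U+{ord(c):04X}" for c in as_utf8])
print([f"U+{ord(c):04X}" for c in as_utf16be])
```

Both decodings succeed without error, so "does it decode" alone cannot tell the two encodings apart. The UTF-16BE reading yields U+F09F, which lies in the Private Use Area, next to a CJK ideograph.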

Steps to reproduce the behavior

ELF AMD64

[0x0007ed51]> pxc
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF  comment
0x0007ed51  f09f 9faa f09f 9faa 0025 6868 75ef b88f  .........%hhu...  ; data.0007ed51  ; str.hhu
[0x0007ed51]> psj
{"string":"\u00f0\u009f\u009f\u00aa\u00f0\u009f\u009f\u00aa%\u0068\u0068\u0075\u00ef\u00b8\u008f\u00e2\u0083\u00a3\u0000\u0059\u006f\u0075\u0020\u0073\u006f\u006c\u0076\u0065\u0064\u0020\u0074\u0068\u0065\u0020\u0063\u0068\u0061\u006c\u006c\u0065\u006e\u0067\u0065\u0021\u0000\u002f\u0062\u0069\u006e\u002f\u0073\u0068\u0000Y\u006f\u0075\u0020\u006d\u0061\u0079\u0020\u0068\u0061\u0076\u0065\u0020\u0073\u006f\u006c\u0076\u0065\u0064\u0020\u0074\u0068\u0065\u0020\u0070\u0075\u007a\u007a\u006c\u0065\u0020\u0062\u0075\u0074\u0020\u0079\u006f\u0075\u0020\u0064\u0069\u0064\u0020\u006e\u006f\u0074\u0020\u0073\u006f\u006c\u0076\u0065\u0020\u0074\u0068\u0065\u0020\u0063\u0068\u0061\u006c\u006c\u0065\u006e\u0067\u0065\u0020\u003b\u0029","offset":519505,"section":".rodata","length":122,"type":"utf16be"}
[0x0007ed51]> ps+j
{"string":"\u009f\u009f\u00aa\u00f0\u009f\u009f\u00aa\u0000\u0025\u0068\u0068\u0075\u00ef\u00b8\u008f\u00e2\u0083\u00a3Y\u006f\u0075\u0020\u0073\u006f\u006c\u0076\u0065\u0064\u0020\u0074\u0068\u0065\u0020\u0063\u0068\u0061\u006c\u006c\u0065\u006e\u0067\u0065\u0021/\u0062\u0069\u006e\u002f\u0073\u0068","offset":519505,"section":".rodata","length":50,"type":"utf16be"}
[0x0007ed51]> ps
龪龪%桨痯뢏ꌀ奯甠獯汶敤⁴桥\xe2\x81\xa3桡汬敮来℀⽢楮⽳栀Y潵\xe2\x81\xad慹\xe2\x81\xa8慶攠獯汶敤⁴桥⁰畺穬攠扵琠祯甠摩搠湯琠獯汶攠瑨攠捨慬汥湧攠㬩
[0x0007ed51]> ps+
龟꫰龟ꨀ╨桵迢莣Y潵\xe2\x81\xb3潬癥搠瑨攠捨慬汥湧攡/扩港獨
@wargio
Member

wargio commented Nov 3, 2024

I believe this is due to the guess encoding. You can enforce UTF-8 by setting str.search.encoding=utf8

[0x00000000]> e str.search.encoding
guess
[0x00000000]> e str.search.encoding=?
ascii
8bit
utf8
utf16le
utf32le
utf16be
utf32be
guess

@wargio
Member

wargio commented Nov 3, 2024

Also, since those characters are emoji, I am fairly sure we do not handle them correctly when guessing.
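
For context on why emoji are a special case: they lie outside the Basic Multilingual Plane, so they need a 4-byte sequence in UTF-8 and a surrogate pair in UTF-16. The following Python sketch illustrates that arithmetic. The helper names `utf16_units` and `utf8_len` are hypothetical and do not correspond to rizin functions:

```python
def utf16_units(cp):
    """Return the UTF-16 code units for one code point (surrogate range not validated)."""
    if cp < 0x10000:
        return [cp]  # BMP: a single 16-bit unit
    v = cp - 0x10000
    # Supplementary planes: high surrogate, then low surrogate.
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def utf8_len(cp):
    """Return how many bytes UTF-8 needs for one code point."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


square = 0x1F7EA  # the purple square from the report
print([hex(u) for u in utf16_units(square)])
print(utf8_len(square))
```

A real UTF-16BE encoding of the emoji would therefore be the bytes d8 3d df ea, not the f0 9f 9f aa found in the binary.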

@notxvilka notxvilka modified the milestones: 0.8.0, 0.9.0 Jan 20, 2025
@Rot127 Rot127 added the bug Something isn't working label Jan 24, 2025
@Rot127
Member

Rot127 commented Jan 24, 2025

The string detection metrics are a little off in general.
The functions in str_search.c rely on some metrics that are not described any further, and some of them are definitely not correct (although they probably work for the context they are in).
Also, rz_str_guess_encoding_from_buffer doesn't check for ibm037 and other non-Unicode encodings, and it has the problem mentioned above.
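
As an illustration of why non-Unicode encodings matter here, the Python sketch below uses Python's `cp037` codec as a stand-in for ibm037. It says nothing about how rz_str_guess_encoding_from_buffer works internally; it only shows what EBCDIC-encoded text looks like to decoders that do not know about it:

```python
text = "You solved the challenge!"
ebcdic = text.encode("cp037")  # EBCDIC bytes for the same text

# A strict UTF-8 decoder rejects these bytes ...
try:
    ebcdic.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ... while a plain 8-bit decoder accepts them and yields unrelated characters.
as_latin1 = ebcdic.decode("latin-1")

# Only a decoder that knows the code page recovers the text.
roundtrip = ebcdic.decode("cp037")

print(valid_utf8, as_latin1 == text, roundtrip == text)
```

A guesser that only considers ASCII, 8-bit and Unicode encodings would have no candidate under which such bytes come out as readable text.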

@Rot127
Member

Rot127 commented Jan 27, 2025

To add to this: the problem of string encoding detection is also a nice fit for the knowledge base, since different encodings have overlapping characters. Being able to switch seamlessly, define the expected encoding once, or detect the expected encoding according to some statistics would be nice to have. But I think this in itself is a single module on top of the knowledge base.
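
A minimal sketch of what "define the expected encoding once, or detect it from statistics" could look like, written in Python rather than against rizin's API. The candidate list, the scoring rule and the function names are all assumptions made for illustration:

```python
import unicodedata

# Hypothetical candidate list; ties are resolved in favour of earlier entries.
CANDIDATES = ["utf-8", "utf-16-le", "utf-16-be"]
# Unicode categories treated as suspicious: control, private use, surrogate.
SUSPICIOUS = {"Cc", "Co", "Cs"}


def score(data, encoding):
    """Fraction of decoded characters that are not suspicious; -1.0 if undecodable."""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return -1.0
    if not text:
        return 0.0
    good = sum(1 for ch in text if unicodedata.category(ch) not in SUSPICIOUS)
    return good / len(text)


def decode(data, expected=None):
    """Use the expected encoding if one is given, otherwise the best-scoring candidate."""
    encoding = expected or max(CANDIDATES, key=lambda enc: score(data, enc))
    return encoding, data.decode(encoding, errors="replace")


raw = bytes.fromhex("f09f9faaf09f9faa")
print(decode(raw)[0])
print(decode(raw, expected="utf-16-be")[0])
```

On the bytes from this issue, the UTF-16BE reading scores 0.5 because half of its characters fall in the Private Use Area. UTF-8 and UTF-16LE tie, and only the candidate order picks UTF-8, which suggests that a statistic this simple is too weak on its own.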

5 participants