Skip to content

sort: fix panic on non-UTF-8 filenames with --files0-from#11593

Open
ppmpreetham wants to merge 5 commits intouutils:mainfrom
ppmpreetham:fix/9692-Panics-on-Non-UTF-8
Open

sort: fix panic on non-UTF-8 filenames with --files0-from#11593
ppmpreetham wants to merge 5 commits intouutils:mainfrom
ppmpreetham:fix/9692-Panics-on-Non-UTF-8

Conversation

@ppmpreetham
Copy link
Copy Markdown

Re-opening this to fix the hidden character warning in the previous branch name. Supersedes #11592.

Fixes #9696

Description

When parsing NUL-separated filenames via --files0-from, sort previously enforced strict UTF-8 validation using std::str::from_utf8().expect(...). This caused an immediate panic when encountering valid non-UTF-8 paths. GNU sort treats filenames as raw bytes and does not require them to be UTF-8 encoded.

This PR aligns uutils with GNU behavior by removing the strict UTF-8 enforcement:

  • On Unix: We now use OsStr::from_bytes to losslessly cast the raw bytes directly into an OsString, preserving arbitrary byte sequences exactly as GNU does.
  • On non-Unix (e.g., Windows): We safely fall back to String::from_utf8_lossy. This prevents the program from panicking or failing on invalid UTF-8 sequences, ensuring graceful error handling across all platforms.

Testing

Reproduced the issue and verified the fix locally (both linux and windows):

$ printf "20\n10\n" > "weird$(printf '\xff')name"
$ printf "weird$(printf '\xff')name\0" > list0
$ ./target/release/coreutils sort --files0-from=list0
10
20

@cakebaker cakebaker changed the title sort: fix panic on non-UTF-8 filenames with --files0-from ( Fixes #9696 )#11592 sort: fix panic on non-UTF-8 filenames with --files0-from Apr 2, 2026
@cakebaker
Copy link
Copy Markdown
Contributor

cakebaker commented Apr 2, 2026

Can you please add a test to tests/by_util/test_sort.rs to ensure we don't regress in the future? Thanks.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

GNU testsuite comparison:

GNU test failed: tests/tail/tail-n0f. tests/tail/tail-n0f is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)

@ppmpreetham
Copy link
Copy Markdown
Author

added a test for the same

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 2, 2026

Merging this PR will not alter performance

✅ 305 untouched benchmarks
⏩ 46 skipped benchmarks1


Comparing ppmpreetham:fix/9692-Panics-on-Non-UTF-8 (afcd762) with main (e510449)

Open in CodSpeed

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@oech3
Copy link
Copy Markdown
Contributor

oech3 commented Apr 2, 2026

please fix clippy

  error: unused import: `OsStringExt`
    --> src/uu/sort/src/sort.rs:42:36
     |
  42 | use std::os::unix::ffi::{OsStrExt, OsStringExt};
     |                                    ^^^^^^^^^^^
     |
     = note: `-D unused-imports` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(unused_imports)]`

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/tail/pipe-f is now passing!

let (at, mut ucmd) = at_and_ucmd!();

// non-UTF-8 bytes (0xFF)
let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);
let filename = std::ffi::OsString::from_vec(b"a\xffb".into());

Something like this would be more readable

use std::ops::Range;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use std::os::unix::ffi::{OsStrExt};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the brackets

Comment on lines +2047 to +2050
#[cfg(not(unix))]
let f_str = String::from_utf8_lossy(&line);
#[cfg(not(unix))]
let f = OsStr::new(f_str.as_ref());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(not(unix))]
let f_str = String::from_utf8_lossy(&line);
#[cfg(not(unix))]
let f = OsStr::new(f_str.as_ref());
#[cfg(not(unix))]
let f = {
let f_str = String::from_utf8_lossy(&line);
OsStr::new(f_str.as_ref())
};

Avoid writing #[cfg(not(unix))] more than once

Comment on lines +2052 to +2060
if f == STDIN_FILE {
return Err(SortError::MinusInStdIn.into());
}
if f.is_empty() {
return Err(SortError::ZeroLengthFileName {
file: files0_from,
line_num: line_num + 1,
}
_ => {}
.into());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you can't keep the match here

@sylvestre
Copy link
Copy Markdown
Contributor

some jobs are failing btw :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sort --files0-from Requires UTF-8 / Panics on Non‑UTF‑8 Filenames

5 participants