sort: fix panic on non-UTF-8 filenames with --files0-from by ppmpreetham · Pull Request #11593 · uutils/coreutils

ppmpreetham · 2026-04-02T07:20:48Z

Re-opening this to fix the hidden character warning in the previous branch name. Supersedes #11592.

Description

When parsing NUL-separated filenames via --files0-from, sort previously enforced strict UTF-8 validation using std::str::from_utf8().expect(...). This caused an immediate panic when encountering valid non-UTF-8 paths. GNU sort treats filenames as raw bytes and does not require them to be UTF-8 encoded.
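The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from sort.rs: `split_nul_separated` and `strict_decode_fails` are hypothetical helpers, and dropping the empty chunk after a trailing NUL is an assumption of the sketch.

```rust
/// Hypothetical helper: split a NUL-separated buffer into raw names.
/// Assumption of this sketch: the empty chunk after a trailing NUL is dropped.
fn split_nul_separated(data: &[u8]) -> Vec<&[u8]> {
    let mut names: Vec<&[u8]> = data.split(|&b| b == 0).collect();
    if names.last().is_some_and(|n| n.is_empty()) {
        names.pop();
    }
    names
}

/// Returns true if strict UTF-8 decoding of `name` fails. This is the
/// condition under which `std::str::from_utf8(..).expect(..)` panics.
fn strict_decode_fails(name: &[u8]) -> bool {
    std::str::from_utf8(name).is_err()
}
```

A lone 0xFF byte is never valid UTF-8, so a name such as `weird\xffname` fails strict decoding even though it is a legal filename on most Unix filesystems.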

This PR aligns uutils with GNU behavior by removing the strict UTF-8 enforcement:

On Unix: We now use OsStr::from_bytes to reinterpret the raw bytes as an OsStr without copying or validation, preserving arbitrary byte sequences exactly as GNU does.
On non-Unix (e.g., Windows): We fall back to String::from_utf8_lossy, which replaces invalid UTF-8 sequences with U+FFFD instead of failing. This prevents the program from panicking on invalid UTF-8 sequences, ensuring graceful error handling across all platforms.
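The two platform branches can be sketched as a single helper. This is an illustration under assumptions rather than the PR's actual code: `bytes_to_os_string` is a hypothetical name, and it returns an owned OsString for simplicity, whereas the diff in this PR works with a borrowed OsStr.

```rust
use std::ffi::OsString;

/// Hypothetical helper: convert raw filename bytes into an `OsString`.
#[cfg(unix)]
fn bytes_to_os_string(bytes: &[u8]) -> OsString {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    // On Unix an OsStr is an arbitrary byte sequence, so nothing is lost.
    OsStr::from_bytes(bytes).to_os_string()
}

#[cfg(not(unix))]
fn bytes_to_os_string(bytes: &[u8]) -> OsString {
    // Invalid UTF-8 sequences become U+FFFD here, so the result may not
    // name the same file as the original bytes.
    OsString::from(String::from_utf8_lossy(bytes).into_owned())
}
```

On Unix the bytes survive the round trip unchanged. On other platforms the function never panics, but a name containing invalid UTF-8 is altered.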

Testing

Reproduced the issue and verified the fix locally (on both Linux and Windows):

$ printf "20\n10\n" > "weird$(printf '\xff')name"
$ printf "weird$(printf '\xff')name\0" > list0
$ ./target/release/coreutils sort --files0-from=list0
10
20

This reverts commit 446fc9d.

cakebaker · 2026-04-02T07:23:57Z

Can you please add a test to tests/by-util/test_sort.rs to ensure we don't regress in the future? Thanks.

github-actions · 2026-04-02T07:33:50Z

GNU testsuite comparison:

GNU test failed: tests/tail/tail-n0f. tests/tail/tail-n0f is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/pr/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)

ppmpreetham · 2026-04-02T07:39:15Z

added a test for the same

codspeed-hq · 2026-04-02T07:43:23Z

Merging this PR will not alter performance

✅ 305 untouched benchmarks
⏩ 46 skipped benchmarks¹

Comparing ppmpreetham:fix/9692-Panics-on-Non-UTF-8 (afcd762) with main (e510449)

¹ 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

oech3 · 2026-04-02T07:49:08Z

please fix clippy

  error: unused import: `OsStringExt`
    --> src/uu/sort/src/sort.rs:42:36
     |
  42 | use std::os::unix::ffi::{OsStrExt, OsStringExt};
     |                                    ^^^^^^^^^^^
     |
     = note: `-D unused-imports` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(unused_imports)]`

github-actions · 2026-04-02T09:33:33Z

GNU testsuite comparison:

Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/tail/pipe-f is now passing!

RenjiSann · 2026-04-03T10:22:48Z

tests/by-util/test_sort.rs

+    let (at, mut ucmd) = at_and_ucmd!();
+
+    // non-UTF-8 bytes (0xFF)
+    let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);


Suggested change

-    let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);
+    let filename = std::ffi::OsString::from_vec(b"a\xffb".into());

Something like this would be more readable
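To illustrate that the two spellings produce the same filename, here is a small Unix-only sketch. `non_utf8_names` is a hypothetical function, and `.to_vec()` is used in place of `.into()` purely as a choice of this sketch.

```rust
/// Hypothetical helper: build the same non-UTF-8 name in both styles.
#[cfg(unix)]
fn non_utf8_names() -> (std::ffi::OsString, std::ffi::OsString) {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;

    // Element-by-element form from the original test.
    let verbose = OsString::from_vec(vec![b'a', 0xFF, b'b']);
    // Byte-string literal form: `\xff` is the same single byte 0xFF.
    let readable = OsString::from_vec(b"a\xffb".to_vec());
    (verbose, readable)
}
```

Both values hold the three bytes `a`, 0xFF, `b`, so the change only affects readability.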

RenjiSann · 2026-04-03T10:23:15Z

src/uu/sort/src/sort.rs

 use std::ops::Range;
 #[cfg(unix)]
-use std::os::unix::ffi::OsStrExt;
+use std::os::unix::ffi::{OsStrExt};


Remove the brackets

RenjiSann · 2026-04-03T10:25:02Z

src/uu/sort/src/sort.rs

+            #[cfg(not(unix))]
+            let f_str = String::from_utf8_lossy(&line);
+            #[cfg(not(unix))]
+            let f = OsStr::new(f_str.as_ref());


Suggested change

-            #[cfg(not(unix))]
-            let f_str = String::from_utf8_lossy(&line);
-            #[cfg(not(unix))]
-            let f = OsStr::new(f_str.as_ref());
+            #[cfg(not(unix))]
+            let f = {
+                let f_str = String::from_utf8_lossy(&line);
+                OsStr::new(f_str.as_ref())
+            };

Avoid writing #[cfg(not(unix))] more than once
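One caveat with the block form: a borrowed OsStr that points into `f_str` cannot leave the block, because `f_str` is dropped at the closing brace, so the borrow checker would likely reject the suggestion as written. A possible variant, sketched here under the assumption that an owned OsString is acceptable at the call site, takes ownership inside the block. `lossy_name` is a hypothetical wrapper, and the `#[cfg(not(unix))]` attribute is omitted so the sketch compiles on any platform.

```rust
use std::ffi::OsString;

/// Hypothetical wrapper around the lossy fallback, written as one block.
fn lossy_name(line: &[u8]) -> OsString {
    let f: OsString = {
        let f_str = String::from_utf8_lossy(line);
        // Take ownership before `f_str` goes out of scope, so nothing
        // borrowed from it escapes the block.
        OsString::from(f_str.into_owned())
    };
    f
}
```

With this shape a single attribute on the `let` covers the whole fallback, at the cost of one allocation per filename.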

RenjiSann · 2026-04-03T10:26:46Z

src/uu/sort/src/sort.rs

+            if f == STDIN_FILE {
+                return Err(SortError::MinusInStdIn.into());
+            }
+            if f.is_empty() {
+                return Err(SortError::ZeroLengthFileName {
+                    file: files0_from,
+                    line_num: line_num + 1,
                }
-                _ => {}
+                .into());


Not sure why you can't keep the match here
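String-literal patterns do not apply to an OsStr, which may be why the match was replaced. One way a match could be kept is to match on the encoded bytes instead. This is only a sketch under assumptions: the `SortError` enum below is a stand-in for the real type, `files0_from` is taken to be a string, and the stdin marker is assumed to be `-`.

```rust
use std::ffi::OsStr;

// Stand-in for the error type referenced in the diff (assumption).
#[derive(Debug, PartialEq)]
enum SortError {
    MinusInStdIn,
    ZeroLengthFileName { file: String, line_num: usize },
}

/// Hypothetical check mirroring the two `if` statements in the diff.
fn check_name(f: &OsStr, files0_from: &str, line_num: usize) -> Result<(), SortError> {
    // ASCII bytes keep their usual encoding in `as_encoded_bytes`, so
    // byte-string patterns work for "-" and the empty name.
    match f.as_encoded_bytes() {
        b"-" => Err(SortError::MinusInStdIn),
        b"" => Err(SortError::ZeroLengthFileName {
            file: files0_from.to_string(),
            line_num: line_num + 1,
        }),
        _ => Ok(()),
    }
}
```

Whether this reads better than two `if` statements is a judgment call. It keeps the original shape but relies on `OsStr::as_encoded_bytes`, which requires a reasonably recent toolchain.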

sylvestre · 2026-04-03T10:44:52Z

some jobs are failing btw :)

ppmpreetham added 3 commits April 2, 2026 12:32

fix(sort): non‑UTF‑8 files ( fixes uutils#9692 )

9c4e986

remove(sort): unused imports

446fc9d

Revert "remove(sort): unused imports"

0865da4

This reverts commit 446fc9d.

cakebaker changed the title from "sort: fix panic on non-UTF-8 filenames with --files0-from ( Fixes #9696 )#11592" to "sort: fix panic on non-UTF-8 filenames with --files0-from" on Apr 2, 2026

test: non-utf8 file

cc495a3

fix: clippy

afcd762

RenjiSann reviewed Apr 3, 2026

ppmpreetham wants to merge 5 commits into uutils:main from ppmpreetham:fix/9692-Panics-on-Non-UTF-8


5 participants
