sort: fix panic on non-UTF-8 filenames with --files0-from #11593
ppmpreetham wants to merge 5 commits into uutils:main
Can you please add a test to
GNU testsuite comparison:
Added a test for the same.
Merging this PR will not alter performance
Please fix clippy.
    let (at, mut ucmd) = at_and_ucmd!();

    // non-UTF-8 bytes (0xFF)
    let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);
-    let filename = std::ffi::OsString::from_vec(vec![b'a', 0xFF, b'b']);
+    let filename = std::ffi::OsString::from_vec(b"a\xffb".into());
Something like this would be more readable.
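For reference, a small Unix-only sketch showing that the two spellings build the same name. The function names are hypothetical and exist only for this comparison; the `.into()` form relies on the `From<&[u8; N]> for Vec<u8>` conversion, which needs a reasonably recent toolchain.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;

/// Original spelling: an explicit byte vector.
fn name_from_vec_literal() -> OsString {
    OsString::from_vec(vec![b'a', 0xFF, b'b'])
}

/// Suggested spelling: a byte-string literal converted into a Vec<u8>.
fn name_from_byte_string() -> OsString {
    OsString::from_vec(b"a\xffb".into())
}
```

Both produce the three bytes `a`, `0xFF`, `b`; because `0xFF` never occurs in valid UTF-8, the resulting name has no `&str` view.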
    use std::ops::Range;
    #[cfg(unix)]
    use std::os::unix::ffi::OsStrExt;
    use std::os::unix::ffi::{OsStrExt};
    #[cfg(not(unix))]
    let f_str = String::from_utf8_lossy(&line);
    #[cfg(not(unix))]
    let f = OsStr::new(f_str.as_ref());
-    #[cfg(not(unix))]
-    let f_str = String::from_utf8_lossy(&line);
-    #[cfg(not(unix))]
-    let f = OsStr::new(f_str.as_ref());
+    #[cfg(not(unix))]
+    let f = {
+        let f_str = String::from_utf8_lossy(&line);
+        OsStr::new(f_str.as_ref())
+    };
Avoid writing `#[cfg(not(unix))]` more than once.
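One caveat with the block form: `OsStr::new(f_str.as_ref())` returns a reference into `f_str`, which is dropped when the block ends, so it would probably not pass the borrow checker as written. A minimal sketch of one way to keep a single `cfg` attribute per platform is to return an owned `OsString` instead. The helper name `file_name_from_bytes` is hypothetical and not part of the PR.

```rust
use std::ffi::OsString;

/// Hypothetical helper (not from the PR): turn one NUL-delimited entry
/// into an owned file name, with a single `cfg` attribute per platform.
#[cfg(unix)]
fn file_name_from_bytes(line: &[u8]) -> OsString {
    use std::os::unix::ffi::OsStrExt;
    // Lossless on Unix: the bytes are kept exactly as read.
    std::ffi::OsStr::from_bytes(line).to_os_string()
}

#[cfg(not(unix))]
fn file_name_from_bytes(line: &[u8]) -> OsString {
    // Lossy elsewhere: invalid UTF-8 sequences become U+FFFD.
    // Returning an owned OsString avoids borrowing from the temporary Cow.
    OsString::from(String::from_utf8_lossy(line).into_owned())
}
```

The trade-off is one allocation per entry, which is likely negligible next to opening and sorting the files themselves.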
    if f == STDIN_FILE {
        return Err(SortError::MinusInStdIn.into());
    }
    if f.is_empty() {
        return Err(SortError::ZeroLengthFileName {
            file: files0_from,
            line_num: line_num + 1,
        }
        _ => {}
        .into());
Not sure why you can't keep the match here
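One possible reason is that an `&OsStr` cannot be matched against string-literal patterns the way a `&str` can. A sketch of how a `match` could still be kept, by matching on the encoded bytes instead: this assumes a toolchain with `OsStr::as_encoded_bytes` (Rust 1.74 or later) and assumes `STDIN_FILE` is `"-"`. The `Entry` enum and `classify` function are hypothetical stand-ins for the error paths in the reviewed code.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for the outcomes handled in the reviewed code.
#[derive(Debug, PartialEq)]
enum Entry {
    Stdin, // would map to SortError::MinusInStdIn
    Empty, // would map to SortError::ZeroLengthFileName
    File,  // any other name, accepted as-is
}

fn classify(f: &OsStr) -> Entry {
    // Byte-string patterns work for any OsStr, UTF-8 or not.
    // Assumption: STDIN_FILE is "-".
    match f.as_encoded_bytes() {
        b"-" => Entry::Stdin,
        b"" => Entry::Empty,
        _ => Entry::File,
    }
}
```

Whether this reads better than two `if` checks is a judgment call; the `if` form avoids the extra method call and any minimum-version concern.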
Some jobs are failing btw :)
Re-opening this to fix the hidden character warning in the previous branch name. Supersedes #11592.
Fixes #9696
Description
When parsing NUL-separated filenames via `--files0-from`, `sort` previously enforced strict UTF-8 validation using `std::str::from_utf8().expect(...)`. This caused an immediate panic when encountering valid non-UTF-8 paths. GNU `sort` treats filenames as raw bytes and does not require them to be UTF-8 encoded.

This PR aligns `uutils` with GNU behavior by removing the strict UTF-8 enforcement:

- On Unix, it uses `OsStr::from_bytes` to losslessly cast the raw bytes directly into an `OsString`, preserving arbitrary byte sequences exactly as GNU does.
- On non-Unix platforms, it falls back to `String::from_utf8_lossy`. This prevents the program from panicking or failing on invalid UTF-8 sequences, ensuring graceful error handling across all platforms.

Testing
Reproduced the issue and verified the fix locally (both Linux and Windows):
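As an illustration of the behavior being verified on Unix, here is a minimal sketch with a hypothetical `parse_files0` helper. It is not the PR's code or its test. It splits NUL-separated input into file names without requiring UTF-8, which is the property the strict `from_utf8` path lacked.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: split NUL-separated input into file names
/// without requiring UTF-8. A single trailing NUL does not produce
/// an extra empty entry.
fn parse_files0(input: &[u8]) -> Vec<OsString> {
    let body = input.strip_suffix(b"\0").unwrap_or(input);
    if body.is_empty() {
        return Vec::new();
    }
    body.split(|&b| b == 0)
        .map(|name| OsStr::from_bytes(name).to_os_string())
        .collect()
}
```

With input such as `b"a\xffb\0plain.txt\0"`, the first name keeps its `0xFF` byte unchanged, whereas `std::str::from_utf8` on the same bytes returns an error. The sketch does not reject empty names in the middle of the input; that check is handled separately in the PR via `ZeroLengthFileName`.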