Description
When running the tests (make test) on Ubuntu 25.10, the following error is reported:
t/op/magic .......................................................
thread 'main' panicked at library/std/src/env.rs:163:83:
called `Result::unwrap()` on an `Err` value: "eh zero \xA0"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
# Failed test 186 - ENV store downgrades utf8 in setenv at op/magic.t line 87
# got ""
# expected "eh zero "
thread 'main' panicked at library/std/src/env.rs:163:83:
called `Result::unwrap()` on an `Err` value: "eh zero \xA0"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
# Failed test 187 - ENV store downgrades utf8 key in setenv at op/magic.t line 87
# got ""
# expected "widekey"
FAILED at test 186
The two failing tests in magic.t use the byte "\xA0" (the non-breaking space in ISO 8859-1, also known as Latin-1) in an environment variable name and value.
The environment variable and its value are passed to the env binary of the host system, which is part of coreutils.
In the case of Rust coreutils, which is used in Ubuntu 25.10, environment variable names and values have to be valid UTF-8, or a Rust error is triggered (the panic shown in the output above).
The single byte "\xA0" is not valid UTF-8. The same character (U+00A0) encoded as UTF-8 is the two-byte sequence \xC2\xA0
(see also https://en.wikipedia.org/wiki/UTF-8#Description).
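The byte-level difference can be checked with a few lines of Python. This is only an illustrative sketch and is not part of the Perl test suite; the helper name `is_valid_utf8` is made up for this example.

```python
# U+00A0 (NO-BREAK SPACE) as a single Latin-1 byte vs. its UTF-8 encoding.
NBSP = "\u00a0"

latin1_bytes = NBSP.encode("latin-1")  # one byte: 0xA0
utf8_bytes = NBSP.encode("utf-8")      # two bytes: 0xC2 0xA0


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The value used by the test, once with the raw Latin-1 byte and once
# with the UTF-8 encoding of the same character.
print(latin1_bytes, is_valid_utf8(b"eh zero " + latin1_bytes))  # b'\xa0' False
print(utf8_bytes, is_valid_utf8(b"eh zero " + utf8_bytes))      # b'\xc2\xa0' True
```

A lone 0xA0 is a UTF-8 continuation byte with no lead byte before it, which is why strict decoding rejects it.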
I reported this also upstream to Ubuntu as a Rust coreutils issue: https://bugs.launchpad.net/ubuntu/+source/rust-coreutils/+bug/2132941
However, I did not mention the connection to Perl's magic.t test (I was aware of this when I created the bug).
A simple (tested) fix is to replace "eh zero \x{A0}" with "eh zero \x{c2}\x{a0}" in magic.t.
However, I'm not sure that's what Perl developers want.
My current understanding of this issue is as follows:
- There is no specified behaviour for byte sequences that are not valid UTF-8 when they are passed as environment variable names or values
- GNU coreutils does not care whether the byte sequence is valid UTF-8 (it treats it as an opaque byte sequence)
- Bash triggers an error when it sees identifiers that are not valid UTF-8 (see the example in the Ubuntu issue)
- Rust coreutils triggers an error when it sees a byte sequence that is not valid UTF-8
- Perl 5 includes a test case (magic.t) that depends on the GNU coreutils behaviour and fails with Rust coreutils
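To compare `env` implementations outside the Perl test suite, a small probe along these lines might help. It is a sketch under assumptions: it runs on a POSIX system with an `env` binary on PATH, the variable name `PROBE_VALUE` and the function `probe_env` are invented for this example, and it is not how magic.t itself invokes `env`.

```python
import os
import subprocess


def probe_env(value: bytes, name: bytes = b"PROBE_VALUE") -> tuple[int, bool]:
    """Run the host's `env` with one extra variable set to raw bytes.

    Returns (exit status, whether NAME=VALUE appeared verbatim in the output).
    """
    child_env = dict(os.environb)  # bytes keys and values (POSIX only)
    child_env[name] = value
    result = subprocess.run(["env"], env=child_env, capture_output=True)
    expected_line = name + b"=" + value
    return result.returncode, expected_line in result.stdout.split(b"\n")


if __name__ == "__main__":
    # Raw Latin-1 byte first, then the UTF-8 encoding of the same character.
    for value in (b"eh zero \xa0", b"eh zero \xc2\xa0"):
        print(value, probe_env(value))
```

Going by the behaviour described above, GNU coreutils would be expected to echo both values back unchanged, while Rust coreutils would be expected to fail on the first one. The sketch does not assert either result, since the outcome depends on which `env` is installed.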