locale(encoding=) should not strictly rely on iconvlist() #1537

Open
wants to merge 1 commit into
base: main

bastistician

The check_encoding() auxiliary function verifies the specified input encoding via iconvlist()

if (tolower(x) %in% tolower(iconvlist())) {

and rejects any non-listed encoding.

Unfortunately, this unnecessarily breaks locale() on platforms where iconvlist() is incomplete. As ?iconvlist says

On most platforms 'iconvlist' provides an alphabetical list of the
supported encodings. On others, the information is on the man
page for 'iconv(5)' or elsewhere in the man pages

I suggest always accepting the portable encodings "latin1" and "UTF-8" and, for any other encoding that is not found, giving a warning rather than stopping.
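A minimal sketch of that behaviour, for illustration only: `check_encoding_lenient()` and its `known` argument are hypothetical names, not the code in this PR or in readr. It assumes a single encoding string as input.

```r
# Hypothetical helper (not readr's check_encoding()): accept the portable
# encodings unconditionally and only warn about anything else that is not
# listed. `known` defaults to iconvlist() but can be overridden for testing.
check_encoding_lenient <- function(x, known = iconvlist()) {
  stopifnot(is.character(x), length(x) == 1L, !is.na(x))

  portable <- c("latin1", "UTF-8")
  if (tolower(x) %in% tolower(c(portable, known))) {
    return(invisible(TRUE))
  }

  # Not listed: warn instead of stopping, since the list may be incomplete.
  warning("Encoding ", x, " is not listed by iconvlist(); conversion may fail",
          call. = FALSE)
  invisible(FALSE)
}
```

With this shape, an unlisted encoding still reaches iconv, which then decides whether the conversion actually works.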

This PR also fixes long-standing package check failures on Alpine Linux, e.g.,

--- re-building ‘locales.Rmd’ using rmarkdown
Quitting from lines  at lines 141-166 [unnamed-chunk-12] (locales.Rmd)
Error: processing vignette 'locales.Rmd' failed with diagnostics:
Unknown encoding latin1
--- failed re-building ‘locales.Rmd’

@bastistician
Author

This bug breaks several reverse dependencies on Alpine Linux (similarly for vroom:::check_encoding()). I cannot provide a full list, as I have long used patched versions of readr and vroom for that reason. Any chance this could be merged?

To give just one example: breathtestcore::read_iris_csv() wants to read a file with readr::locale(encoding = "ISO-8859-2"). musl's iconv is perfectly capable of converting this encoding, but readr refuses to proceed just because that encoding string is not (exactly) listed in iconv -l:

UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF32-LE, UCS-2BE, UCS-2LE, WCHAR_T,
US_ASCII, ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5,
ISO8859-6, ISO8859-7, ...

Note that musl ignores all dashes and capitalization in the encoding name, so it supports "ISO8859-2" as well as "iso-8859-2" and even "iso-88592". (For comparison, glibc's iconv lists all supported notations in iconv -l and doesn't support the last one.) Furthermore, the name "latin1" is not mentioned explicitly in the above output but still supported (note the "...", so the patch unconditionally includes "latin1").
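To illustrate why an exact (case-insensitive) match against the list is too strict here, the following sketch compares names after dropping dashes and case, as described above. It is only an approximation of that description, not musl's actual matching code, and `normalize_encoding()` and `encoding_listed()` are hypothetical names.

```r
# Hypothetical helpers: compare encoding names after lowercasing and removing
# dashes, mimicking the lenient matching described for musl.
normalize_encoding <- function(x) {
  gsub("-", "", tolower(x), fixed = TRUE)
}

encoding_listed <- function(x, known) {
  normalize_encoding(x) %in% normalize_encoding(known)
}

# A few names from the list quoted above.
listed <- c("UTF-8", "ISO8859-1", "ISO8859-2")

encoding_listed("ISO-8859-2", listed)  # TRUE, although not listed verbatim
encoding_listed("iso-88592", listed)   # TRUE
encoding_listed("latin1", listed)      # FALSE: aliases still need special-casing
```

The last call shows why normalization alone would not be enough: "latin1" is an alias, not a spelling variant, which is why the patch includes it unconditionally.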

As an alternative to this PR, one could skip the flaky iconvlist()-based encoding check entirely on musl-based systems, e.g., if grepl("musl", R.version$os, fixed = TRUE). But as ?iconvlist says, there are other platforms where iconvlist() may be incomplete (I suspect uclibc-ng, for example, but didn't test that specifically). So I think downgrading the error to a warning is the best compromise.
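For comparison, the platform-based alternative could look roughly like this. `is_musl_os()` is a hypothetical name, and the OS strings in the usage lines are illustrative inputs rather than values taken from this thread.

```r
# Hypothetical helper: detect a musl-based system from the OS string that R
# reports. The argument defaults to the running session's R.version$os.
is_musl_os <- function(os = R.version$os) {
  grepl("musl", os, fixed = TRUE)
}

is_musl_os("linux-musl")  # TRUE
is_musl_os("linux-gnu")   # FALSE
```

A check like this only covers the platforms it names, which is the limitation pointed out above.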

@jennybc
Member

jennybc commented Feb 10, 2025

Thanks @bastistician, I will take a look this week, as I'm doing some other R package maintenance tasks.

@bastistician
Author

Thanks. While iconvlist() is "only" incomplete on musl, the call might even fail on other platforms with "Error: 'iconvlist' is not available on this system" (but this does not imply that iconv would fail), so it should probably even be wrapped in a tryCatch(). I could update the PR, but maybe you would like to handle this differently anyway, so I'm just leaving this note.
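A rough sketch of what such a wrapper might look like, together with a direct probe of iconv() itself. Both `safe_iconvlist()` and `can_convert()` are hypothetical names, not part of readr or of this PR; the probe relies on iconv() signalling an error for an unsupported conversion.

```r
# Hypothetical wrapper: treat a failing iconvlist() as "no list available"
# instead of letting the error propagate. `lister` is injectable for testing.
safe_iconvlist <- function(lister = iconvlist) {
  tryCatch(lister(), error = function(e) character(0))
}

# Hypothetical probe: ask iconv() directly whether it can convert from the
# given encoding, rather than consulting the (possibly incomplete) list.
can_convert <- function(encoding) {
  tryCatch({
    iconv("a", from = encoding, to = "UTF-8")
    TRUE
  }, error = function(e) FALSE)
}
```

An empty result from `safe_iconvlist()` would then simply mean "unknown", so the warning path applies instead of a hard stop.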
