Inconsistent treatment of unicode character in RStudio vs quarto #492
Can you please share the
Having different language settings can change the way that characters are rendered and converted. We have worked hard to make it consistent, but it seems another case has snuck through. Sometimes, other libraries will do some degree of transliteration before it gets to
We are importing the data with
Here is the sessionInfo from one student for whom the Unicode conversion went wrong. I think the other students with the problem had the same locale.
Here is the session info from a student who had it working as expected.
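The maintainer's point that other libraries may transliterate characters before janitor sees them can be illustrated with base R alone. The sketch below is not janitor's own code path: it uses base R's iconv() with the "//TRANSLIT" suffix as a stand-in, because that conversion is implementation dependent, and prints the result next to the active LC_CTYPE so that different machines can be compared. The helper name transliteration_report() is hypothetical, and where the local iconv implementation does not support "//TRANSLIT" the call may fail, so treat the output as a diagnostic rather than a reference result.

```r
# Diagnostic sketch (not janitor's code path): show how base R's iconv()
# transliterates U+2103 on this machine, next to the active LC_CTYPE.
# "//TRANSLIT" results depend on the iconv implementation in use, so the
# "translit" entry can differ between Windows, macOS and Linux.
transliteration_report <- function(x) {
  c(
    lc_ctype = Sys.getlocale("LC_CTYPE"),
    input    = x,
    # sub = "?" replaces anything that still cannot be converted to ASCII
    translit = iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "?")
  )
}

degC <- "\u2103"  # DEGREE CELSIUS
print(transliteration_report(degC))
```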
I was looking at this again today, and it's an even thornier problem than I first thought. When I tried it on my system (Windows 11 with a US English locale), I got a simple lowercase "c" (see below). But I think a workaround could be using the
tibble::tibble("Temperature (℃)" = 1) |>
  janitor::clean_names(replace = c("\u2103" = "deg c")) |>
  names()

degC <- rawToChar(as.raw(c(0xe2, 0x84, 0x83)))
degC
#> [1] "℃"
janitor::make_clean_names(degC)
#> [1] "c"

Created on 2022-12-01 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23 ucrt)
#> os Windows 10 x64 (build 22621)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/New_York
#> date 2022-12-01
#> pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
#> cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.1)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.1)
#> dplyr 1.0.10 2022-09-01 [1] CRAN (R 4.2.1)
#> evaluate 0.18 2022-11-07 [1] CRAN (R 4.2.2)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.1)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.1)
#> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1)
#> janitor 2.1.0 2021-01-05 [1] CRAN (R 4.2.1)
#> knitr 1.40 2022-08-24 [1] CRAN (R 4.2.1)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
#> lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
#> purrr 0.3.5 2022-10-06 [1] CRAN (R 4.2.1)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.0 2022-06-28 [1] CRAN (R 4.2.1)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.1)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
#> rmarkdown 2.17 2022-10-07 [1] CRAN (R 4.2.1)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
#> snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.2.1)
#> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
#> stringr 1.4.1 2022-08-20 [1] CRAN (R 4.2.1)
#> styler 1.7.0 2022-03-13 [1] CRAN (R 4.2.1)
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.1)
#> vctrs 0.5.0 2022-10-22 [1] CRAN (R 4.2.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
#> xfun 0.34 2022-10-18 [1] CRAN (R 4.2.2)
#> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.1)
#>
#> [1] C:/Users/wdenn/AppData/Local/R/win-library/4.2
#> [2] C:/Program Files/R/R-4.2.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
I'm not sure what the best solution within
Ideas are welcome.
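As an alternative to the replace = workaround above, one locale-independent option is to swap the temperature symbols for plain ASCII text in the column names before janitor ever sees them. The sketch below does this in base R; the helper names replace_temperature_symbols() and replace_temperature_names(), and the "deg C" / "deg F" spellings, are hypothetical choices for illustration, not part of janitor.

```r
# Hypothetical helpers (not part of janitor): replace temperature symbols
# with plain ASCII before janitor::clean_names() runs, so the cleaned names
# no longer depend on how each machine transliterates these characters.
replace_temperature_symbols <- function(x,
                                        symbols = c("\u2103", "\u2109", "\u00b0"),
                                        replacements = c("deg C", "deg F", "deg ")) {
  stopifnot(length(symbols) == length(replacements))
  x <- enc2utf8(x)
  # Replace each symbol literally (fixed = TRUE), one pair at a time
  for (i in seq_along(symbols)) {
    x <- gsub(symbols[i], replacements[i], x, fixed = TRUE)
  }
  x
}

# Apply the replacement to a data frame's column names
replace_temperature_names <- function(df) {
  names(df) <- replace_temperature_symbols(names(df))
  df
}

df <- data.frame(x = c(21.5, 22.0))
names(df) <- "Temperature (\u2103)"
names(replace_temperature_names(df))
#> [1] "Temperature (deg C)"
```

The data frame returned by replace_temperature_names() can then go through janitor::clean_names() as usual. If I read janitor's documentation correctly, passing a custom replace = vector overrides the default replacements in make_clean_names() (such as the one for "%"), whereas pre-processing the names this way leaves those defaults in place.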
This is a problem that affects some of the students with Windows computers in my class.
The students are importing an Excel file that contains the Unicode character \u2103 (℃) in the header row. They are then using janitor::clean_names(). For most of the students, janitor::clean_names() converts the column name to "temperature_c" both in RStudio and when rendering with Quarto. For about 20% of the students, janitor::clean_names() converts the "℃" to "temperature_u_00b0_c" (U+00B0 is the Unicode code point for "°") in RStudio, but to "temperature_c" when rendered with Quarto. This then causes problems with the rest of their code when they render the document.
In both RStudio and Quarto, the "℃" is imported correctly as UTF-8 and gives the same output with charToRaw() (e2 84 83), so it is not an import problem. Somehow janitor is treating the Unicode character differently depending on how R is being run.
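To check this observation on a particular student's machine, a sketch like the following could be run twice, once in the RStudio console and once inside a chunk of the .qmd file, and the two printouts compared. It confirms the bytes and code point of ℃ and records the locale settings that seem most likely to differ between the two ways of running R; which of these settings actually matters for this issue is an assumption, not something established in the thread.

```r
# Run once in the RStudio console and once in a Quarto chunk, then compare.
degC <- "\u2103"  # DEGREE CELSIUS, U+2103

diagnostics <- list(
  # UTF-8 bytes of the character; the report above saw e2 84 83 in both cases
  bytes       = paste(as.character(charToRaw(enc2utf8(degC))), collapse = " "),
  code_point  = sprintf("U+%04X", utf8ToInt(degC)),
  # Locale settings that may differ between the two ways of running R
  lc_ctype    = Sys.getlocale("LC_CTYPE"),
  utf8_locale = l10n_info()[["UTF-8"]],
  lang_env    = Sys.getenv("LANG"),
  r_version   = R.version.string
)
str(diagnostics)
```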
All the affected students are using R 4.2.1 with the current version of RStudio on Windows. The students might have Norwegian locales; I haven't been able to check that.
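To test the Norwegian-locale hypothesis without changing a student's system settings, one option is to evaluate the conversion with LC_CTYPE switched temporarily. The helper with_ctype() below is hypothetical, and the candidate locale names are guesses (Windows and Unix spell them differently; unavailable ones simply return NA). It uses base R's make.names(), whose notion of a letter is documented as locale dependent, as a stand-in; janitor::make_clean_names could be passed as f instead. R's own documentation warns that changing the character set during a session may not work reliably, so a fresh session started under the real locale remains the more trustworthy test.

```r
# Hypothetical helper (not part of janitor or base R): evaluate f(x) with
# LC_CTYPE temporarily set to `locale`, then restore the previous setting.
# Returns NA when the requested locale is not available on this machine.
with_ctype <- function(locale, f, x) {
  old <- Sys.getlocale("LC_CTYPE")
  # Sys.setlocale() returns "" (with a warning) if the OS rejects the name
  set <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(set, "")) {
    return(NA_character_)
  }
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  f(x)
}

# Candidate locale names are assumptions: Windows and Unix spell them differently
candidates <- c("C", "English_United States.utf8", "Norwegian_Norway.utf8", "nb_NO.UTF-8")

# make.names() is a base-R stand-in whose idea of a "letter" is locale dependent;
# swap in f = janitor::make_clean_names to test janitor itself
results <- sapply(candidates, with_ctype, f = make.names, x = "Temperature (\u2103)")
print(results)
```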
Minimal example (but it might work correctly for you)