R function set_standard_names() does not remove special characters #1

mlaunois · 2019-03-01T18:53:42Z

English version

I'm working on a dataset about the ISF (Impôt de Solidarité sur la Fortune) in France in 2017 (the file attached below). When I try to read that file with the ISO-8859-15 encoding, using the following R code:

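library(readr)   # read_csv(), locale()
library(dplyr)   # provides the %>% pipe
library(tricky)  # set_standard_names()

isfdata = read_csv(
  file = "isfdata-2017.csv",
  locale = locale(
    date_format = "%d/%m/%Y",
    # %M = minutes, %S = seconds (%m would be the month)
    time_format = "%H:%M:%S",
    encoding = "ISO-8859-15"
  )) %>% set_standard_names()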

I end up with a tibble containing a strangely named column, impazt_moyen_en_a\u0082¬.
As defined on this page, the character \u0082, named BREAK PERMITTED HERE in the Unicode standard, is a control character. The last character is also a problem, but that one was inserted by read_csv.

What should I do? I cannot use the column name at all with dplyr functions and the like, even when quoting it. I had to edit the file by hand to remove those special characters.

French version (translated)

I'm working on data about the Impôt de Solidarité sur la Fortune in 2017 (the file I attached). When I try to read the file with the ISO-8859-15 encoding like this:

isfdata = read_csv(
  file = "isfdata-2017.csv",
  locale = locale(
    date_format = "%d/%m/%Y",
    time_format = "%H:%M:%S",
    encoding = "ISO-8859-15"
  )) %>% set_standard_names()

I end up with a tibble containing a strangely named column, impazt_moyen_en_a\u0082¬.
As defined here, \u0082, or BREAK PERMITTED HERE in the Unicode standard, is a control character. The last character in the string is also a problem...

What should I do? ... I cannot use this name with dplyr functions and the like, even when quoting it...
Thanks again in advance!

isfdata-2017.txt


mlaunois · 2019-03-02T20:41:52Z

Don't worry, this is due to a bug in readr: tidyverse/readr#974

The str_standardize function fails to remove Unicode escape characters from column names once that bug has been triggered. Would it be appropriate to close this issue, since it is not caused by tricky at all?
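In the meantime, instead of editing the file by hand, the column names could be cleaned after reading. The following is only a minimal sketch in base R: strip_to_ascii_names() is a hypothetical helper, not part of readr or tricky, and the toy data frame stands in for the tibble returned by read_csv().

```r
# Hypothetical helper: not part of readr or tricky.
strip_to_ascii_names <- function(x) {
  # Replace every run of characters outside [A-Za-z0-9_] with one underscore.
  x <- gsub("[^A-Za-z0-9_]+", "_", x, perl = TRUE)
  # Drop underscores left over at the start or end of a name.
  gsub("^_+|_+$", "", x, perl = TRUE)
}

# Toy data frame standing in for the tibble returned by read_csv().
df <- data.frame(a = 1, b = 2)
names(df) <- c("a", "impazt_moyen_en_a\u0082\u00ac")

# Clean the names in place and show the result.
names(df) <- strip_to_ascii_names(names(df))
print(names(df))
```

Two different original names can collapse to the same cleaned name, so it is worth checking anyDuplicated(names(df)) afterwards.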
