column_names ignored by load_dataset() when loading CSV file #7077

Open
luismsgomes opened this issue Jul 26, 2024 · 1 comment

Comments

@luismsgomes

Describe the bug

load_dataset() ignores the column_names kwarg when loading a CSV file. Instead, it takes the column names from the values on the first line of the file.

Steps to reproduce the bug

Call load_dataset() to load data from a CSV file that has no header row, passing the column_names kwarg.
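The report does not include a script, so the following is only a minimal sketch of what the reproduction might look like. The file contents, the column names `text` and `label`, and the helper names `write_headerless_csv` and `load_with_column_names` are assumptions made for illustration, not taken from the report.

```python
import csv
from pathlib import Path


def write_headerless_csv(directory):
    """Write a tiny CSV file with no header row (hypothetical sample data)."""
    path = Path(directory) / "data.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows([["hello", "0"], ["world", "1"]])
    return path


def load_with_column_names(path):
    """Load the CSV while passing column_names, as described in the report."""
    # Imported lazily so the helper above can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "csv",
        data_files=str(path),
        column_names=["text", "label"],  # the kwarg reported as ignored
        split="train",
    )
```

Calling `load_with_column_names(write_headerless_csv(some_dir))` and inspecting `.column_names` on the result should show whether the first row was consumed as a header, which is the behavior the report describes for datasets 2.20.0.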

Expected behavior

The resulting dataset should have the specified column names, and the first line of the file should be treated as data values.

Environment info

  • datasets version: 2.20.0
  • Platform: Linux-5.10.0-30-cloud-amd64-x86_64-with-glibc2.31
  • Python version: 3.9.2
  • huggingface_hub version: 0.24.2
  • PyArrow version: 17.0.0
  • Pandas version: 2.2.2
  • fsspec version: 2024.5.0
@albertvillanova
Member

I confirm that the column_names values are not copied to the names variable, because in this case CsvConfig.__post_init__ is not called with them: CsvConfig is instantiated with default values, and only afterwards are the config_kwargs used to overwrite its attributes.
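The mechanism described above can be illustrated with a toy dataclass. This is a sketch, not the actual datasets code: `ToyCsvConfig` is a hypothetical stand-in for CsvConfig, and the `setattr` loop is only an assumed approximation of how config_kwargs are applied after construction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCsvConfig:
    """Hypothetical stand-in for CsvConfig, reduced to the two fields involved."""

    names: Optional[List[str]] = None
    column_names: Optional[List[str]] = None

    def __post_init__(self):
        # The alias is copied only here, i.e. only at construction time.
        if self.column_names is not None:
            self.names = self.column_names


# Passing the kwarg to the constructor: __post_init__ sees it and copies it.
direct = ToyCsvConfig(column_names=["text", "label"])

# Constructing with defaults and overwriting attributes afterwards:
# __post_init__ has already run, so `names` is never updated.
patched = ToyCsvConfig()
for key, value in {"column_names": ["text", "label"]}.items():
    setattr(patched, key, value)

print(direct.names)   # ['text', 'label']
print(patched.names)  # None
```

In the second case the reader code only ever sees `names=None`, which matches the symptom of the first line being used as the header.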

@luismsgomes in the meantime, you can avoid the bug by passing names instead of column_names.
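A minimal sketch of that workaround follows, assuming a headerless CSV file. The helper names `csv_load_kwargs` and `load_headerless_csv` are hypothetical; the only point is that the column list is forwarded as `names` rather than `column_names`.

```python
def csv_load_kwargs(csv_file, column_names):
    """Build load_dataset() keyword arguments that use `names` (the workaround)."""
    return {
        "path": "csv",
        "data_files": str(csv_file),
        "names": list(column_names),  # passed as `names`, not `column_names`
        "split": "train",
    }


def load_headerless_csv(csv_file, column_names):
    """Load a headerless CSV file with explicit column names."""
    # Imported lazily so the kwargs helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**csv_load_kwargs(csv_file, column_names))
```

For example, `load_headerless_csv("data.csv", ["text", "label"])` would be the workaround equivalent of the call in the report.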
