column_names ignored by load_dataset() when loading CSV file #7077

luismsgomes · 2024-07-26T14:18:04Z

Describe the bug

load_dataset() ignores the column_names kwarg when loading a CSV file. Instead, it takes the column names from the values on the first line of the file.

Steps to reproduce the bug

Call load_dataset() to load data from a CSV file, passing the column_names kwarg.

Expected behavior

The resulting dataset should have the specified column names, and the first line of the file should be treated as data values.

Environment info

datasets version: 2.20.0
Platform: Linux-5.10.0-30-cloud-amd64-x86_64-with-glibc2.31
Python version: 3.9.2
huggingface_hub version: 0.24.2
PyArrow version: 17.0.0
Pandas version: 2.2.2
fsspec version: 2024.5.0

albertvillanova · 2024-07-30T07:52:25Z

I confirm that the column_names values are not copied to the names attribute, because in this case CsvConfig.__post_init__ is not called after they are set: CsvConfig is instantiated with default values, and only afterwards are the config_kwargs used to overwrite its attributes.

@luismsgomes in the meantime, you can avoid the bug if you pass names instead of column_names.
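
To illustrate the mechanism described above, here is a minimal sketch using only the standard library. ToyCsvConfig and build_config are hypothetical stand-ins written for this example, not the actual datasets code; the sketch only assumes that __post_init__ copies column_names to names and that the kwargs are applied with setattr after construction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCsvConfig:
    """Hypothetical, simplified stand-in for CsvConfig (not the library's code)."""

    names: Optional[List[str]] = None
    column_names: Optional[List[str]] = None

    def __post_init__(self):
        # Runs exactly once, right after the generated __init__.
        if self.column_names is not None:
            self.names = self.column_names


def build_config(**config_kwargs):
    """Hypothetical builder mimicking 'instantiate with defaults, then overwrite'."""
    # At this point __post_init__ sees column_names=None and copies nothing.
    config = ToyCsvConfig()
    # Overwriting attributes afterwards does not trigger __post_init__ again.
    for key, value in config_kwargs.items():
        setattr(config, key, value)
    return config


# Bug pattern: column_names is set, but names was never updated.
buggy = build_config(column_names=["id", "text"])

# Workaround pattern: set names directly, so no copy step is needed.
workaround = build_config(names=["id", "text"])

# For comparison: passing column_names to the constructor does trigger the copy.
direct = ToyCsvConfig(column_names=["id", "text"])

print(buggy.names, workaround.names, direct.names)
```

In this sketch, only the first case leaves names unset, which matches the behaviour reported in this issue and suggests why passing names directly sidesteps it.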
