add encoding support for CSV files #767

MonRani · 2025-06-27T19:44:45Z

name: Pull Request
about: Create a pull request to contribute to the project
title: 'Fix CSV encoding error for ISO-8859-1/Latin-1 encoded files'
labels: 'bug-fix, csv, encoding'

Related Issue
Fixes #649

Description of Changes
This PR adds encoding support to CSV file loading so that ISO-8859-1 (Latin-1) encoded files, which previously failed with "Invalid Input Error: CSV Error", now load correctly. The changes include:

- Added an optional encoding parameter to CSVConfig, defaulting to "utf-8" for backward compatibility
- Updated CSVSource to pass the encoding parameter to DuckDB's read_csv_auto function
- Modified configuration parsing to read the encoding setting from TOML files
- Updated the documentation with encoding parameter examples and usage instructions
- Added a test suite (simple_encoding_test.py) to verify encoding support

The fix maintains full backward compatibility while enabling support for legacy encoded files that contain special characters.
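
To illustrate the shape of the change described above, here is a minimal sketch of a config object with a defaulted encoding field and a query string that forwards it to read_csv_auto. The field names, the sql_literal and build_read_query helpers, and the data/users.csv path are assumptions made for illustration, not the code in this PR; whether DuckDB accepts a given encoding name also depends on the installed DuckDB version.

```python
from dataclasses import dataclass


@dataclass
class CSVConfig:
    """Hypothetical stand-in for the project's CSVConfig; real fields may differ."""

    path: str
    encoding: str = "utf-8"  # default keeps existing configs working unchanged


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_read_query(config: CSVConfig) -> str:
    """Build a read_csv_auto query that forwards the configured encoding."""
    return (
        "SELECT * FROM read_csv_auto("
        f"{sql_literal(config.path)}, encoding = {sql_literal(config.encoding)})"
    )


# A config without an explicit encoding falls back to utf-8.
default_query = build_read_query(CSVConfig(path="data/users.csv"))
latin1_query = build_read_query(
    CSVConfig(path="data/mobile_dataset.csv", encoding="latin-1")
)
```

Keeping the default on the config object, rather than in the query builder, means callers that never mention encoding produce the same query as before apart from the explicit utf-8 argument.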

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
New example
Test improvement

Testing

Created test suite (simple_encoding_test.py) that verifies:
- Default UTF-8 encoding works with regular files
- Reading a Latin-1 file with the default UTF-8 encoding fails with an error, as expected
- Latin-1 encoding successfully loads Latin-1 files with special characters
Verified backward compatibility with existing UTF-8 encoded files
Tested with files containing special characters (José, François, Müller, etc.)
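
The three checks listed above can be sketched with the standard library alone. This is not the contents of simple_encoding_test.py: it uses Python's csv module as a stand-in for the DuckDB-backed loader, and the helper names and temporary file name are assumptions. It only shows why Latin-1 bytes fail under UTF-8 decoding and succeed when the encoding is stated.

```python
import csv
import tempfile
from pathlib import Path

NAMES = ["José", "François", "Müller"]


def write_latin1_csv(path: Path) -> None:
    """Write a one-column CSV whose bytes are Latin-1, not UTF-8."""
    lines = ["name"] + NAMES
    path.write_bytes("\n".join(lines).encode("latin-1") + b"\n")


def load_names(path: Path, encoding: str = "utf-8") -> list[str]:
    """Read the 'name' column using the given text encoding."""
    with open(path, newline="", encoding=encoding) as handle:
        return [row["name"] for row in csv.DictReader(handle)]


def utf8_read_fails(path: Path) -> bool:
    """Return True if decoding the file as UTF-8 raises UnicodeDecodeError."""
    try:
        load_names(path)
    except UnicodeDecodeError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "latin1_names.csv"
    write_latin1_csv(sample)
    utf8_failed = utf8_read_fails(sample)                   # default encoding
    latin1_names = load_names(sample, encoding="latin-1")   # explicit encoding
```

The failure happens because a byte such as 0xE9 ("é" in Latin-1) starts a multi-byte sequence in UTF-8 and is not followed by a valid continuation byte.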

Usage Example
To load an ISO-8859-1 encoded CSV file, add the following to your preswald.toml:

[data.mobile_dataset]
type = "csv"
path = "data/mobile_dataset.csv"
encoding = "latin-1"

add encoding support for CSV files

56fd523
