add encoding support for CSV files #767
Open
name: Pull Request
about: Create a pull request to contribute to the project
title: 'Fix CSV encoding error for ISO-8859-1/Latin-1 encoded files'
labels: 'bug-fix, csv, encoding'
Related Issue
Fixes #649
Description of Changes
This PR adds encoding support to CSV file loading so that ISO-8859-1 (Latin-1) encoded files, which previously failed with "Invalid Input Error: CSV Error", can be loaded. The changes include:
- An encoding parameter on CSVConfig, with default "utf-8" for backward compatibility
- CSVSource changed to pass the encoding parameter to DuckDB's read_csv_auto function
The fix maintains full backward compatibility while enabling support for legacy encoded files that contain special characters.
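To illustrate why a configurable encoding with a "utf-8" default addresses the failure, here is a minimal standard-library sketch. It is not the code from this PR: the CSVConfig dataclass below is a hypothetical stand-in with assumed fields, and read_rows is an illustrative helper that uses Python's csv module in place of DuckDB.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CSVConfig:
    """Hypothetical stand-in for the PR's CSVConfig (fields are assumptions)."""
    path: str
    encoding: str = "utf-8"  # default keeps existing UTF-8 configs working


def read_rows(raw: bytes, config: CSVConfig) -> list[list[str]]:
    """Illustrative helper: decode raw CSV bytes with the configured encoding, then parse."""
    text = raw.decode(config.encoding)
    return list(csv.reader(io.StringIO(text)))


# In ISO-8859-1, "é" and "á" are single bytes (0xE9, 0xE1) that are not valid UTF-8 here.
latin1_bytes = "name,city\nJosé,Málaga\n".encode("iso-8859-1")

# With the default encoding, decoding fails, analogous to the reported CSV error.
try:
    read_rows(latin1_bytes, CSVConfig(path="legacy.csv"))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With the encoding set explicitly, the same bytes parse cleanly.
rows = read_rows(latin1_bytes, CSVConfig(path="legacy.csv", encoding="iso-8859-1"))
```

Because the default stays "utf-8", callers that never set the encoding see the same behavior as before; only configs that opt in are decoded differently.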
Type of Change
Testing
Test script (simple_encoding_test.py) that verifies:
Usage Example
To load an ISO-8859-1 encoded CSV file, add to your preswald.toml: