Fixes unicode support on gtfs #15

Open
wants to merge 2 commits into
base: master

@fserb fserb commented Dec 7, 2010

Hey,
I was trying to use GTFS to parse some UTF-8 data, and it was failing with a confusing UnicodeEncodeError.
I traced this down to two factors:

  1. unmapped_entities.py was converting string attributes with str(), which tries to encode every unicode value as 'ascii'.
  2. csv.reader doesn't handle unicode input well.
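To illustrate the first factor, here is a minimal sketch of the failure mode. It assumes Python 2 semantics, where calling str() on a unicode value implicitly encodes it as ASCII; the explicit `.encode("ascii")` call below stands in for that implicit step so the snippet also runs on Python 3. The stop name and the helper `to_ascii_bytes` are hypothetical and not taken from the gtfs code.

```python
# Hypothetical stop name containing one non-ASCII character.
name = u"S\u00e3o Paulo"


def to_ascii_bytes(text):
    # Explicit form of what Python 2's str(text) does implicitly
    # when text is a unicode object.
    return text.encode("ascii")


try:
    to_ascii_bytes(name)
    failed = False
except UnicodeEncodeError:
    # The non-ASCII character cannot be represented in ASCII.
    failed = True

print(failed)
```

Pure ASCII values pass through unchanged, which may be why the problem only shows up once the test data contains a non-ASCII entry.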

My first commit changes the test data so that one entry in Stops has non-ASCII UTF-8 characters, which breaks the tests.

My second commit fixes both issues and makes the tests pass again:
To fix 1, I've special-cased str in unmapped_entities so that it converts with unicode() instead of str().
To fix 2, I've created a unicode_csv_reader function that wraps csv.reader and codecs.iterdecode. The steps here are a bit annoying: iterdecode() the input from UTF-8, encode it back to UTF-8 so csv.reader is fine with it, then take each row from csv.reader and decode its cells from UTF-8 again, so the final output is unicode.
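The decode, re-encode, and decode-again steps described above could look roughly like the sketch below. This is not the code from the commit: only the name `unicode_csv_reader` comes from the description, while the signature, the `encoding` parameter, and the version check are assumptions. The Python 2 branch follows the round trip as described; on Python 3, csv.reader accepts text directly, so the sketch skips the re-encoding there.

```python
import codecs
import csv
import sys


def unicode_csv_reader(byte_lines, encoding="utf-8"):
    """Yield rows of text cells from an iterable of encoded lines (sketch)."""
    # Step 1: decode the raw lines incrementally.
    decoded = codecs.iterdecode(byte_lines, encoding)
    if sys.version_info[0] >= 3:
        # Python 3: csv.reader works on text, no round trip needed.
        for row in csv.reader(decoded):
            yield row
    else:
        # Python 2: csv.reader wants byte strings, so re-encode as UTF-8,
        # parse, then decode every cell back to unicode.
        reencoded = (line.encode("utf-8") for line in decoded)
        for row in csv.reader(reencoded):
            yield [cell.decode("utf-8") for cell in row]


# Hypothetical two-line stops file, already encoded as UTF-8 bytes.
sample = [b"stop_id,stop_name\n", u"S1,S\u00e3o Paulo\n".encode("utf-8")]
rows = list(unicode_csv_reader(sample))
```

Decoding first and re-encoding as UTF-8 may look redundant when the input is already UTF-8, but it would let the same wrapper accept other input encodings while csv.reader only ever sees UTF-8 bytes.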

Thanks for your attention,
[]s
F.

@Lawouach

Any chance this is fixed at some point?


2 participants