Clarify support (or not) for character encodings other than UTF-8 #42
Comments
I think my intention was to support text encodings that are "ASCII compatible," which should include Latin-1. For example, in almost all cases from Of course, you did pick out a few! In particular:

IMO
but I guess you had specific motivations for Latin-1 support?
@eddy-geek I don't really understand what's motivating your comment. CSV itself doesn't have a specified character encoding, and most CSV parsers are written to be ASCII compatible. ASCII compatibility is the goal, and as a result, encodings like latin-1 wind up being supported. This is important because CSV data is often quite messy, and there's nothing worse than failing to read CSV data because of a character encoding issue. This issue is basically "fix a few places in xsv where UTF-8 is assumed." That's it. Nothing more.
Ok I see, sorry for the noise
…On 12 Dec 2016 5:56 pm, "Andrew Gallant" ***@***.***> wrote:
@eddy-geek <https://github.com/eddy-geek> I don't really understand
what's motivating your comment. CSV itself doesn't have a specified
character encoding, and most CSV parsers are written to be *ASCII
compatible*. ASCII compatibility is the goal, and as a result, encodings
like latin-1 wind up being supported. This is important because CSV data is
often quite messy, and there's nothing worse than failing to read CSV data
because of a character encoding issue.
This issue is basically "fix a few places in xsv where UTF-8 is assumed."
That's it. Nothing more.
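
To illustrate what "ASCII compatible" means here, the following is a minimal sketch using only the Rust standard library. It is not code from xsv or the csv crate, and `split_fields` is a hypothetical helper that ignores quoting and escaping. It relies on the fact that in Latin-1 (as in UTF-8) every non-ASCII character is encoded with bytes of value 0x80 or higher, so those bytes can never be mistaken for an ASCII delimiter.

```rust
/// Split one CSV line on an ASCII delimiter without decoding it.
/// Deliberately naive: no quoting or escaping, unlike a real CSV parser.
/// The fields are returned as raw bytes, so no encoding is assumed
/// beyond "the delimiter is a single ASCII byte".
fn split_fields(line: &[u8], delimiter: u8) -> Vec<&[u8]> {
    line.split(|&b| b == delimiter).collect()
}
```

Calling `split_fields(b"caf\xe9,na\xefve,plain", b',')` on this Latin-1 input yields three byte slices, and the 0xE9 and 0xEF bytes pass through untouched. The same approach would not work for UTF-16, where ASCII characters are stored with interleaved zero bytes.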
For the same reasons as BurntSushi/xsv#42 we should only support UTF-8 and other encodings should be converted to UTF-8 before processing.
What about UTF-16, UTF-16BE, UTF-16LE?
Not supported.
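
For readers who need to follow the "convert to UTF-8 before processing" advice, here is a hedged sketch of what that conversion could look like with only the Rust standard library. The function names `latin1_to_utf8` and `utf16le_to_utf8` are hypothetical and not part of any project discussed here. The sketch assumes the input encoding is already known and does no BOM detection.

```rust
/// Latin-1 (ISO-8859-1) bytes map one-to-one onto the code points
/// U+0000..=U+00FF, so each byte can be widened directly to a `char`.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode UTF-16LE bytes into a UTF-8 `String`.
/// Returns `None` for an odd byte count or invalid surrogate pairs.
/// Assumption: no byte-order mark is present (none is stripped).
fn utf16le_to_utf8(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}
```

In practice a dedicated transcoding tool or library is likely a better fit for large files, since this sketch reads the whole input into memory.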
The documentation in the `README.md` doesn't explain what xsv's support or policy for character encodings is. I think it really ought to.

Looking through the code for xsv and the csv crate, it looks like there isn't a consistent policy:

- [...] `byte_records()` function.
- `xsv search`, however, uses the `records()` function, which interprets the data as UTF-8.
- [...] `str::from_utf8()` on byte data.
- [...] `select` module uses `String` to represent field names, which is UTF-8. What happens when you try to `xsv select` from a file that has Latin-1 field names?
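
The difference between byte-oriented and UTF-8-oriented field names can be sketched with the standard library alone. This is an illustration under stated assumptions, not xsv's actual `select` implementation: `find_column` and `find_column_utf8` are hypothetical helpers, and the sketch only shows how `std::str::from_utf8` treats Latin-1 bytes, not what xsv itself does with them.

```rust
/// Find a column index by comparing raw header bytes; nothing is decoded,
/// so any ASCII-compatible encoding works as long as the caller passes
/// the name in the same encoding as the file.
fn find_column(headers: &[&[u8]], name: &[u8]) -> Option<usize> {
    headers.iter().position(|h| *h == name)
}

/// Same lookup, but every header must first validate as UTF-8.
/// A single Latin-1 header makes the whole lookup fail, even when the
/// requested column name is plain ASCII.
fn find_column_utf8(
    headers: &[&[u8]],
    name: &str,
) -> Result<Option<usize>, std::str::Utf8Error> {
    let mut decoded = Vec::with_capacity(headers.len());
    for &h in headers {
        decoded.push(std::str::from_utf8(h)?);
    }
    Ok(decoded.iter().position(|h| *h == name))
}
```

With headers `id` and `pr\xe9nom` (Latin-1 for "prénom"), the byte-based lookup finds both columns, while the UTF-8 variant returns an error because 0xE9 followed by `n` is not a valid UTF-8 sequence.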