This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

Newlines in CSV files fail validation #77

Open
MattBlissett opened this issue Nov 28, 2018 · 1 comment

Comments

@MattBlissett
Member

A Darwin Core Archive (DWCA) whose CSV files contain newlines inside quoted fields (which is valid CSV) fails validation because of those newlines.

The same archive is ingested correctly.

PlantTracker_data_from_2012_onwards.zip

https://www.gbif-uat.org/dataset/87a12234-2fae-4075-860a-f60ba007e5e2 is a test copy, if it's still there.
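A minimal illustration of why a quoted newline trips up line-oriented checks, assuming a made-up two-column file (the data and function names below are hypothetical, not taken from the attached archive or the validator). Splitting on physical lines yields a line with the wrong number of columns, while a real CSV parser keeps the quoted newline inside one record.

```python
import csv
import io

# Hypothetical file: record 2 has a quoted "remarks" value containing a
# newline, which is valid CSV (RFC 4180) but spans two physical lines.
DATA = 'id,remarks\n1,"plain"\n2,"first line\nsecond line"\n'


def naive_field_counts(text):
    """Split on physical lines and count comma-separated fields per line."""
    return [len(line.split(",")) for line in text.splitlines()]


def record_field_counts(text):
    """Parse with csv.reader, which keeps quoted newlines inside one record."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


print(naive_field_counts(DATA))   # [2, 2, 2, 1]  -> last "line" looks malformed
print(record_field_counts(DATA))  # [2, 2, 2]     -> every record is well formed
```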

@MattBlissett
Member Author

This is probably caused by the line-based (rather than record-based, i.e. respecting quoted newlines) deduplication method in FileBashUtils.

I think this is a risky optimization: the crawler counts duplicates a different way, which (I think) handles quoted newlines.
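A rough sketch of how line-based duplicate counting can report false duplicates, assuming a hypothetical file in which two distinct records share the same continuation line. The line-based function is only loosely analogous to a shell pipeline such as `cut -d, -f1 | sort | uniq -d`; it is not the actual FileBashUtils code, and the record-based function is not the crawler's implementation.

```python
import csv
import io
from collections import Counter

# Hypothetical data: ids 1 and 2 are unique, but both records contain a
# quoted newline, so the physical line 'the river"' appears twice.
DATA = (
    'id,remarks\n'
    '1,"seen at\nthe river"\n'
    '2,"seen at\nthe river"\n'
)


def duplicate_ids_line_based(text):
    """Treat the first comma-separated value of every physical line as an id."""
    ids = [line.split(",")[0] for line in text.splitlines()[1:]]  # skip header
    return sorted(k for k, n in Counter(ids).items() if n > 1)


def duplicate_ids_record_based(text):
    """Parse real CSV records first, then count duplicate values in column 0."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip header
    ids = [row[0] for row in rows]
    return sorted(k for k, n in Counter(ids).items() if n > 1)


print(duplicate_ids_line_based(DATA))    # ['the river"']  (false duplicate)
print(duplicate_ids_record_based(DATA))  # []
```

The record-based version costs a full CSV parse, which is presumably what the shell-based shortcut avoids, but it is the only one of the two that agrees with how the records are actually ingested.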
