More than one blank line at the end of CSV file: automate cleanup? #1572

Open
aborruso opened this issue Jun 1, 2024 · 4 comments
Comments

@aborruso
Contributor

aborruso commented Jun 1, 2024

Hi @johnkerl,
I sometimes run into errors like this:

mlr: mlr: CSV header/data length mismatch 2 != 1 at filename tmp.csv row 4.

This error occurs in many cases, including when the CSV file ends with two blank lines rather than one:

a,b
1,2
4,6


I'm also attaching a screenshot:

image

For very large files, this is something I often miss: I don't notice the extra blank lines and go looking for other kinds of errors.

Do you think it makes sense to introduce a more consistent error message, and/or to automatically clean up trailing blank lines, reducing them to one when there is more than one?

Thank you

@aborruso changed the title from "Blank end rows: automate cleanup?" to "More than one blank line at the end of CSV file: automate cleanup?" on Jun 1, 2024
@aborruso
Contributor Author

@johnkerl If you think it's not worth pursuing, I'll close it.

Thank you always

@johnkerl
Owner

@aborruso thanks -- let's keep this open -- it's worth pursuing

@johnkerl
Owner

@aborruso there are a few issues:

  • Enabling non-compliant CSV by default is a slippery slope
  • Even as opt-in behavior:
    • If the CSV has 2 or more columns then clearly the entirely empty lines can be ignored on an opt-in basis
    • But if the CSV has 1 column then it's not clear whether the blank line is
      • A row to be ignored
      • A completely legitimate row which has the empty-string value

My thought is to have an opt-in flag but with the caveat to the user that it will "eat" legitimate empty-string final rows in the one-column case ...
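To make the one-column ambiguity concrete, here is a minimal Python sketch. It is not Miller's implementation and not a proposed flag; the function name `classify_trailing_blanks` and its return labels are hypothetical. It assumes no quoted field contains an embedded newline, so the file can be inspected line by line.

```python
import csv


def classify_trailing_blanks(text):
    """Label the run of empty lines at the end of CSV text.

    Hypothetical labels:
      "none"      - the text does not end with an empty line
      "ignorable" - header has 2+ columns, so an empty line cannot be a data row
      "ambiguous" - header has fewer than 2 columns, so an empty line could be
                    a legitimate row holding the empty-string value
    """
    # splitlines() does not produce a final empty item for the last line
    # terminator, so every "" at the end is a genuinely empty line.
    lines = text.splitlines()

    blanks = 0
    for line in reversed(lines):
        if line != "":
            break
        blanks += 1
    if blanks == 0:
        return "none"

    # Count header columns with the csv module so quoted commas are handled.
    header = next(csv.reader([lines[0]]))
    return "ignorable" if len(header) >= 2 else "ambiguous"


print(classify_trailing_blanks("a,b\n1,2\n4,6\n\n\n"))
print(classify_trailing_blanks("a\n1\n\n"))
```

For the two-column sample from this issue the sketch reports the trailing empty lines as ignorable; for a one-column file it can only report them as ambiguous, which is exactly the case where an opt-in flag would "eat" a legitimate row.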

@aborruso
Contributor Author

> But if the CSV has 1 column

This is the problem :(

The only thing I can think of is this: if the CSV has more than one blank line at the end, and those trailing lines produce a data length mismatch of 2 (or more) != 1, keep only one blank line at the end.

It may be risky, though, and it's better to run into the error and fix the file.
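In the meantime, the cleanup can be done outside Miller before the file is read. The following Python sketch is one possible pre-processing step, not a Miller feature; the function name `collapse_trailing_blank_lines` is hypothetical. It treats only completely empty lines as blank, and when it changes the text it rewrites line endings as `\n`.

```python
def collapse_trailing_blank_lines(text):
    """Reduce a run of 2+ empty lines at the end of the text to a single one.

    Text ending with zero or one empty line is returned unchanged.
    """
    lines = text.splitlines()

    # Count the empty lines at the end of the file.
    blanks = 0
    while blanks < len(lines) and lines[len(lines) - 1 - blanks] == "":
        blanks += 1

    if blanks <= 1:
        return text

    # Keep everything up to and including the first trailing empty line.
    kept = lines[: len(lines) - blanks + 1]
    return "\n".join(kept) + "\n"


sample = "a,b\n1,2\n4,6\n\n\n"
print(repr(collapse_trailing_blank_lines(sample)))
```

Whether the single remaining empty line is then accepted depends on the reader; the sketch only implements the "keep only one" rule described above.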
