Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multistring delimiters #28

Open
kasrad opened this issue Jun 28, 2020 · 3 comments
Open

Multistring delimiters #28

kasrad opened this issue Jun 28, 2020 · 3 comments

Comments

@kasrad
Copy link

kasrad commented Jun 28, 2020

This package currently does not allow for using multi-character delimiters (eg. ||). Is there a reason for it? As I needed it, I've prepared a fix for this and I can create a pull request for it.

@chamkank
Copy link
Owner

chamkank commented Jun 28, 2020

Answered in #29 (comment)
In your example, repeating characters in a delimiter (like ||, |||, ||||) happens to work already if you just use | as a delimiter. But it would be misleading to say that hone explicitly supports multi-character delimiters because hone doesn't work for cases where the characters in a multi-character delimiter are not the same (ex: |\, >|, etc).

If you'd like to submit a PR for this, I'd be more than happy to review it.

@kasrad
Copy link
Author

kasrad commented Jul 1, 2020

Hi, in my testing I ran into an issue with the behaviour for multiple single string delimiters next to each other - eg. level1||level2, level1||level3 is nested as level1:{level2:value, level3:value} (given the delimiters is ['_']). Is this something you intend to keep? I'd say it might be better to forbid such column names, as it might be misleading (I, for example, expected to get an empty key and therefore an error). Asking as my implementation of multistring delimiters currently forbids column names like ^ and I don't want to open a pull request before I make sure. Thanks!

@chamkank
Copy link
Owner

chamkank commented Jul 3, 2020

Apologies for the delayed response!

I think it would be best to avoid throwing any errors (this tool should be able to handle any CSV file containing column names). Given the single-character delimiter |, I think the expected behaviour would be that we split on the first occurrence of the delimiter. So level1||level2, level1||level3 would give us level1| : { |level2 : value, |level3 : value } if the delimiter is |. If the user wants to generate level1 : { level2 : value, level3 : value } (which as you mentioned is what happens right now if you use the delimiter |), they should use the multi-character delimiter || instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants