Why do we transform separators in multivalue fields #364

Open
MortenHofft opened this issue Nov 8, 2024 · 2 comments

Comments

@MortenHofft
Member

MortenHofft commented Nov 8, 2024

The Darwin Core (DwC) specification says to separate multiple values with " | " (space, pipe, space).

https://dwc.tdwg.org/list/#dwc_samplingProtocol
the recommended best practice is to separate the values in a list with space vertical bar space ( | ).

But we transform that into a bare | with no spaces. That means we do not follow the standard in downloads, and for URIs, where a pipe can be part of the value, it makes the list difficult to split. In downloads we then turn the separator into a semicolon:

fragment: https://www.ncbi.nlm.nih.gov/nuccore/LC731499 | https://www.ncbi.nlm.nih.gov/nuccore/LC731501
interpreted: https://www.ncbi.nlm.nih.gov/nuccore/LC731499|https://www.ncbi.nlm.nih.gov/nuccore/LC731501
in download: https://www.ncbi.nlm.nih.gov/nuccore/LC731499;https://www.ncbi.nlm.nih.gov/nuccore/LC731501

Would it not make more sense to keep the formatting the same throughout, i.e. " | " (space, pipe, space)?
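
To make the difference concrete, here is a minimal Python sketch of the two splitting strategies. The function names are hypothetical and are not taken from the GBIF codebase; it only illustrates how a strict " | " split round-trips a value unchanged, while a lenient split on any pipe also breaks apart a URI that contains a pipe.

```python
import re

# Hypothetical helpers for illustration only, not the actual pipeline code.
SPEC_DELIMITER = " | "  # separator recommended by Darwin Core


def split_spec(value: str) -> list[str]:
    """Split only on the recommended ' | ' delimiter."""
    return [part.strip() for part in value.split(SPEC_DELIMITER) if part.strip()]


def split_lenient(value: str) -> list[str]:
    """Split on any pipe, with or without surrounding whitespace."""
    return [part.strip() for part in re.split(r"\s*\|\s*", value) if part.strip()]


def join_spec(values: list[str]) -> str:
    """Re-join values with the recommended delimiter."""
    return SPEC_DELIMITER.join(values)


verbatim = (
    "https://www.ncbi.nlm.nih.gov/nuccore/LC731499"
    " | https://www.ncbi.nlm.nih.gov/nuccore/LC731501"
)
values = split_spec(verbatim)

# Keeping ' | ' throughout means the value survives a split/join round trip.
assert join_spec(values) == verbatim
```

With the strict split, a made-up value such as "http://x/?q=a|b | http://y" yields two items; with the lenient split it yields three, because the pipe inside the first URI is also treated as a delimiter.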

@timrobertson100
Member

I can answer the "why" part: when the code was written, that was indeed the recommendation (example), and you will see lots of terms with the following wording. The bit in the quote marks has the spaces, but the text did not mention them:

... The recommended best practice is to separate the values with a vertical bar (' | ')...

Would it not make more sense to keep the formatting the same throughout, i.e. " | " (space, pipe, space)?

Yes

@MortenHofft
Member Author

MortenHofft commented Nov 11, 2024

Would it be considered a breaking change or a fix? I would argue that it is a fix, but it might still break someone's existing scripts.

And I guess there is also the question of whether we should then stop treating a pipe without spaces as a delimiter, since that form might be used in many records.
Some numbers on how many records would change would be interesting.
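
One rough way to get such numbers would be to classify raw field values by the pipe style they use. The sketch below is only an assumption about how that could be done: the labels and function names are made up, and it runs on an in-memory list rather than on real occurrence data.

```python
import re
from collections import Counter

# A pipe that lacks a space on at least one side, e.g. "a|b" or "a |b".
BARE_PIPE = re.compile(r"(?<! )\||\|(?! )")


def classify(value: str) -> str:
    """Label a raw value by the pipe style it uses (hypothetical labels)."""
    has_spec = " | " in value
    has_bare = BARE_PIPE.search(value) is not None
    if has_spec and has_bare:
        return "mixed"
    if has_spec:
        return "spec_only"
    if has_bare:
        return "bare_only"
    return "no_pipe"


def count_styles(values):
    """Count how many values fall into each pipe style."""
    return Counter(classify(v) for v in values)


# Made-up sample values, one per category.
sample = ["a | b", "a|b", "a | b|c", "single value"]
counts = count_styles(sample)
```

Values labelled "bare_only" or "mixed" are the ones whose interpretation would change if a pipe without spaces stopped being treated as a delimiter.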

Note: I wrote above that pipes (|) are allowed in URLs. Apparently that isn't entirely true: they are often considered unsafe. A pipe isn't illegal as such, but using one is bad practice and will often fail, since many systems treat pipes as illegal characters or as delimiters, just like we do. So part of my problem (linking associated sequences) could be solved by requiring data providers to encode pipes.
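
As a sketch of what encoding pipes could look like on the provider side: a literal pipe is percent-encoded as %7C, so it can no longer be confused with a delimiter. The helper name and the example.org URLs below are hypothetical.

```python
from urllib.parse import unquote


def encode_pipes(url: str) -> str:
    """Percent-encode literal pipes so they cannot be mistaken for delimiters."""
    return url.replace("|", "%7C")


# Hypothetical URLs; the first one contains a literal pipe.
urls = [
    "https://example.org/search?ids=1|2",
    "https://example.org/record/3",
]

# Even with a bare-pipe delimiter, the encoded URLs split back cleanly.
joined = "|".join(encode_pipes(u) for u in urls)
parts = joined.split("|")

# unquote() restores the pipe (and any other percent-escapes) for display.
decoded = [unquote(p) for p in parts]
assert decoded == urls
```

Note that unquote() decodes every percent-escape in the URL, not only %7C, so a consumer that needs the URL as published may prefer to keep the encoded form.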
