CoNLL-U output #2912
Replies: 6 comments
-
Ah, great snippet! I would really like to see this built in as well (using CoNLL-U and not older versions of the standard). CoNLL-U is a lightweight format and easy enough to parse. The comment on your gist is very relevant, though. Multi-word or multi-token units should get the attention they deserve. To be fair, I am not sure how spaCy handles such units by itself. From a programming perspective, I see some possible improvements for a Python 3 environment. I'd argue in favor of using For now, however, I am eager to hear what the maintainers think. Perhaps it is better to turn that gist into a repo?
-
Thanks a lot for your positive reaction and your improvement suggestions, which I'll incorporate as soon as I find a bit of time.
-
I think having a dedicated repo allows for easier collaboration. I don't think gists are really suited for that. But of course you can do whatever makes the code easiest for you to maintain.
-
Good point; I've just created it.
-
@rgalhama This is really nice, thanks for sharing! I've added the repo to the spaCy Universe so others can find it more easily. See here:
-
That's great, thanks!!
-
Feature description
It would be useful to be able to export spaCy's parsed output in CoNLL-U format (as already requested in the automatically closed issue #533).
I have created a script for that (see https://gist.github.com/rgalhama/8beb48d21bcbc86982f97ac9c3a28f97 ), but it would be nice to have it integrated into spaCy as an option.
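To make the request concrete, here is a minimal sketch of the kind of conversion meant here. It is not the code from the gist. It assumes only the standard spaCy `Token` attributes (`i`, `text`, `lemma_`, `pos_`, `tag_`, `dep_`, `head`) and the ten CoNLL-U columns; the names `sentence_to_conllu` and `demo_sentence` are hypothetical, and the stand-in tokens are there so the example runs without a model installed. Morphological features, enhanced dependencies, and multi-word tokens are left as `_`.

```python
from types import SimpleNamespace


def sentence_to_conllu(tokens):
    """Format one sentence as CoNLL-U text (hypothetical helper).

    `tokens` is a sequence of objects with the spaCy Token attributes
    i, text, lemma_, pos_, tag_, dep_ and head.
    """
    tokens = list(tokens)
    if not tokens:
        return ""
    offset = tokens[0].i  # CoNLL-U IDs are sentence-local and start at 1
    lines = []
    for tok in tokens:
        # In spaCy the root token is its own head; CoNLL-U uses HEAD=0 for it.
        is_root = tok.head.i == tok.i
        head = 0 if is_root else tok.head.i - offset + 1
        deprel = "root" if is_root else (tok.dep_ or "_")
        columns = [
            str(tok.i - offset + 1),  # ID
            tok.text,                 # FORM
            tok.lemma_ or "_",        # LEMMA
            tok.pos_ or "_",          # UPOS
            tok.tag_ or "_",          # XPOS
            "_",                      # FEATS (not filled in this sketch)
            str(head),                # HEAD
            deprel,                   # DEPREL
            "_",                      # DEPS (not filled in this sketch)
            "_",                      # MISC (not filled in this sketch)
        ]
        lines.append("\t".join(columns))
    # Sentences are terminated by a blank line.
    return "\n".join(lines) + "\n\n"


def demo_sentence():
    """Build stand-in tokens for "She sleeps" (assumption: mimics spaCy tokens)."""
    she = SimpleNamespace(i=0, text="She", lemma_="she",
                          pos_="PRON", tag_="PRP", dep_="nsubj")
    sleeps = SimpleNamespace(i=1, text="sleeps", lemma_="sleep",
                             pos_="VERB", tag_="VBZ", dep_="ROOT")
    she.head = sleeps
    sleeps.head = sleeps  # root points to itself, as in spaCy
    return [she, sleeps]


if __name__ == "__main__":
    print(sentence_to_conllu(demo_sentence()), end="")
```

With a real pipeline, the same function could presumably be called once per sentence, e.g. `sentence_to_conllu(sent) for sent in doc.sents`, provided the loaded model includes a parser.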
Could the feature be a custom component or spaCy plugin?
If so, we will tag it as "project idea" so other users can take it on.