CoNLL-U to displaCy treebank conversion #1215
-
Things to consider:
-
Ah cool, thank you so much for posting your code here! 💯 This actually looks much more compact than I had imagined – so I suppose we could integrate this with v2 pretty easily.
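Theoretically, displaCy can render whole documents with hundreds of sentences in one go (anything that's in a `Doc`):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")  # any installed pipeline with a dependency parser
docs1 = [nlp("This is a sentence."), nlp("This is another one.")]  # Doc objects
docs2 = [{"words": [{"text": "It", "tag": "PRP"}, {"text": "works", "tag": "VBZ"}],
          "arcs": [{"start": 0, "end": 1, "label": "nsubj", "dir": "left"}]}]  # dicts describing words/arcs
# start the web server and serve the HTML (blocks until the server is stopped)
displacy.serve(docs1)
displacy.serve(docs2, manual=True)
# get the markup to export to a file etc.
html = displacy.render(docs1, page=True)
html = displacy.render(docs2, page=True, manual=True)
```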
This should be no problem for displaCy, at least if you use the "manual" mode with words/arcs dicts, or pass in a modified […]

The JS version of displaCy had a nice feature for assigning custom styles to arcs (e.g. […]). But I'm not sure if this would make sense for your use case.

Anyway, thanks again for the great work! I'll start integrating it on the v2 branch and keep you updated!
-
No problem, it's the least I could do considering you're sharing such a great piece of software. :) Thanks for addressing the other considerations, sounds good! I'll have another go at getting the nightly build to work, even if it means running it in Docker. I should probably submit an issue about the Visual Studio compiler errors I've been getting during installation, as I saw someone else post a similar error message recently.
-
BTW this would be very useful for my purposes:
You should definitely bring that back :)
-
One other thing (sorry!): there's a flaw in my code. It should convert the CoNLL-U null value (`_`) […]. Come to think of it, that's probably why my attempts to render partial annotations didn't work.
-
Small tweak to the main function to allow incomplete annotations to be rendered by displaCy:
It works by not creating or adding arcs for tokens whose HEAD is `_`.
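The skipping logic can be sketched roughly like this. `build_arcs` is a hypothetical helper, not the tweaked function from this comment; it assumes the ID, HEAD and DEPREL columns have already been split out of each CoNLL-U line, and emits arcs in displaCy's manual-mode format (`start`, `end`, `label`, `dir`).

```python
def build_arcs(rows):
    """Build displaCy arcs from (ID, HEAD, DEPREL) string triples,
    leaving out tokens whose HEAD is unannotated ("_")."""
    arcs = []
    for token_id, head, deprel in rows:
        # No arc for: unannotated heads, the root (HEAD 0),
        # multiword-token ranges ("1-2") and empty nodes ("3.1")
        if head in ("_", "0") or not token_id.isdigit():
            continue
        dep_i, head_i = int(token_id) - 1, int(head) - 1  # CoNLL-U IDs are 1-based
        label = "" if deprel == "_" else deprel  # HEAD known but DEPREL missing
        if dep_i < head_i:
            # head to the right: the arrow points back to the dependent at "start"
            arcs.append({"start": dep_i, "end": head_i, "label": label, "dir": "left"})
        else:
            # head to the left: the arrow points to the dependent at "end"
            arcs.append({"start": head_i, "end": dep_i, "label": label, "dir": "right"})
    return arcs
```

Tokens without a head still appear as words; they simply have no incoming arc, so a partially annotated sentence renders instead of failing on `int("_")`.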
-
CoNLL-U is the format used by the Universal Dependencies initiative to annotate dependency treebanks (http://universaldependencies.org/format.html). A large number of treebanks for many different languages are available in this format. However, displaCy currently cannot parse CoNLL-U to visualize those trees, and it would be nice if it could.
The following code converts a single sentence from the CoNLL-U format to the JSON format used by displaCy:
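A conversion along these lines might look like the sketch below; it is not necessarily the code from this post. `conllu_to_displacy` is a hypothetical name. The sketch assumes tab-separated CoNLL-U with the standard 10 columns, uses UPOS (column 4) as the tag shown under each word, and targets displaCy's manual words/arcs format. As written it expects every HEAD to be annotated: a HEAD of `_` makes `int()` raise a `ValueError`.

```python
def conllu_to_displacy(conllu: str) -> dict:
    """Convert one CoNLL-U sentence into displaCy's manual words/arcs format."""
    words, arcs = [], []
    for line in conllu.strip().splitlines():
        line = line.strip()
        # Skip blank lines and sentence-level comments such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        token_id, form, upos, head, deprel = cols[0], cols[1], cols[3], cols[6], cols[7]
        # Multiword-token ranges ("1-2") and empty nodes ("3.1") are not words
        if not token_id.isdigit():
            continue
        words.append({"text": form, "tag": upos})
        if head == "0":
            continue  # the root has no incoming arc
        dep_i, head_i = int(token_id) - 1, int(head) - 1  # CoNLL-U IDs are 1-based
        if dep_i < head_i:
            # head to the right: the arrow points back to the dependent at "start"
            arcs.append({"start": dep_i, "end": head_i, "label": deprel, "dir": "left"})
        else:
            # head to the left: the arrow points to the dependent at "end"
            arcs.append({"start": head_i, "end": dep_i, "label": deprel, "dir": "right"})
    return {"words": words, "arcs": arcs}
```

With spaCy v2 installed, the resulting dict could then be passed to `displacy.render(parse, style="dep", manual=True)`.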
Example output:
Apologies that this isn't submitted as a pull request; I haven't been able to get spaCy 2.0 alpha up and running on my Windows machines. I've tested the output in displaCy using Jupyter and compared the resulting visualizations with the brat visualizations here:
http://bionlp-www.utu.fi/dep_search/
For example:
So far, they seem to match:
However, further testing is advisable.
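Besides comparing against brat by eye, one cheap automated check is to rebuild the HEAD column from the generated arcs and compare it with the source file. The helper names below are hypothetical; they assume the displaCy manual format (`words`, plus `arcs` with `start` < `end` and a `dir` of `"left"` or `"right"`) and treat root and unannotated (`_`) heads alike as 0.

```python
def heads_from_arcs(parse: dict) -> list:
    """Rebuild a CoNLL-U style HEAD column (1-based, 0 = no head) from displaCy arcs."""
    heads = [0] * len(parse["words"])
    for arc in parse["arcs"]:
        start, end = arc["start"], arc["end"]
        if not 0 <= start < end < len(heads):
            raise ValueError(f"malformed arc: {arc}")
        # "left": the arrow points to start, so start is the dependent and end the head
        dependent, head = (start, end) if arc["dir"] == "left" else (end, start)
        if heads[dependent] != 0:
            raise ValueError(f"word {dependent} has more than one head")
        heads[dependent] = head + 1  # back to 1-based CoNLL-U numbering
    return heads


def matches_conllu_heads(parse: dict, conllu_heads: list) -> bool:
    """Compare rebuilt heads with the source HEAD column ("_" counts as 0)."""
    expected = [0 if h == "_" else int(h) for h in conllu_heads]
    return heads_from_arcs(parse) == expected
```

Running this over every sentence of a treebank would flag direction or off-by-one mistakes that are easy to miss visually.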
CoNLL-U for the above examples: