Support for XML wordnet #632
Replies: 4 comments
-
If there are no licensing problems with using this resource, it would be great to integrate it. As far as file formats go, spaCy mostly prefers JSON to XML, because it's a little bit easier to dump data into JSON. I assume it's no problem to translate the format into JSON? That way we don't have to depend on lxml.
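To illustrate that such a translation does not need lxml, here is a minimal sketch using only the Python standard library. The element and attribute names (`lexical-unit`, `synset`, `unit-id`, `id`, `name`, `pos`) and the output layout are assumptions made up for the example, not the actual plWordNet schema or an agreed spaCy format:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real plWordNet XML may be structured differently.
SAMPLE_XML = """
<wordnet>
  <lexical-unit id="1" name="kot" pos="rzeczownik"/>
  <lexical-unit id="2" name="pies" pos="rzeczownik"/>
  <synset id="10">
    <unit-id>1</unit-id>
  </synset>
</wordnet>
"""


def wordnet_xml_to_json(xml_text):
    """Convert the (assumed) XML layout above into a JSON string."""
    root = ET.fromstring(xml_text.strip())
    # Each lexical unit becomes a plain dict of its XML attributes.
    units = [dict(el.attrib) for el in root.iter("lexical-unit")]
    # Each synset keeps its id and the ids of the units it groups.
    synsets = [
        {"id": el.get("id"), "units": [u.text for u in el.iter("unit-id")]}
        for el in root.iter("synset")
    ]
    # ensure_ascii=False keeps Polish characters readable in the output.
    return json.dumps({"units": units, "synsets": synsets},
                      ensure_ascii=False, indent=2)


print(wordnet_xml_to_json(SAMPLE_XML))
```

For a file the size of a full wordnet, `ET.iterparse` would probably be a better fit than loading everything at once, but the mapping logic would stay the same.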
-
Sure, this is not a problem. I will create this WordNet in JSON format. Is a particular format required, or will a direct XML-to-JSON conversion be okay?
-
We don't have a format specified yet. I guess just make something you'd want to use?
-
I have published plWordNet in JSON format. Because important plWordNet tables such as "pos" and "domain" are in Polish, I added a short glossary to readme-JSON.txt. For now the whole conversion is automated, so everything is as in the original. However, if necessary, I can add a few features that translate the exported emotion words, domains and parts of speech into English. Package download: http://inder.pl/datasets/plwordnet.tar.gz
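As a rough idea of what translating the parts of speech into English could look like, here is a small sketch. The glossary entries are ordinary Polish grammar terms chosen for illustration; the labels actually used in the export, and the `pos` key name, are assumptions and should be checked against readme-JSON.txt:

```python
# Illustrative Polish-to-English part-of-speech glossary; the actual labels
# used in the plWordNet export are an assumption here.
POS_GLOSSARY = {
    "rzeczownik": "noun",
    "czasownik": "verb",
    "przymiotnik": "adjective",
    "przysłówek": "adverb",
}


def translate_pos(entry, glossary=POS_GLOSSARY):
    """Return a copy of entry with its "pos" value translated when known."""
    translated = dict(entry)
    if "pos" in translated:
        # Unknown labels are left as they are rather than dropped.
        translated["pos"] = glossary.get(translated["pos"], translated["pos"])
    return translated


print(translate_pos({"name": "kot", "pos": "rzeczownik"}))
```

The same lookup approach would apply to the "domain" table, given a corresponding glossary.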
-
Is support for WordNet in XML format planned for the near future?
Adding a new language to spaCy requires a WordNet. The problem is that the Polish WordNet is distributed as XML. On the other hand, it has some very interesting advantages:
178 000 words,
259 000 meanings,
more than 600 000 relationships,
158 000 Polish-English entries,
free license,
the world's largest wordnet
Official site: http://plwordnet.pwr.wroc.pl/wordnet/