Training a Turkish model #2617
Replies: 4 comments
-
The Turkish language data has been making good progress, but we haven't tried training any models yet, and I haven't heard of any experiments from the community either. For #1490, I specifically selected languages for which Universal Dependencies has published data with suitable licenses. See here for the Turkish treebank: https://github.com/UniversalDependencies/UD_Turkish-IMST
This section in the docs has an example of using spaCy's converter and
-
Hi, I have been working on Turkish text classification for a few years, all of it in R (#rstats). I saw that spaCy has great models for the major languages, so why not train one for Turkish? I can contribute,
-
@selcukakbas Training requires an annotated corpus -- we need examples of the words in context. Just the list of words and possible tags isn't enough. We've got a fair few people working with Turkish to various degrees, so I expect support to steadily improve. Turkish is a relatively difficult language, though: its morphology is quite rich, which spaCy currently doesn't handle well.
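To make "words in context" concrete, here is a minimal sketch of reading CoNLL-U, the tab-separated format Universal Dependencies treebanks are published in. The sample sentence is hand-written for illustration and is not taken from the Turkish treebank, and `read_conllu` is a hypothetical helper, not part of spaCy.

```python
from collections import Counter

# Hand-written illustrative sample in CoNLL-U format (NOT from the treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = (
    "# text = Ben geldim.\n"
    "1\tBen\tben\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tgeldim\tgel\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(text):
    """Yield each sentence as a list of (form, lemma, upos) tuples."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skip them here.
            continue
        cols = line.split("\t")
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


sentences = list(read_conllu(SAMPLE))
tag_counts = Counter(upos for sent in sentences for _, _, upos in sent)
```

Each token keeps its position in a sentence along with its lemma and tag, which is the information a plain list of words and possible tags lacks.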
-
Hi all, I have an upcoming conference paper on a self-attentive, subword-based neural POS tagger for Turkish. I'll put the code, written in PyTorch, in a separate repo. Once it's ready, I'll try to integrate it into spaCy. I also write blog posts on Turkish from time to time: For any questions, please feel free to ping me!
-
Basically what the title suggests: Issue #1490 seems to lay out the steps, but gives no indication of whether a trained language model exists.
I could use some guidance on whether one has been trained or used.
Thanks