explosion spaCy Language-support Discussions
🌍 Language Support Discussions
Discuss the language data and training models for new languages
Pinned to Language Support
🌍 Adding models for new languages master thread
Labels: enhancement (Feature requests and improvements), lang / all (Global language data), new language (Adding support for new languages to spaCy)
Discussions
🌍 French tokenization - inconsistent application of exceptions in FR_BASE_EXCEPTIONS & other unexpected tokenization
Labels: lang / fr (French language data and models), feat / tokenizer (Feature: Tokenizer)
🌍 xx_sent_ud_sm bad sentence split
Labels: models (Issues related to the statistical models), lang / zh (Chinese language data and models), lang / xx (Multi-language data and models), feat / senter (Feature: Sentence Recognizer)
🌍 Losing POS Tagging & Other Token Attributes when Segmenting with Jieba or Pkuseg
Labels: usage (General spaCy usage), feat / tokenizer (Feature: Tokenizer)
🌍 Creating Amharic model am_core_web_sm
Labels: lang / am (Amharic language data and models)
🌍 Amharic - አማርኛ (am-et) language support
Labels: lang / am (Amharic language data and models)
🌍 Arabic language support
Labels: lang / ar (Arabic language data and models)
🌍 Why does the German sentence tokenizer consider a semicolon a sentence ending?
Labels: lang / de (German language data and models), feat / tokenizer (Feature: Tokenizer)
🌍 Other Languages Support
Labels: models (Issues related to the statistical models)
🌍 Portuguese words starting with a capital letter are not correctly lemmatized
Labels: lang / pt (Portuguese language data and models), feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
🌍 Adding support for Tibetan in spaCy
Labels: new language (Adding support for new languages to spaCy)
🌍 Feedback on alpha Finnish, Korean and Swedish trained pipelines
Labels: enhancement (Feature requests and improvements), lang / ko (Korean language data and models), lang / sv (Swedish language data and models), lang / fi (Finnish language data and models), v3.3 (Related to v3.3)
🌍 English models' Accuracy Evaluation values
Labels: lang / en (English language data and models)
🌍 Update Russian library
Labels: lang / ru (Russian language data and models), third-party (Third-party packages and services), feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
🌍 Floret vectors for Italian
Labels: training (Training and updating models), feat / vectors (Feature: Word vectors and similarity)
🌍 Improving Bengali Stopwords collection and Exception
Labels: lang / bn (Bengali language data and models), new language (Adding support for new languages to spaCy)
🌍 Connected words in Portuguese
Labels: lang / pt (Portuguese language data and models)
🌍 Training coreference resolver on Italian Ontonotes produces low scores
Labels: training (Training and updating models), feat / coref (Feature: Coreference resolution)