explosion/spaCy · Language Support · Discussions · GitHub


Language Support Discussions

Discuss the language data and training models for new languages

Pinned to Language Support

Adding models for new languages master thread
Labels: enhancement (Feature requests and improvements) · lang / all (Global language data) · new language (Adding support for new languages to spaCy)
ines started Dec 16, 2018 in Language Support

141

Discussions


French morphologizer mislabeling future, conditional, imperative

KennethBaclawskiJrDialpad started Dec 13, 2024 in Language Support

0

French tokenization - inconsistent application of exceptions in FR_BASE_EXCEPTIONS & other unexpected tokenization
Labels: lang / fr (French language data and models) · feat / tokenizer (Feature: Tokenizer)
e-nesse started Aug 9, 2021 in Language Support

7

xx_sent_ud_sm bad sentence split
Labels: models (Issues related to the statistical models) · lang / zh (Chinese language data and models) · lang / xx (Multi-language data and models) · feat / senter (Feature: Sentence Recognizer)
lance0108 started May 18, 2023 in Language Support

2

Adding a New Language

LilitKharatyan started Aug 5, 2024 in Language Support

0

New Language: Classical Armenian

LilitKharatyan started May 20, 2024 in Language Support · Closed

1

Licensing question: spaCy pipelines vs training datasets

surdina started Jul 23, 2024 in Language Support

0

Is there a coref model updated for spaCy 3.7?

lingvisa started Jul 16, 2024 in Language Support

5

Losing POS Tagging & Other Token Attributes when Segmenting with Jieba or Pkuseg
Labels: usage (General spaCy usage) · feat / tokenizer (Feature: Tokenizer)
creolio started Jul 20, 2023 in Language Support

2

Expected? JSON data generated by convert from CoNLL-U shows Unicode?

yosiasz started Jun 23, 2024 in Language Support

0

Amharic - training model

yosiasz started Jun 17, 2024 in Language Support · Closed

1

Creating Amharic model am_core_web_sm
Labels: lang / am (Amharic language data and models)
yosiasz started Dec 16, 2020 in Language Support

4

Amharic - አማርኛ (am-et) language support
Labels: lang / am (Amharic language data and models)
yosiasz started Dec 11, 2020 in Language Support

11

Arabic language support
Labels: lang / ar (Arabic language data and models)
jeknov started Feb 21, 2021 in Language Support

15

Why is the German word "stark" always recognized as an ADV without a sentiment value?

Diapolo started May 13, 2024 in Language Support

0

Why does the German sentence tokenizer consider a semicolon a sentence ending?
Labels: lang / de (German language data and models) · feat / tokenizer (Feature: Tokenizer)
TamaraAtanasoska started Feb 26, 2024 in Language Support

2

Other Languages Support
Labels: models (Issues related to the statistical models)
firqaaa started Feb 13, 2024 in Language Support · Closed

0

Portuguese words starting with a capital letter are not correctly lemmatized
Labels: lang / pt (Portuguese language data and models) · feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
dcaled started Apr 1, 2021 in Language Support

6

Adding support for Tibetan in spaCy
Labels: new language (Adding support for new languages to spaCy)
wienergm started Dec 24, 2023 in Language Support

0

Feedback on alpha Finnish, Korean and Swedish trained pipelines
Labels: enhancement (Feature requests and improvements) · lang / ko (Korean language data and models) · lang / sv (Swedish language data and models) · lang / fi (Finnish language data and models) · v3.3 (Related to v3.3)
adrianeboyd started Apr 5, 2022 in Language Support

16

English models' Accuracy Evaluation values
Labels: lang / en (English language data and models)
ojo4f3 started Dec 4, 2023 in Language Support

1

Update Russian library
Labels: lang / ru (Russian language data and models) · third-party (Third-party packages and services) · feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
fitwist started Nov 15, 2023 in Language Support

1

Floret vectors for Italian
Labels: training (Training and updating models) · feat / vectors (Feature: Word vectors and similarity)
darioprencipe started Oct 30, 2023 in Language Support

1

Improving Bengali stopwords collection and exceptions
Labels: lang / bn (Bengali language data and models) · new language (Adding support for new languages to spaCy)
Debangan-MishraIIIT started Sep 7, 2023 in Language Support

1

Connected words in Portuguese
Labels: lang / pt (Portuguese language data and models)
ClioBrNl2023 started Aug 22, 2023 in Language Support

1

Training coreference resolver on Italian Ontonotes produces low scores
Labels: training (Training and updating models) · feat / coref (Feature: Coreference resolution)
ghidav started Aug 14, 2023 in Language Support

3