Langram - the most accurate language detection library

314 ScriptLanguages (187 models + 127 single language scripts)

One language can be written in multiple scripts; each combination is detected as a separate ScriptLanguage (language + script).
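The language + script pairing can be pictured with a small Rust sketch. The types `Script`, `Language`, `ScriptLanguage` and the helper `script_of_char` below are hypothetical stand-ins written for illustration, not Langram's actual definitions, and the Unicode ranges are deliberately rough.

```rust
// Hypothetical types for illustration only; not Langram's real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Script {
    Cyrillic,
    Latin,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Language {
    English,
    Serbian,
}

/// A detection result identifies both the language and the script it is written in.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScriptLanguage {
    language: Language,
    script: Script,
}

/// Rough script lookup for a single character (assumed ranges, far from complete).
fn script_of_char(c: char) -> Option<Script> {
    match c {
        'A'..='Z' | 'a'..='z' => Some(Script::Latin),
        '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
        _ => None,
    }
}
```

Under this sketch, Serbian written as "Добар дан" and as "Dobar dan" share the same `language` but are two distinct `ScriptLanguage` values.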

Uses alphabet_detector as a word separator and language prefilter.

Based on a modified n-gram language model algorithm that uses character n-grams (lengths 1 to 5) and word unigrams.
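To make the character n-gram part concrete, here is a minimal sketch of extracting every n-gram of length 1 to 5 from one word. The function name `char_ngrams` is an assumption made for this example; it only illustrates the general technique and does not reproduce Langram's modified algorithm or its scoring.

```rust
/// Collects every character n-gram of length 1..=max_n from a single word.
/// Works on `char`s, so multi-byte characters are never split.
fn char_ngrams(word: &str, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out = Vec::new();
    // Never ask for n-grams longer than the word itself.
    for n in 1..=max_n.min(chars.len()) {
        for window in chars.windows(n) {
            out.push(window.iter().collect());
        }
    }
    out
}
```

For example, `char_ngrams("abc", 5)` yields `a`, `b`, `c`, `ab`, `bc`, `abc`. In an n-gram language model, each such n-gram (and the whole word, as a word unigram) would typically be looked up in per-language frequency tables.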

ModelsStorage with all models preloaded uses around 4.1 GB of RAM (2.4 GB using max_trigrams). Unloading each language model after use (not implemented) would be slower but would use around 300 MB of RAM. Another option would be to keep the models in an on-disk database rather than in a HashMap in RAM.
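The unimplemented "unload after use" idea could look roughly like the sketch below. Everything here is hypothetical: `Model` is a stand-in type and `with_model` is an invented helper, neither of which is part of Langram. The sketch only shows the trade-off of loading a model, using it, and freeing it immediately so that one model is resident at a time.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one language model: n-gram -> log-probability.
type Model = HashMap<String, f64>;

/// Loads a model with `load`, runs `f` against it, then drops the model
/// before returning, trading repeated load time for lower peak memory.
fn with_model<L, F, R>(load: L, f: F) -> R
where
    L: FnOnce() -> Model,
    F: FnOnce(&Model) -> R,
{
    let model = load();
    let result = f(&model);
    drop(model); // free the model's memory as soon as it has been used
    result
}
```

In a real implementation, `load` would read the model from disk each time it is needed, which is why this approach is expected to be slower than keeping everything preloaded.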

This library is a complete rewrite of Lingua: much faster, more accurate, with more languages, etc.

Accuracy report

Comparison with other language detectors

Setup

To use it, you need to patch langram_models in Cargo.toml:

From Git:

[patch.crates-io]
langram_models = { git = "https://github.com/RoDmitry/langram_models.git" }

From a pre-downloaded copy (langram_models):

[patch.crates-io]
langram_models = { path = "../langram_models" }

Using a local copy is the more advanced option: it lets you remove model n-grams so that the final executable is smaller.
