Skip to content

A language detection model using neural networks to identify different languages from text input

Notifications You must be signed in to change notification settings

cikay/language-detection

Repository files navigation

Neural Network Language Detector

A language detection model using neural networks to identify 10 different languages from text input. Built as a learning project with a dataset of 60,000 sentences, thanks to Mozilla Common Voice. This implementation demonstrates the fundamentals of neural network and text classification.

To run the code follow the steps below

Create and activate virtual environment using pipenv

pipenv shell 

Install dependencies

pipenv install

Open Python shell

python

Run the following code to train the model and test a sample sentence. It will print out the test prediction rate like Epoch 0: 6789 / 12000 for each epoch and lastly for given Kurdish Kurmaji sentence it will print out Kurdish Kurmanji. Note that it takes about 1 minute to print out first epoch results and 20 minutes for completion

from data_loader import load_data
from data_preparer import CharEncoder
from language_detector import LanguageDetector


train_data_path = "./data/train.tsv"
sentences, languages = load_data(train_data_path)
encoder = CharEncoder(sentences)
lang_detector = LanguageDetector(sentences, languages, encoder)
sentence = "Ez ê di vê gotarê da qala ên ku ez guhdar û temaşe dikim bikim."
lang = lang_detector.detect(sentence)

print(lang) # Should print "Kurdish Kurmanji"

To test additional sentences

sentence = "the sentence you want"
lang = lang_detector.detect(sentence)

print(lang)

About

A language detection model using neural networks to identify different languages from text input

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages