ArbEngVec

Word Embeddings (WE) are getting increasingly popular and widely applied in many Natural Language Processing (NLP) applications due to their effectiveness in capturing semantic properties of words; Machine Translation (MT), Information Retrieval (IR) and Information Extraction (IE) are among such areas. In this project , we propose an open source ArbEngVec which provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset with more than 93 million pairs of Arabic-English parallel sentences.

Evaluation

WE perform both extrinsic and intrinsic evaluations for the different word embedding model variants. The extrinsic evaluation assesses the performance of models on the cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.

Models

All model variants with GenSim format can be found here: https://drive.google.com/open?id=1S2ugc8pZshYD3mTAkShKoA7He4Ih4gnS

All model variants with Binary format can be found here: https://drive.google.com/open?id=1guk6kNVoJFuaq1_zFRaEKBPZjz7LhOuj

Visualisation : Random shuffle with SkipGram Model

Citation

In further research usage of this script please use this citation:

@inproceedings{lachraf-etal-2019-arbengvec, title = "{A}rb{E}ng{V}ec : {A}rabic-{E}nglish Cross-Lingual Word Embedding Model", author = "Lachraf, Raki and Nagoudi, El Moatez Billah and Ayachi, Youcef and Abdelali, Ahmed and Schwab, Didier", booktitle = "Proceedings of the Fourth Arabic Natural Language Processing Workshop", month = aug, year = "2019", address = "Florence, Italy", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/W19-4605", doi = "10.18653/v1/W19-4605", pages = "40--48", }

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
alignments.py		alignments.py
arabic_preprocess.py		arabic_preprocess.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ArbEngVec

Evaluation

Models

Visualisation : Random shuffle with SkipGram Model

Citation

About

Releases

Packages

Languages

Nagoudi/ArbEngVec

Folders and files

Latest commit

History

Repository files navigation

ArbEngVec

Evaluation

Models

Visualisation : Random shuffle with SkipGram Model

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages