GitHub - dhruvjain1999/NLP_French_to_english_translation

Contributors

Dhruv Jain: Feature engineering, model development, summary, future work.
Sunday Okechukwu: Data gathering, data cleaning, data pre-processing, evaluation, results.

Machine Translation

What is Machine Translation?

Machine Translation refers to the automated process of translating text or speech from one language (source language) to another language (target language). For example, translating from French to English or Spanish to English.

Why Machine Translation?

Internal Communication: Helps companies with operations in multiple countries manage communication across languages.
Data Analysis: Enables analysis of large amounts of content from social media and websites in different languages for insights.
Online Customer Service: Supports customer service by translating requests and responses accurately.
Legal Research: Aids in preparing legal documents in different languages.

Common Types of MT

Rule-based MT
Statistical MT
Neural MT

Project Goal

The goal of this project is to build a deep neural network for Neural Machine Translation, accepting French sentences as input and returning English translations.

Dataset

Dataset: Available here
French-English dataset with 217,975 sentences

Main Stages of the Project

Preprocessing: Normalizing case, removing punctuation and non-alphabetic characters, tokenization, and padding.
Feature Extraction: Training word embeddings with the Keras embedding layer.
Modeling: Seq2Seq model with LSTM layers for both encoder and decoder.
Evaluations: BLEU score for translation quality.
Summary and Future Work: Suggested improvements such as adding more LSTM layers, using pre-trained embeddings, and implementing an attention mechanism.

Preprocessing

Convert text to lowercase and apply Unicode normalization.
Remove punctuation and non-alphabetic characters.
Tokenize words into IDs and add padding for fixed-length sequences.
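
The three preprocessing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function names, the choice of NFC normalization, reserving ID 0 for padding, and padding at the end of each sequence are all assumptions.

```python
import unicodedata

PAD_ID = 0  # assumption: ID 0 is reserved for padding


def normalize(sentence):
    """Lowercase, apply Unicode NFC normalization, and keep letters only."""
    text = unicodedata.normalize("NFC", sentence).lower()
    # Replace every non-alphabetic character (punctuation, digits) with a space.
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def build_vocab(tokenized):
    """Assign each distinct word an integer ID, starting at 1."""
    vocab = {}
    for tokens in tokenized:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode_and_pad(tokenized, vocab, max_len=None):
    """Map words to IDs and right-pad every sequence to the same length."""
    max_len = max_len or max(len(t) for t in tokenized)
    rows = []
    for tokens in tokenized:
        ids = [vocab[t] for t in tokens][:max_len]
        rows.append(ids + [PAD_ID] * (max_len - len(ids)))
    return rows


sentences = ["J'aime l'été !", "Elle aime Paris."]
tokens = [normalize(s) for s in sentences]
vocab = build_vocab(tokens)
padded = encode_and_pad(tokens, vocab)
```

Keeping accented letters (rather than stripping them to ASCII) is a design choice of this sketch; either convention works as long as it is applied consistently to training and inference data.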

Feature Extraction

Train word embeddings using the Keras embedding layer.
Convert words into fixed-length vectors.
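
Conceptually, an embedding layer is a lookup table with one fixed-length vector per word ID, whose values are adjusted during training. The sketch below shows only the lookup step in plain Python; the class name `EmbeddingTable`, the sizes, and the random initialization range are hypothetical, and it does not reproduce the Keras layer or its training.

```python
import random


class EmbeddingTable:
    """Lookup table holding one fixed-length vector per token ID."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Start from small random values; training would update these weights.
        self.weights = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)
        ]

    def lookup(self, ids):
        """Return the vector for each ID in a padded sequence."""
        return [self.weights[i] for i in ids]


table = EmbeddingTable(vocab_size=7, dim=4)
vectors = table.lookup([5, 2, 6, 0])  # one 4-dimensional vector per token ID
```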

Model Development

Use a Seq2Seq model with LSTM layers for both encoder and decoder.
The encoder compresses the input sequence into a context vector.
The decoder predicts the target sequence, one token at a time, from that context vector.
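
The encoder-decoder flow at translation time can be sketched as a greedy decoding loop. The README does not describe the inference procedure, so this is an assumption: `encode` and `step` are hypothetical stand-ins for the trained LSTM encoder and decoder, and the toy versions below simply copy the input so the loop can run without a trained model.

```python
def greedy_decode(encode, step, source_ids, start_id, end_id, max_len=20):
    """Generate target IDs one at a time, always taking the decoder's top choice."""
    state = encode(source_ids)  # context vector produced by the encoder
    token, output = start_id, []
    for _ in range(max_len):
        # The decoder consumes the previous token and state, and emits the next.
        token, state = step(token, state)
        if token == end_id:
            break
        output.append(token)
    return output


START, END = 1, 2  # hypothetical special-token IDs


def toy_encode(source_ids):
    """Stand-in encoder: the 'context' is just the remaining tokens to emit."""
    return list(source_ids)


def toy_step(prev_token, state):
    """Stand-in decoder step: emit the next stored token, then END."""
    if not state:
        return END, state
    return state[0], state[1:]


translated = greedy_decode(toy_encode, toy_step, [7, 5, 9], START, END)
```

In an LSTM-based model, the state passed between steps would be the hidden and cell states rather than a list of tokens.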

Evaluations and Results

BLEU-4 Score: 0.849
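
For readers unfamiliar with the metric, BLEU-4 combines the modified 1-gram to 4-gram precisions of a candidate translation with a brevity penalty. The sketch below is a simplified sentence-level version with a single reference and no smoothing; how the project actually computed its score (library, corpus-level versus sentence-level, smoothing) is not stated in this README, so the function names here are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference and uniform n-gram weights."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    penalty = 1.0 if c > r else math.exp(1 - r / c)
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on a mat".split()
candidate = "the cat sat on the mat".split()
score = bleu(candidate, reference)
```

A score of 1.0 means the candidate matches the reference exactly, so a value close to 1 indicates heavy n-gram overlap with the reference translations.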

Summary and Future Works

Increase the number of LSTM layers in the model.
Use pre-trained word embeddings such as word2vec or GloVe.
Implement an attention mechanism for better performance.

