This project translates English sentences into Hindi using a Transformer model.
I used TensorFlow for the project, and this article helped me understand the implementation.
The dataset used can be found here.
It contains around 100K pairs of English and Hindi sentences.
First, I applied basic text preprocessing: lowercasing the sentences, removing URLs, removing digits, and so on.
[Start] and [End] tags are then added to the Hindi sentences.
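
The cleaning and tagging steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `clean_sentence` and `add_target_tags` and the exact regular expressions are assumptions, and the "etc." in the description may cover steps not shown here.

```python
import re

# Assumed patterns; the project's exact rules are not given in the text.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# In Python 3, \d on str also matches Devanagari digits such as "२".
DIGIT_PATTERN = re.compile(r"\d+")


def clean_sentence(text: str) -> str:
    """Lowercase, drop URLs and digits, and collapse repeated whitespace."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = DIGIT_PATTERN.sub(" ", text)
    return " ".join(text.split())


def add_target_tags(hindi: str) -> str:
    """Wrap a cleaned Hindi sentence in the [Start] and [End] tags."""
    return f"[Start] {hindi} [End]"


if __name__ == "__main__":
    print(clean_sentence("Visit https://example.com in 2021 Now"))
    print(add_target_tags(clean_sentence("नमस्ते दुनिया")))
```

The tags are added after cleaning so that lowercasing does not alter them.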
TextVectorization from Keras is used to convert the sentences into integer token sequences.
The vocabulary size is 20000 and the sequence length is 20.
80K samples are used for training, each at most 20 words long.
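
To make the vectorization step concrete, here is a plain-Python sketch of what an integer-output text vectorizer does conceptually: build a frequency-ordered vocabulary, map words to indices, and pad or truncate to a fixed length. It is not Keras's implementation. The helper names are hypothetical, and applying the length filter to both sides of a pair is an assumption. Index 0 for padding and index 1 for unknown words follows the default convention of Keras's TextVectorization in integer mode.

```python
from collections import Counter

MAX_TOKENS = 20000  # vocabulary size from the text
SEQ_LEN = 20        # sequence length from the text


def filter_by_length(pairs, max_words=SEQ_LEN):
    """Keep (english, hindi) pairs whose sides both have at most max_words words."""
    return [
        (en, hi)
        for en, hi in pairs
        if len(en.split()) <= max_words and len(hi.split()) <= max_words
    ]


def build_vocab(sentences, max_tokens=MAX_TOKENS):
    """Map the most frequent words to indices; 0 is padding, 1 is unknown."""
    counts = Counter(word for s in sentences for word in s.split())
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return {w: i + 2 for i, w in enumerate(words)}


def vectorize(sentence, vocab, seq_len=SEQ_LEN):
    """Turn a sentence into a fixed-length list of integer token ids."""
    ids = [vocab.get(w, 1) for w in sentence.split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["the cat sat", "the dog"])
    print(vectorize("the bird", vocab, seq_len=4))
```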
The Transformer model uses only a single encoder and a single decoder.
The embedding dimension is 128, the number of heads in MultiHeadAttention is 10, and the latent dimension is 2048, which is used in the feed-forward network with a dropout of 0.2.
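
As a rough sense of scale, the arithmetic below counts the trainable weights implied by these sizes. It is only a sketch under stated assumptions: the feed-forward network is taken to be two dense layers with biases (128 to 2048 and back to 128), and the per-head size `KEY_DIM` is a hypothetical value, since the text does not give it. Because 128 is not divisible by 10, the head size has to be chosen separately from the embedding dimension, which is what the `key_dim` argument of Keras's MultiHeadAttention is for.

```python
EMBED_DIM = 128    # from the text
NUM_HEADS = 10     # from the text
LATENT_DIM = 2048  # from the text
KEY_DIM = 128      # assumption: per-head size is not stated in the text


def feed_forward_params(embed_dim, latent_dim):
    """Weights and biases of two dense layers: expand, then project back."""
    expand = embed_dim * latent_dim + latent_dim
    project = latent_dim * embed_dim + embed_dim
    return expand + project


def attention_params(embed_dim, num_heads, key_dim):
    """Query, key, and value projections plus the output projection, with biases."""
    qkv = 3 * (embed_dim * num_heads * key_dim + num_heads * key_dim)
    out = num_heads * key_dim * embed_dim + embed_dim
    return qkv + out


if __name__ == "__main__":
    print(feed_forward_params(EMBED_DIM, LATENT_DIM))         # 526464
    print(attention_params(EMBED_DIM, NUM_HEADS, KEY_DIM))    # 659328
```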
The number of epochs is set to 50, with Adam as the optimizer, sparse_categorical_crossentropy as the loss, and accuracy as the metric.
Two callbacks, ReduceLROnPlateau and EarlyStopping, are also used.
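
The interaction of the two callbacks can be illustrated with a small simulation. This is a simplified plain-Python sketch, not Keras's implementation (which has extra options such as a minimum delta and a cooldown). The monitored quantity (validation loss) and all numeric settings (`factor`, `lr_patience`, `stop_patience`) are assumptions, since the text does not state them.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5,
                       lr_patience=2, stop_patience=4):
    """Return (epoch, learning_rate) pairs until early stopping triggers."""
    best = float("inf")
    since_best = 0  # epochs without improvement, for early stopping
    since_lr = 0    # epochs without improvement since the last LR cut
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr = 0
        else:
            since_best += 1
            since_lr += 1
            if since_lr >= lr_patience:
                lr *= factor  # reduce the learning rate on a plateau
                since_lr = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break  # stop training early
    return history


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.82, 0.83, 0.9]
    for epoch, lr in simulate_callbacks(losses, lr=1.0):
        print(epoch, lr)
```

With early stopping in place, 50 epochs acts as an upper bound rather than a fixed training length.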
Evaluated on 500 samples, the model achieved a BLEU score of 24.5.
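
For readers unfamiliar with the metric, here is a minimal sketch of corpus-level BLEU on a 0 to 100 scale. It assumes a single reference per sentence, whitespace tokenization, n-grams up to length 4, and no smoothing. The text does not say which BLEU tool or settings produced the 24.5, so this function will not necessarily reproduce that number.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU with clipped n-gram precision and a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Counter intersection clips each n-gram count to the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on a mat".split()]
    hyps = ["the cat sat on the mat".split()]
    print(round(corpus_bleu(refs, hyps), 2))
```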
The BLEU score is not great, but I still learned a lot about Transformers.