An encoder-decoder Transformer implemented from scratch, based on the paper "Attention Is All You Need". The goal is to understand the core components of the Transformer architecture by building it step by step.
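As a flavor of those core components, the sketch below shows the scaled dot-product attention that the architecture is built around. It is a minimal illustration assuming a PyTorch implementation, not the code in this repository; the function name and shapes are chosen for the example only.

```python
# Minimal sketch of scaled dot-product attention (illustrative, not this repo's code).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)                        # attention distribution
    return weights @ v, weights

# Tiny usage example: batch of 2, sequence length 5, model dimension 16.
q = k = v = torch.randn(2, 5, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```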
The model is trained for machine translation on parallel English-Portuguese text from the OPUS Books dataset.
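For reference, one common way to obtain this parallel corpus is through the Hugging Face `datasets` library, sketched below. The `"en-pt"` config name and the `"translation"` field layout are assumptions about the public OPUS Books dataset, not necessarily how this repository loads its data.

```python
# Illustrative only: loading English-Portuguese OPUS Books via Hugging Face datasets.
from datasets import load_dataset

dataset = load_dataset("opus_books", "en-pt", split="train")  # assumed config name
example = dataset[0]["translation"]                           # {"en": "...", "pt": "..."}
print(example["en"], "->", example["pt"])
```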
