This project implements the Transformer model from the paper "Attention Is All You Need" (Vaswani et al., 2017), following the original architecture while keeping the code modular for readability and extensibility.
This project is built using PyTorch and includes separate components for the encoder, decoder, attention mechanisms, and embeddings.
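As one example of what such a component might look like, the attention mechanism could be sketched as a multi-head attention module along the following lines (class and parameter names here are illustrative, not taken from this repository):

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention split across several heads,
    as described in "Attention Is All You Need"."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        # One linear projection each for queries, keys, and values,
        # plus a final output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        # Project, then reshape to (batch, heads, seq_len, d_k).
        def split(x, proj):
            return proj(x).view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)

        q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)

        # Merge the heads back into a single d_model-sized representation.
        out = (attn @ v).transpose(1, 2).contiguous()
        out = out.view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(out)
```

In this style, each component (attention, feed-forward, embeddings) lives in its own module, and the encoder and decoder compose them, which is what keeps the architecture easy to read and extend.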