- A new model architecture, the Transformer, introduced in "Attention Is All You Need" (Vaswani et al., 2017)
- Models how each word in a document relates to every other word in it
- Computes attention weights between every pair of input positions ("self-attention")
Source: "Attention Is All You Need"
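A minimal sketch of single-head scaled dot-product self-attention, softmax(QKᵀ / √d_k)·V, in plain Python with no libraries. The function and argument names (`self_attention`, `w_q`, `w_k`, `w_v`) and the toy sizes are illustrative assumptions. A real Transformer uses learned weight matrices, multiple heads, and tensor libraries.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def self_attention(x, w_q, w_k, w_v):
    # x: one embedding row per input token; w_*: projection matrices (assumed given)
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    weights = []
    for q_row in q:
        # Score this token's query against the key of EVERY token, including itself
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights.append(softmax(scores))
    # Each output row is an attention-weighted mix of all value rows
    return matmul(weights, v), weights

identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

The attention matrix is n × n for n tokens: each row holds one token's weights over all tokens and sums to 1, which is what "attention between any inputs" means in practice.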
- A Transformer trained on huge amounts of data
- A neural network with millions or billions of weights (parameters)
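To see how the weight count reaches the millions, here is a rough parameter count for a stack of Transformer blocks. It is a sketch under stated assumptions: a decoder-only model with learned position embeddings, biases on every linear layer, two layer norms per block, and an output layer that shares (is tied to) the token embedding. The sizes passed in are assumed, illustrative values, not taken from the source.

```python
def transformer_block_params(d_model, d_ff):
    # Q, K, V and output projections: a d_model x d_model matrix plus a bias each
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward network d_model -> d_ff -> d_model, with biases
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector of length d_model
    layer_norms = 2 * (2 * d_model)
    return attention + feed_forward + layer_norms

def model_params(n_layers, d_model, d_ff, vocab_size, max_positions):
    # Token and position embedding tables (output layer assumed tied to token table)
    embeddings = vocab_size * d_model + max_positions * d_model
    final_norm = 2 * d_model
    return n_layers * transformer_block_params(d_model, d_ff) + embeddings + final_norm

print(transformer_block_params(768, 3072))        # 7087872
print(model_params(12, 768, 3072, 50257, 1024))   # 124439808
```

Most weights sit in the d_model × d_model and d_model × d_ff matrices, so the count grows roughly with n_layers × d_model²; widening and deepening the model is how sizes climb into the billions.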