🚀 Welcome to Decoding Attention

An interactive, code-first tour of Transformer inference created for developers who have never opened a machine learning textbook.

Hover, scrub, and step through a real model to see exactly what happens between a keystroke and the next predicted token.


Getting Started

Prerequisites

  • uv (the commands below use it to install Python 3.12 and the project's dependencies)

Run

```sh
uv python install 3.12
uv sync
uv run streamlit run main.py
```

🎯 Goals and Non-goals

✅ What we'll do:

  • Build a Transformer model architecture using PyTorch and existing functions
  • Infer the next token using pre-trained model weights
  • Interactively visualize each step so you can intuitively understand how Transformers work

❌ What we won't cover:

  • Training models — We focus on inference to understand the core logic (we'll skip backpropagation, loss functions, optimization, etc.)
  • Implementing from scratch — We use existing implementations of complex parts (RMSNorm, SwiGLU, RoPE, etc.) and focus on concepts instead
  • If this course feels boring, you probably already know Transformers; move on to more advanced courses like those in the Acknowledgements section.

📖 Learning Path

📚 Chapter 1 — Tokenization & Sampling ✅ Available Now

  • What is tokenization and why do we need it? (BPE)
  • How does a Transformer work at a high level?
  • How to convert Transformer outputs (logits) to token probabilities? (Temperature, Top-K, Top-P, Min-P, Softmax)
  • How to sample the next token from probabilities? (see the sketch after this list)
  • How does autoregressive generation create new text?
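
To give a feel for the last three bullets, here is a minimal sketch of the sampling pipeline in PyTorch. The function name, tensor shapes, and default values are illustrative assumptions for this README, not the course's actual implementation:

```python
# Illustrative sketch only: temperature scaling + Top-K filtering + softmax + sampling.
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50) -> int:
    """logits: raw Transformer output scores for the last position, shape (vocab_size,)."""
    scaled = logits / temperature                            # <1.0 sharpens, >1.0 flattens the distribution
    top = torch.topk(scaled, k=min(top_k, scaled.numel()))   # keep only the k highest-scoring tokens
    probs = torch.softmax(top.values, dim=-1)                # turn scores into probabilities that sum to 1
    choice = torch.multinomial(probs, num_samples=1)         # draw one token at random, weighted by probs
    return int(top.indices[choice])

# Toy example with a fake 10-token vocabulary:
print(sample_next_token(torch.randn(10), temperature=0.8, top_k=5))
```

Autoregressive generation simply repeats this step: append the sampled token to the input, run the model again, and sample the next one.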

💎 Chapter 2 — Embedding 🚧 Coming Soon

  • What are embeddings and why are they important?
  • How to convert tokens to embedding vectors?
  • How to convert embedding vectors back to token logits? (see the sketch just below)
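
As a preview, here is a minimal sketch of both conversions using standard PyTorch modules. The vocabulary size and embedding dimension are made-up numbers for illustration, not Qwen3's real configuration:

```python
# Illustrative sketch only: token ids -> embedding vectors -> logits over the vocabulary.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64                         # made-up sizes for this sketch
embed = nn.Embedding(vocab_size, d_model)              # lookup table: token id -> vector
lm_head = nn.Linear(d_model, vocab_size, bias=False)   # projection: vector -> one score per token

token_ids = torch.tensor([[3, 17, 42]])                # (batch=1, sequence length=3)
vectors = embed(token_ids)                             # shape (1, 3, 64)
logits = lm_head(vectors)                              # shape (1, 3, 1000)
print(vectors.shape, logits.shape)
```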

🧠 Chapter 3 — Neural Networks 🚧 Coming Soon

  • What is a neural network? (Perceptron and MLP)
  • Activation functions (SwiGLU)
  • Normalization (RMSNorm; sketched below)
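
As a preview of the normalization piece, here is a hedged sketch of RMSNorm in its usual formulation (divide each vector by its root mean square, then apply a learned per-dimension scale). The course itself reuses an existing implementation rather than this one:

```python
# Illustrative RMSNorm sketch: normalize by the root mean square, then scale.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))    # learned per-dimension scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

print(RMSNorm(8)(torch.randn(2, 8)).shape)             # torch.Size([2, 8])
```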

🎯 Chapter 4 — Attention 🚧 Coming Soon

  • Scaled dot-product attention mechanism (see the sketch after this list)
  • Causal masks for autoregressive generation
  • Multi-head attention (GQA)
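
As a preview, here is a hedged sketch of causal scaled dot-product attention for a single head, written with plain tensor operations so the scaling and mask are visible. The shapes are illustrative, and the multi-head/GQA wiring is omitted:

```python
# Illustrative single-head causal attention sketch.
import math
import torch

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: shape (sequence length, head dimension)."""
    seq_len, d_head = q.shape
    scores = q @ k.T / math.sqrt(d_head)                  # similarity of every query to every key
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))      # hide future positions (causal mask)
    weights = torch.softmax(scores, dim=-1)               # each row sums to 1
    return weights @ v                                    # weighted sum of value vectors

print(causal_attention(torch.randn(5, 16), torch.randn(5, 16), torch.randn(5, 16)).shape)
```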

🏗️ Chapter 5 — Complete Transformer 🚧 Coming Soon

  • Positional encoding (RoPE)
  • Residual connections (see the sketch after this list)
  • Putting all the pieces together!
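
As a preview of the residual connections, here is a hedged sketch of how they wrap each sublayer: the sublayer's output is added back to its input. The `attn` and `mlp` members are placeholders for the real attention and feed-forward modules from the earlier chapters:

```python
# Illustrative sketch: residual ("skip") connections add each sublayer's output back to its input.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Identity()            # placeholder for multi-head attention
        self.mlp = nn.Identity()             # placeholder for the feed-forward network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x)                 # residual connection around attention
        x = x + self.mlp(x)                  # residual connection around the MLP
        return x

print(Block()(torch.randn(1, 3, 64)).shape)  # torch.Size([1, 3, 64])
```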

Author

Ryosuke Iwanaga / OpsBR Software Technology Inc.

Why did I build this course?

I've been working in the software industry for ~15 years, spanning datacenter operations, database administration, software engineering, and sales engineering. My expertise is in distributed systems, cloud computing, and DevOps/SRE, but I had barely touched machine learning or AI until very recently.

In 2025, I decided to switch my career to AI engineering completely. Luckily, I learned a bit about machine learning in college 20 years ago, so I was able to self-learn Transformers and related topics by watching the best online courses. See this blog post for more details.

During my self-learning, I found that most online courses are too difficult for beginners, especially those who have never opened a machine learning textbook—like software engineers without a CS background. I believe most of them will start working with AI very soon, so I want to help them understand it in some depth. Calling LLM APIs isn't enough to understand what is happening under the hood or to anticipate what will change next. In my opinion, understanding the core logic of Transformers is key, just as understanding the core logic of operating systems or CPUs is key to becoming a good software engineer.

This course is essentially my own re-learning journey through Transformers. I'll try to explain them as simply as possible by re-implementing an existing model and providing many interactive visualizations. I hope this helps you understand Transformers and become a good AI engineer.

Related works

Other Transformer visualizations can also help you understand the details. I highly recommend walking through them.

Transformer Explainer

Live demo

LLM Visualization

Live demo

Acknowledgements

Stanford CS336: Language Modeling from Scratch

Spring 2025 / YouTube

I've been heavily inspired by this course, which is the best one for understanding Transformers and language modeling from scratch. I highly recommend it to anyone who wants to learn Transformers in depth.

Also, their Python-based lecture notes inspired me to create this course to be interactive and visual-heavy.

Stanford CS224N: Natural Language Processing with Deep Learning

YouTube

Since I had no prior NLP experience, I also watched this course to learn the basics of NLP. It's great for understanding the foundations of the field and how Transformers fit into the larger picture, including the history of NLP. If you need an NLP complement to CS336, I highly recommend this course as well.

Deep Dive into LLMs like ChatGPT by Andrej Karpathy

YouTube

This 3.5-hour video by the legendary Andrej Karpathy is a fantastic deep dive into LLMs, covering everything from the basics to advanced topics. It's a great resource for anyone who wants to understand the inner workings of LLMs in detail.

Qwen3

Blog / Hugging Face

Qwen3 is an open-source language model that provides a great starting point for understanding Transformers. It has a well-documented architecture and is easy to use with PyTorch thanks to Hugging Face's Transformers library. I used Qwen3 as the base model for this course.

License and Repository

This project is licensed under the Apache-2.0 License.

The source code is available on GitHub.

If you want, you can cite this work with:

@misc{Iwanaga2025DecodingAttention,
  author        = {Iwanaga, Ryosuke},
  title         = {Decoding Attention: An Interactive Guide of {Transformers} for Software Engineers},
  url           = {https://github.com/opsbr/decoding-attention},
  year          = {2025}
}
