Build an LLM from scratch with MAX

A guided tour of a complete GPT-2 implementation using Modular's MAX platform. Each section walks through the code in gpt2.py and explains what it does and why—from model configuration through streaming text generation.

What you'll learn

Transformer architecture: Every component of GPT-2, explained through working code
MAX Python API: How MAX's experimental.nn builds and compiles neural networks
Inference patterns: Weight loading, lazy initialization, model compilation, and autoregressive generation
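
To make "autoregressive generation" concrete before diving into the book, here is a minimal pure-Python sketch of the loop: predict one token, append it, and feed the longer sequence back in. It does not use MAX and is not the code in gpt2.py. The names toy_next_token_logits and generate are hypothetical, and the toy "model" simply favors the next integer in a five-token vocabulary.

```python
def toy_next_token_logits(tokens, vocab_size=5):
    """Hypothetical stand-in for a language model.

    Returns one score per vocabulary entry, favoring (last token + 1) % vocab_size.
    A real model would compute these scores from the whole sequence.
    """
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 1.0
    return logits


def generate(prompt_tokens, max_new_tokens, next_token_logits=toy_next_token_logits):
    """Greedy autoregressive decoding: each new token is chosen from the
    scores for the sequence so far, then appended before the next step."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        # Greedy choice: the index with the highest score.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
    return tokens


print(generate([0], 6))  # [0, 1, 2, 3, 4, 0, 1]
```

Greedy selection is used here only to keep the sketch deterministic; the table under "What the book covers" lists Gumbel-max sampling as the technique covered for text generation.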

Quick start

Prerequisites

Pixi package manager
Basic understanding of neural networks
A system that meets the MAX system requirements

Installation

git clone https://github.com/modular/max-llm-book
cd max-llm-book
pixi install

Run the model

pixi run gpt2

This downloads the pretrained GPT-2 weights from HuggingFace, compiles the model, and starts an interactive prompt where you can enter text and see generated completions.

Additional modes:

pixi run gpt2 -- --prompt "Once upon a time"   # single generation, then exit
pixi run gpt2 -- --chat                         # streaming multi-turn chat
pixi run gpt2 -- --benchmark                    # tokens/sec benchmark

Read the book

pixi run book

Or read it online at llm.modular.com.

What the book covers

The tutorial walks through gpt2.py section by section:

Section	Topic	What you'll learn
1	Model configuration	Architecture hyperparameters and HuggingFace compatibility
2	Feed-forward network	Two-layer MLP with GELU activation
3	Causal masking	Preventing attention to future tokens
4	Multi-head attention	Parallel attention across 12 heads
5	Layer normalization	Pre-norm pattern for stable activations
6	Transformer block	Residual connections and component wiring
7	Stacking transformer blocks	Embeddings and the 12-layer model body
8	Language model head	Projecting hidden states to vocabulary logits
9	Encode and decode tokens	BPE tokenization with HuggingFace
10	Text generation	Compiled sampling heads and Gumbel-max sampling
11	Load weights and run model	Lazy init, weight transposition, and model compilation
12	Streaming chat	Stop sequences, BPE boundary handling, and live rendering
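
Two of the ideas named in the table, causal masking (section 3) and Gumbel-max sampling (section 10), can be illustrated in a few lines of standard-library Python. This is a sketch under stated assumptions, not the MAX implementation from gpt2.py: the function names are hypothetical, and plain lists stand in for tensors.

```python
import math
import random


def causal_mask(seq_len):
    """Additive mask: 0.0 where position i may attend to j (j <= i),
    -inf where j lies in the future."""
    return [[0.0 if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)]


def softmax(scores):
    """Numerically stable softmax; entries at -inf receive probability 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_max_sample(logits, rng):
    """Sample an index from softmax(logits) by adding Gumbel(0, 1) noise
    to every logit and taking the argmax."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        noisy.append(logit - math.log(-math.log(u)))
    return max(range(len(noisy)), key=noisy.__getitem__)


# Row 0 of a 3-token mask can only see itself, so all weight lands on index 0.
raw_scores = [1.0, 2.0, 3.0]
masked = [s + m for s, m in zip(raw_scores, causal_mask(3)[0])]
print(softmax(masked))  # [1.0, 0.0, 0.0]

# Sampling from logits = log(p) reproduces p in the long run.
rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
print(gumbel_max_sample(logits, rng))
```

Adding the mask before the softmax, rather than zeroing weights afterwards, keeps each row of attention weights summing to 1. Gumbel-max replaces an explicit normalize-then-sample step with a single argmax over noisy logits.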

Project structure

max-llm-book/
├── book/                  # mdBook tutorial documentation
│   └── src/
│       ├── introduction.md
│       ├── step_01.md ... step_12.md
│       └── SUMMARY.md
├── gpt2.py               # Complete GPT-2 implementation
├── tests/                # Tests for gpt2.py
├── pixi.toml             # Project dependencies and tasks
└── README.md             # This file

Learning resources

MAX Documentation: docs.modular.com
HuggingFace GPT-2: huggingface.co/gpt2
Attention Is All You Need: arxiv.org/abs/1706.03762
Language Models are Unsupervised Multitask Learners (GPT-2 paper): openai.com

Contributing

Found an issue or want to improve the tutorial? Contributions are welcome:

File issues for bugs or unclear explanations
Suggest improvements to code examples or visualizations
Open a pull request with fixes or additions
