Building From Scratch

A hands-on repo where I build language models and related components from first principles, step by step. The main goal is to understand the inner workings of transformers and the improvements that have been made to them.

Repo structure

building-from-scratch/
├── basic-gpt/   # Character-level GPT built incrementally; notebooks, scripts
└── gpt-2/       # GPT‑2 (124M) reproduction + improvements; training runs, notes

Basic-GPT

This folder contains a character-level GPT built up incrementally, following Karpathy's "Let's build GPT: from scratch". The final scaled-up model reaches a validation loss of ~1.48.
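For context, character-level means the vocabulary is simply the set of distinct characters in the training text. A minimal sketch of that tokenization, assuming a Karpathy-style setup (the file path and variable names are illustrative, not the repo's actual code):

```python
# Build a character-level vocabulary from a training corpus
# ("input.txt" is a hypothetical path).
with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                      # every distinct character is a token
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

assert decode(encode("hello")) == "hello"
```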

[Figure: Basic-GPT training/validation loss curves]

Read more →

GPT-2

I reimplemented GPT‑2 (124M) from scratch and then added improvements such as RoPE, global data shuffling, and a tuned learning rate schedule. The best run (gpt2-rope) reaches a validation loss of 2.987 and a HellaSwag accuracy of 0.320, clearly surpassing my GPT‑2 baseline run (3.066 loss, 0.304 accuracy).
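The headline change in the best run is swapping GPT‑2's learned positional embeddings for rotary position embeddings (RoPE) applied to the attention queries and keys. A minimal PyTorch sketch of that rotation (the tensor layout and function name are assumptions for illustration, not the repo's actual implementation):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key channels pairwise by position-dependent angles.
    x: (batch, n_heads, seq_len, head_dim); head_dim must be even."""
    B, H, T, D = x.shape
    # One frequency per channel pair, as in the RoPE paper
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2, device=x.device, dtype=torch.float32) / D))
    angles = torch.outer(torch.arange(T, device=x.device, dtype=torch.float32), inv_freq)  # (T, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()    # even/odd channel pairs
    out = torch.empty_like(x, dtype=torch.float32)
    out[..., 0::2] = x1 * cos - x2 * sin                   # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.type_as(x)

# Inside attention, this replaces the learned position embedding table:
# q, k = apply_rope(q), apply_rope(k)   # positions are encoded directly in q/k
```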

[Figure: GPT-2 experiments comparison (validation loss and accuracy)]

Experiment            Min Validation Loss   Max HellaSwag Acc   Description
gpt2-baseline         3.065753              0.303724            Original GPT-2 architecture
gpt2-periodicity-fix  3.063873              0.305517            Fixed data loading periodicity
gpt2-lr-inc           3.021046              0.315475            Increased learning rate by 3x and reduced warmup steps
gpt2-global-datafix   3.004503              0.316869            Used global shuffling with better indexing
gpt2-rope             2.987392              0.320155            Replaced learned embeddings with RoPE

Read more →
