
GPT-DEV

Reconstructs a character-level GPT language model following the paper "Attention Is All You Need" and OpenAI's GPT-2 / GPT-3.

The Transformer - model architecture

Notes

Bigram.py defines and trains a GPT with a context size of up to 256 characters and 384 feature channels; it is a 6-layer Transformer with 6 heads in each layer. On one A100 GPU this training run takes about 3 minutes, and the best validation loss is 1.4697.
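
For orientation, here is a minimal sketch of one Transformer block at those dimensions, using PyTorch's nn.MultiheadAttention for brevity. The structure and names are illustrative assumptions; Bigram.py presumably implements attention by hand following the paper, so its code will differ.

```python
import torch
import torch.nn as nn

# Dimensions of the A100 run described above (identifiers are illustrative).
block_size = 256   # maximum context length, in characters
n_embd     = 384   # feature channels per token
n_head     = 6     # attention heads per layer
n_layer    = 6     # Transformer layers

class Block(nn.Module):
    """One pre-norm Transformer block: causal self-attention + MLP (sketch)."""
    def __init__(self):
        super().__init__()
        self.ln1  = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2  = nn.LayerNorm(n_embd)
        self.mlp  = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        T = x.size(1)
        # Causal mask: position t may only attend to positions <= t.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

# Stacking n_layer such blocks (plus token/position embeddings and a final
# linear head over the character vocabulary) gives the 6-layer model above.
blocks = nn.Sequential(*[Block() for _ in range(n_layer)])
```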

However, I only have a Mac :(

So, since I'm running on CPU instead of GPU, I set --device=cpu and turn off the PyTorch 2.0 compiler with --compile=False. Evaluation then gives a noisier but faster estimate (--eval_iters=20, down from 200), the context size drops to 64 characters instead of 256, and the batch size to 12 examples per iteration instead of 64. I also use a much smaller Transformer (4 layers, 4 heads, 128-dimensional embeddings), cut the number of iterations to 2000, and correspondingly decay the learning rate over about the same number of iterations with --lr_decay_iters. Because the network is so small, I also ease off on regularization (--dropout=0.0). This still runs in about 3 minutes, but the validation loss only gets down to 1.88 and the samples are correspondingly worse (the full set of overrides is collected below, after the sample). Even so, it still generates samples like this:

GLEORKEN VINGHARD III:
Whell's the couse, the came light gacks,
And the for mought you in Aut fries the not high shee
bot thou the sought bechive in that to doth groan you,
No relving thee post mose the wear
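
Collected in one place, the CPU-friendly overrides described above look roughly like this. Only the flags spelled out in the note (--device, --compile, --eval_iters, --lr_decay_iters, --dropout) come from this README; the remaining key names are assumed, nanoGPT-style identifiers.

```python
# CPU-friendly overrides from the note above; key names not spelled out
# in the README (block_size, batch_size, n_layer, n_head, n_embd, max_iters)
# are assumptions, not necessarily the names used by the training script.
cpu_overrides = dict(
    device="cpu",          # --device=cpu
    compile=False,         # --compile=False (skip PyTorch 2.0 compile)
    eval_iters=20,         # noisier but faster eval, down from 200
    block_size=64,         # context of 64 characters instead of 256
    batch_size=12,         # 12 examples per iteration instead of 64
    n_layer=4,             # 4 layers instead of 6
    n_head=4,              # 4 heads instead of 6
    n_embd=128,            # 128-dimensional embeddings instead of 384
    max_iters=2000,        # fewer training iterations
    lr_decay_iters=2000,   # --lr_decay_iters set to roughly max_iters
    dropout=0.0,           # --dropout=0.0, less regularization
)
```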
