Homebrew small-scale LLM based on GPT-2
I'd like to gain practical experience with transformers, particularly by understanding their architecture and real-world applications, with a focus on small-scale LLMs. To achieve this, I decided to create a tiny LLM. First, I plan to study excellent articles and papers to understand the basic concepts and architecture. Next, I will build and improve my own GPT model. My goal is to integrate it into web applications, games, and iOS apps that interest me.
Currently, I am learning by building an LLM based on OpenAI's GPT-2 model. I used an extremely simple numpy-based implementation as a baseline and am experimenting with an implementation using mlx.
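
The numpy baseline follows the standard GPT-2 block closely. Below is a rough sketch of the idea; the parameter names, the omission of bias terms, and the overall structure are illustrative rather than the exact code in this repo:

```python
import numpy as np

def gelu(x):
    # GPT-2 uses the tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, g, b, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def causal_self_attention(x, w_qkv, w_proj, n_head):
    # x: [seq_len, d_model]; one matmul produces queries, keys and values
    seq_len, d_model = x.shape
    d_head = d_model // n_head
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    # reshape each to [n_head, seq_len, d_head]
    split = lambda t: t.reshape(seq_len, n_head, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # the causal mask keeps each position from attending to the future
    mask = np.triu(np.full((seq_len, seq_len), -1e10), k=1)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head) + mask)
    out = (att @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_proj

def transformer_block(x, p, n_head):
    # pre-norm residual block: attention, then a position-wise MLP (biases omitted)
    x = x + causal_self_attention(layer_norm(x, p["ln1_g"], p["ln1_b"]),
                                  p["w_qkv"], p["w_proj"], n_head)
    h = gelu(layer_norm(x, p["ln2_g"], p["ln2_b"]) @ p["w_fc"])
    return x + h @ p["w_out"]
```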
```
$ poetry install
```
You have to download the OpenAI GPT-2 model parameters before running `q`:
```
$ poetry install --extras download
$ poetry run download --model-size 124M
```
Available models:

- 124M
- 355M
- 774M
- 1558M
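
If you want to fetch the parameters by hand instead, the checkpoints are the original TensorFlow files OpenAI published for GPT-2 on its public blob storage. A small sketch of that download step; the function name and output layout are illustrative, not necessarily how this repo stores them:

```python
import os
from urllib.request import urlretrieve

def download_gpt2_checkpoint(model_size="124M", dest="models"):
    # The original TensorFlow checkpoint files published by OpenAI for GPT-2
    base = f"https://openaipublic.blob.core.windows.net/gpt-2/models/{model_size}"
    files = [
        "checkpoint", "encoder.json", "hparams.json",
        "model.ckpt.data-00000-of-00001", "model.ckpt.index",
        "model.ckpt.meta", "vocab.bpe",
    ]
    out_dir = os.path.join(dest, model_size)
    os.makedirs(out_dir, exist_ok=True)
    for name in files:
        urlretrieve(f"{base}/{name}", os.path.join(out_dir, name))
```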
```
$ poetry run q "Alan Turing theorized that computers would one day become"
Generating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 42.19it/s]
Generated 41.35 tokens/sec
Alan Turing theorized that computers would one day become the most powerful machines on the planet.
The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.
```
You can enable streaming output with the `--stream` flag:
```
$ poetry run q --stream "Alan Turing theorized that computers would one day become"
Alan Turing theorized that computers would one day become the most powerful machines on the planet.
The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.
Generated 37.19 tokens/sec
```
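
Streaming works by yielding tokens from the autoregressive sampling loop as they are produced, instead of waiting for the full completion. A rough sketch of that idea; the `model`, `tokenizer`, and `stream_generate` names are placeholders, not this repo's API:

```python
import numpy as np

def stream_generate(model, tokenizer, prompt, max_new_tokens=40, temperature=1.0):
    # Yield each new token's text as soon as it is sampled,
    # rather than returning the whole completion at the end.
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(np.array(ids))[-1]              # logits for the last position
        logits = (logits - logits.max()) / temperature # stabilize before exponentiating
        probs = np.exp(logits)
        probs /= probs.sum()
        next_id = int(np.random.choice(len(probs), p=probs))
        ids.append(next_id)
        yield tokenizer.decode([next_id])

# usage: print pieces as they arrive
# for piece in stream_generate(model, tokenizer, "Alan Turing theorized that"):
#     print(piece, end="", flush=True)
```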

| Model | HellaSwag | MMLU |
|---|---|---|
| Q (124M) | 28.92% | 22.92% |
| Q (355M) | 33.31% | 22.90% |
| GPT-2 (124M) | 28.92% | 22.92% |
| Qwen2.5-0.5B | 40.59% | 47.14% |
- Measure: Accuracy
- Shots: 0-shot
You can reproduce these numbers with lm-evaluation-harness through our evaluation script:

```
$ poetry run python -m q.eval --model q --model_args model_size=355M --tasks hellaswag
```
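
For reference, lm-evaluation-harness (v0.4+) lets you register a custom model and then evaluate it by name. The skeleton below is a generic sketch of that mechanism under those assumptions, not the actual contents of `q.eval`:

```python
import lm_eval
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model

@register_model("q")
class QEvalModel(LM):
    """Skeleton adapter exposing a local model to lm-evaluation-harness."""

    def __init__(self, model_size="124M", **kwargs):
        super().__init__()
        self.model_size = model_size  # a real adapter would load weights here

    def loglikelihood(self, requests):
        # Return a (logprob, is_greedy) pair per request; this is what
        # multiple-choice tasks such as HellaSwag and MMLU call.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError

    def generate_until(self, requests):
        # Free-form generation tasks; unused by the benchmarks above.
        raise NotImplementedError

# once the methods are implemented, the registered name can be evaluated with:
# results = lm_eval.simple_evaluate(model="q", model_args="model_size=355M", tasks=["hellaswag"])
```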

| max_length | 64 | 128 | 256 |
|---|---|---|---|
| Q (124M) | 80.90 | 80.79 | 79.05 |
| GPT-2 (124M) | 53.96 | 51.56 | 54.76 |
| Qwen2.5-0.5B | 21.80 | 22.33 | 22.24 |

| max_length | 64 | 128 | 256 |
|---|---|---|---|
| Q (124M) | 777.11 | 779.03 | 779.89 |
| GPT-2 (124M) | 781.96 | 974.82 | 1358.00 |
| Qwen2.5-0.5B | 1257.32 | 1292.65 | 1284.94 |