
q

Homebrew small-scale LLM based on GPT-2

I'd like to gain practical experience with transformers, particularly their architecture and real-world applications, with a focus on small-scale LLMs. To that end, I decided to create a tiny LLM. First, I plan to study excellent articles and papers to understand the basic concepts and architecture. Next, I will build and improve my own GPT model. My goal is to integrate it into web applications, games, and iOS apps that interest me.

Currently, I am studying by building an LLM based on OpenAI's GPT-2 model. I use an extremely simple NumPy-based model as a baseline and am experimenting with an implementation using mlx.
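For intuition about what such a NumPy baseline looks like, here is a minimal sketch of the causal self-attention at the core of a GPT-2-style block. The function names and signatures are illustrative, not this repository's API:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out, n_head):
    """Single GPT-2-style causal attention layer (bias terms omitted for brevity).

    x: (seq_len, d_model), w_qkv: (d_model, 3*d_model), w_out: (d_model, d_model)
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_head
    # one matmul produces queries, keys, and values, GPT-2 style
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)

    def heads(t):  # (seq_len, d_model) -> (n_head, seq_len, d_head)
        return t.reshape(seq_len, n_head, d_head).transpose(1, 0, 2)

    q, k, v = heads(q), heads(k), heads(v)
    # causal mask: position i may only attend to positions <= i
    mask = np.triu(np.full((seq_len, seq_len), -1e10), k=1)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head) + mask
    out = softmax(scores) @ v                     # (n_head, seq_len, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_out
```

A full GPT-2 block would wrap this with layer norm, a residual connection, and an MLP, but the masked attention above is the piece that makes generation autoregressive.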

Install

$ poetry install

Download model parameters

You have to download the OpenAI GPT-2 model parameters before running q:

$ poetry install --extras download
$ poetry run download --model-size 124M

Available models:

  • 124M
  • 355M
  • 774M
  • 1558M

Run

$ poetry run q "Alan Turing theorized that computers would one day become"
Generating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 42.19it/s]
Generated 41.35 tokens/sec

Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Stream output

You can enable stream output by passing the --stream flag:

$ poetry run q --stream "Alan Turing theorized that computers would one day become"
Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Generated 37.19 tokens/sec
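Conceptually, streaming just means emitting each token as soon as it is sampled instead of buffering the whole completion. A minimal sketch of such a loop, with a stand-in `step_fn` in place of a real model forward pass (none of these names are the repository's API):

```python
import time

def generate(step_fn, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy token-by-token generation, yielding each token as it is produced.

    step_fn(ids) -> next token id; a stand-in for a forward pass + argmax.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = step_fn(ids)
        if tok == eos_id:
            break
        ids.append(tok)
        yield tok

def stream(step_fn, prompt_ids, max_new_tokens, decode=str):
    """Print tokens as they arrive, then report tokens/sec."""
    start = time.perf_counter()
    n = 0
    for tok in generate(step_fn, prompt_ids, max_new_tokens):
        print(decode(tok), end="", flush=True)  # emit immediately, don't buffer
        n += 1
    elapsed = time.perf_counter() - start
    print(f"\nGenerated {n / elapsed:.2f} tokens/sec")
```

Because `generate` is a generator, the caller sees each token with per-token latency rather than waiting for the full sequence.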

Evaluation

| Model        | HellaSwag | MMLU   |
|--------------|-----------|--------|
| Q (124M)     | 28.92%    | 22.92% |
| Q (355M)     | 33.31%    | 22.90% |
| GPT-2 (124M) | 28.92%    | 22.92% |
| Qwen2.5-0.5B | 40.59%    | 47.14% |

HellaSwag:

  • Measure: Accuracy
  • Shots: 0-shot

MMLU:

  • Measure: Accuracy
  • Shots: 0-shot

How to evaluate

You can run lm-evaluation-harness through our evaluation script:

poetry run python -m q.eval --model q --model_args model_size=355M --tasks hellaswag
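For intuition, 0-shot accuracy on a multiple-choice task like HellaSwag is typically computed by scoring each candidate continuation's log-likelihood under the model and counting an example as correct when the best-scoring choice matches the label. A hedged sketch of that scoring scheme, where `loglik_fn` is a stand-in for a model forward pass (not the harness's actual API):

```python
def score_choices(loglik_fn, context, choices):
    """Return the index of the continuation the model finds most likely.

    loglik_fn(context, continuation) -> total log-probability of the
    continuation given the context. Dividing by length is a common
    normalization so longer continuations are not penalized.
    """
    scores = [loglik_fn(context, c) / max(len(c.split()), 1) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)

def accuracy(loglik_fn, examples):
    """examples: list of (context, choices, gold_index) triples."""
    hits = [score_choices(loglik_fn, ctx, choices) == gold
            for ctx, choices, gold in examples]
    return sum(hits) / len(hits)
```

lm-evaluation-harness handles tokenization, batching, and the exact normalization per task; the sketch above only shows the underlying idea.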

Benchmark

TPS (Average)

| max_length   | 64    | 128   | 256   |
|--------------|-------|-------|-------|
| Q (124M)     | 80.90 | 80.79 | 79.05 |
| GPT-2 (124M) | 53.96 | 51.56 | 54.76 |
| Qwen2.5-0.5B | 21.80 | 22.33 | 22.24 |

Peak Memory (Average, MB)

| max_length   | 64      | 128     | 256     |
|--------------|---------|---------|---------|
| Q (124M)     | 777.11  | 779.03  | 779.89  |
| GPT-2 (124M) | 781.96  | 974.82  | 1358.00 |
| Qwen2.5-0.5B | 1257.32 | 1292.65 | 1284.94 |
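The TPS figures above amount to timing generation runs and averaging the rate. An illustrative harness for that measurement, with a stand-in `generate_fn` (this is not the repository's benchmark script):

```python
import time
from statistics import mean

def measure_tps(generate_fn, prompt, max_length, runs=3):
    """Average tokens/sec over several runs.

    generate_fn(prompt, max_length) -> number of tokens produced;
    a stand-in for the model's generation entry point.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate_fn(prompt, max_length)
        rates.append(n_tokens / (time.perf_counter() - start))
    return mean(rates)
```

A real benchmark would also add a warm-up run and, for the peak-memory column, sample process RSS (or MLX's memory counters) around the call.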
