Homebrew small-scale LLM based on GPT-2
I'd like to gain practical experience with transformers, particularly by understanding their architecture and real-world applications, with a focus on small-scale LLMs. To achieve this, I decided to create a tiny LLM. First, I plan to study excellent articles and papers to understand the basic concepts and architecture. Next, I will build and improve my own GPT model. My goal is to integrate it into web applications, games, and iOS apps that interest me.
Currently, I am learning by building an LLM based on OpenAI's GPT-2 model. I used an extremely simple numpy-based implementation as a baseline and am experimenting with an implementation using mlx.
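
The numpy baseline follows the standard GPT-2 block closely. Below is a rough sketch of the idea; the parameter names, the omission of bias terms, and the overall structure are illustrative rather than the exact code in this repo:

```python
import numpy as np

def gelu(x):
    # GPT-2 uses the tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, g, b, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def causal_self_attention(x, w_qkv, w_proj, n_head):
    # x: [seq_len, d_model]; one matmul produces queries, keys and values
    seq_len, d_model = x.shape
    d_head = d_model // n_head
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    # reshape each to [n_head, seq_len, d_head]
    split = lambda t: t.reshape(seq_len, n_head, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # the causal mask keeps each position from attending to the future
    mask = np.triu(np.full((seq_len, seq_len), -1e10), k=1)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head) + mask)
    out = (att @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_proj

def transformer_block(x, p, n_head):
    # pre-norm residual block: attention, then a position-wise MLP (biases omitted)
    x = x + causal_self_attention(layer_norm(x, p["ln1_g"], p["ln1_b"]),
                                  p["w_qkv"], p["w_proj"], n_head)
    h = gelu(layer_norm(x, p["ln2_g"], p["ln2_b"]) @ p["w_fc"])
    return x + h @ p["w_out"]
```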
```
$ poetry install
```
You have to download the OpenAI GPT-2 model parameters before running `q`:
```
$ poetry install --extras download
$ poetry run download --model-size 124M
```
Available models:

- 124M
- 355M
- 774M
- 1558M
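
If you want to fetch the parameters by hand instead, the checkpoints are the original TensorFlow files OpenAI published for GPT-2 on its public blob storage. A small sketch of that download step; the function name and output layout are illustrative, not necessarily how this repo stores them:

```python
import os
from urllib.request import urlretrieve

def download_gpt2_checkpoint(model_size="124M", dest="models"):
    # The original TensorFlow checkpoint files published by OpenAI for GPT-2
    base = f"https://openaipublic.blob.core.windows.net/gpt-2/models/{model_size}"
    files = [
        "checkpoint", "encoder.json", "hparams.json",
        "model.ckpt.data-00000-of-00001", "model.ckpt.index",
        "model.ckpt.meta", "vocab.bpe",
    ]
    out_dir = os.path.join(dest, model_size)
    os.makedirs(out_dir, exist_ok=True)
    for name in files:
        urlretrieve(f"{base}/{name}", os.path.join(out_dir, name))
```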
```
$ poetry run q "Alan Turing theorized that computers would one day become"
Generating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 42.19it/s]
Generated 41.35 tokens/sec
Alan Turing theorized that computers would one day become the most powerful machines on the planet.
The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.
```
You can enable streaming output with the `--stream` flag:
```
$ poetry run q --stream "Alan Turing theorized that computers would one day become"
Alan Turing theorized that computers would one day become the most powerful machines on the planet.
The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.
Generated 37.19 tokens/sec
```
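
Streaming works by yielding tokens from the autoregressive sampling loop as they are produced, instead of waiting for the full completion. A rough sketch of that idea; the `model`, `tokenizer`, and `stream_generate` names are placeholders, not this repo's API:

```python
import numpy as np

def stream_generate(model, tokenizer, prompt, max_new_tokens=40, temperature=1.0):
    # Yield each new token's text as soon as it is sampled,
    # rather than returning the whole completion at the end.
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(np.array(ids))[-1]              # logits for the last position
        logits = (logits - logits.max()) / temperature # stabilize before exponentiating
        probs = np.exp(logits)
        probs /= probs.sum()
        next_id = int(np.random.choice(len(probs), p=probs))
        ids.append(next_id)
        yield tokenizer.decode([next_id])

# usage: print pieces as they arrive
# for piece in stream_generate(model, tokenizer, "Alan Turing theorized that"):
#     print(piece, end="", flush=True)
```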

| Model | HellaSwag | MMLU |
|---|---|---|
| Q (124M) | 28.92% | 22.92% |
| Q (355M) | 33.31% | 22.90% |
| GPT-2 (124M) | 28.92% | 22.92% |
| Qwen2.5-0.5B | 40.59% | 47.14% |
- Measure: Accuracy
- Shots: 0-shot
You can reproduce these numbers with lm-evaluation-harness through our evaluation script:

```
$ poetry run python -m q.eval --model q --model_args model_size=355M --tasks hellaswag
```
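
For reference, lm-evaluation-harness (v0.4+) lets you register a custom model and then evaluate it by name. The skeleton below is a generic sketch of that mechanism under those assumptions, not the actual contents of `q.eval`:

```python
import lm_eval
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model

@register_model("q")
class QEvalModel(LM):
    """Skeleton adapter exposing a local model to lm-evaluation-harness."""

    def __init__(self, model_size="124M", **kwargs):
        super().__init__()
        self.model_size = model_size  # a real adapter would load weights here

    def loglikelihood(self, requests):
        # Return a (logprob, is_greedy) pair per request; this is what
        # multiple-choice tasks such as HellaSwag and MMLU call.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        raise NotImplementedError

    def generate_until(self, requests):
        # Free-form generation tasks; unused by the benchmarks above.
        raise NotImplementedError

# once the methods are implemented, the registered name can be evaluated with:
# results = lm_eval.simple_evaluate(model="q", model_args="model_size=355M", tasks=["hellaswag"])
```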

| max_length | 64 | 128 | 256 |
|---|---|---|---|
| Q (124M) | 80.90 | 80.79 | 79.05 |
| GPT-2 (124M) | 53.96 | 51.56 | 54.76 |
| Qwen2.5-0.5B | 21.80 | 22.33 | 22.24 |

| max_length | 64 | 128 | 256 |
|---|---|---|---|
| Q (124M) | 777.11 | 779.03 | 779.89 |
| GPT-2 (124M) | 781.96 | 974.82 | 1358.00 |
| Qwen2.5-0.5B | 1257.32 | 1292.65 | 1284.94 |