# ollama-benchmark


This Bash script benchmarks LLMs using Ollama, and can also aggregate test data from other LLM tools such as llama.cpp.

For a quick installation, try:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

If you're not running Linux, download Ollama from the official site.
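
Ollama also publishes an official Docker image, which pairs well with the `--ollama-bin` option described below. A minimal sketch (the container name `ollama` is an arbitrary choice here; volume and port follow the image's documented defaults):

```bash
# Start the Ollama server in a container, persisting models in a named volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and run a model inside the container
docker exec -it ollama ollama run llama3.2:3b
```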

Verify you can run `ollama` with a given model:

```bash
ollama run llama3.2:3b
```
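
The headline number in the tables below is Ollama's eval rate (generation speed in tokens per second). You can see it for a single run yourself with the `--verbose` flag; the prompt here is just an example:

```bash
# After the response, Ollama prints timing statistics,
# including an "eval rate" line in tokens/s
ollama run llama3.2:3b --verbose "Why is the sky blue?"
```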

Then run this benchmark script:

```bash
./obench.sh
```

Uninstall Ollama following the official uninstall instructions.

## CLI Options

```
Usage: ./obench.sh [OPTIONS]
Options:
 -h, --help      Display this help message
 -d, --default   Run a benchmark using some default small models
 -m, --model     Specify a model to use
 -c, --count     Number of times to run the benchmark
 --ollama-bin    Point to ollama executable or command (e.g. if using Docker)
 --markdown      Format output as markdown
```
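
A couple of example invocations (the model, run count, and Docker container name `ollama` are illustrative choices, not script defaults):

```bash
# Benchmark a specific model three times and emit a markdown table
./obench.sh --model llama3.2:3b --count 3 --markdown

# Point the script at an Ollama instance running inside Docker
# (assumes a container named "ollama", as in the sketch above)
./obench.sh --ollama-bin "docker exec -it ollama ollama" --default
```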

## Findings

### DeepSeek R1 14b

| System | CPU/GPU | Eval Rate | Power (Peak) |
|---|---|---|---|
| Pi 5 - 16GB | CPU | 1.20 Tokens/s | 13.0 W |
| Pi 5 - 16GB (AMD Pro W7700¹) | GPU | 19.90 Tokens/s | 164 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | 2.13 Tokens/s | 30.3 W |
| Radxa Orion O6 - 16GB | CPU | 4.33 Tokens/s | 34.7 W |
| Radxa Orion O6 - 16GB (Nvidia RTX 3080 Ti) | GPU | 64.58 Tokens/s | 465 W |
| M1 Ultra (48 GPU Core) 64GB | GPU | 35.89 Tokens/s | N/A |
| Framework Mainboard (128GB) | CPU | 11.37 Tokens/s | 140 W |

### DeepSeek R1 671b

| System | CPU/GPU | Eval Rate | Power (Peak) |
|---|---|---|---|
| AmpereOne A192-32X - 512GB | CPU | 4.18 Tokens/s | 477 W |

### Llama 3.2:3b

| System | CPU/GPU | Eval Rate | Power (Peak) |
|---|---|---|---|
| Pi 400 - 4GB | CPU | 1.60 Tokens/s | 6 W |
| Pi 5 - 8GB | CPU | 4.61 Tokens/s | 13.9 W |
| Pi 5 - 16GB | CPU | 4.88 Tokens/s | 11.9 W |
| GMKtec G3 Plus (Intel N150) - 16GB | CPU | 9.06 Tokens/s | 26.4 W |
| Pi 5 - 8GB (AMD RX 6500 XT¹) | GPU | 39.82 Tokens/s | 88 W |
| Pi 5 - 8GB (AMD RX 6700 XT¹ 12GB) | GPU | 49.01 Tokens/s | 94 W |
| Pi 5 - 8GB (AMD RX 7600¹) | GPU | 48.47 Tokens/s | 156 W |
| Pi 5 - 8GB (AMD Pro W7700¹) | GPU | 56.14 Tokens/s | 145 W |
| Pi 5 - 16GB (AMD RX 7900 XT¹) | GPU | 108.58 Tokens/s | 315 W |
| M4 Mac mini (10 core - 32GB) | GPU | 41.31 Tokens/s | 30.1 W |
| M1 Max Mac Studio (10 core - 64GB) | GPU | 59.38 Tokens/s | N/A |
| M1 Ultra (48 GPU Core) 64GB | GPU | 108.67 Tokens/s | N/A |
| HiFive Premier P550 (4-core RISC-V) | CPU | 0.24 Tokens/s | 13.5 W |
| Ryzen 9 7900X (Nvidia 4090) | GPU | 237.05 Tokens/s | N/A |
| Intel 13900K (Nvidia 5090) | GPU | 271.40 Tokens/s | N/A |
| Intel 13900K (Nvidia 4090) | GPU | 216.48 Tokens/s | N/A |
| Ryzen 9 9950X (AMD 7900 XT) | GPU | 131.2 Tokens/s | N/A |
| Ryzen 9 7950X (Nvidia 4080) | GPU | 204.45 Tokens/s | N/A |
| Ryzen 9 7950X (Nvidia 4070 Ti Super) | GPU | 198.95 Tokens/s | N/A |
| Ryzen 9 5950X (Nvidia 4070) | GPU | 160.72 Tokens/s | N/A |
| System76 Thelio Astra (Nvidia A400) | GPU | 35.51 Tokens/s | 167 W |
| System76 Thelio Astra (Nvidia A4000) | GPU | 90.92 Tokens/s | 244 W |
| System76 Thelio Astra (AMD Pro W7700¹) | GPU | 89.31 Tokens/s | 261 W |
| AmpereOne A192-32X (512GB) | CPU | 23.52 Tokens/s | N/A |
| Framework Mainboard (128GB) | GPU | 88.14 Tokens/s | 133 W |

### Llama 3.1:70b

| System | CPU/GPU | Eval Rate | Power (Peak) |
|---|---|---|---|
| M1 Max Mac Studio (10 core - 64GB) | GPU | 7.25 Tokens/s | N/A |
| Ryzen 9 7900X (Nvidia 4090) | GPU/CPU | 3.10 Tokens/s | N/A |
| AmpereOne A192-32X (512GB) | CPU | 3.86 Tokens/s | N/A |
| Framework Mainboard (128GB) | GPU | 4.47 Tokens/s | 139 W |
| Raspberry Pi CM5 Cluster (10x 16GB) | CPU | 0.85 Tokens/s | 70 W |

¹ These GPUs were tested using llama.cpp with Vulkan support.

### Llama 3.1:405b

| System | CPU/GPU | Eval Rate | Power (Peak) |
|---|---|---|---|
| AmpereOne A192-32X (512GB) | CPU | 0.90 Tokens/s | N/A |
| Framework Mainboard Cluster (512GB) | GPU | 0.71 Tokens/s | N/A |

## Further Reading

This script is just a quick way of comparing one aspect of generative AI performance. There are many other aspects that are just as important (or more so) which this script does not cover.

See *All about Timing: A quick look at metrics for LLM serving* for a good overview of other metrics you may want to compare when running Ollama.

## Author

This benchmark is based on the upstream project tabletuser-blogspot/ollama-benchmark, and is maintained by Jeff Geerling.
