This bash script benchmarks LLMs using Ollama, and can also aggregate test data from other LLM tools such as llama.cpp.
For a quick installation, try:

```
curl -fsSL https://ollama.com/install.sh | sh
```
If you're not running Linux, download Ollama from the official site.
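Either way, `ollama --version` should confirm the install succeeded before you pull any models:

```
ollama --version
```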
Verify you can run `ollama` with a given model:

```
ollama run llama3.2:3b
```
Then run this benchmark script:

```
./obench.sh
```
Uninstall Ollama following the official uninstall instructions.
```
Usage: ./obench.sh [OPTIONS]

Options:
  -h, --help      Display this help message
  -d, --default   Run a benchmark using some default small models
  -m, --model     Specify a model to use
  -c, --count     Number of times to run the benchmark
  --ollama-bin    Point to the ollama executable or command (e.g. if using Docker)
  --markdown      Format output as markdown
```
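For example, to benchmark a single model three times and emit a markdown table (any model you have already pulled works; `llama3.2:3b` is just the small model used above):

```
./obench.sh --model llama3.2:3b --count 3 --markdown
```

If Ollama runs inside a container, `--ollama-bin` can wrap the call; the container name `ollama` here is an assumption:

```
./obench.sh --ollama-bin "docker exec ollama ollama" --model llama3.2:3b
```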
System | CPU/GPU | Eval Rate | Power (Peak) |
---|---|---|---|
Pi 5 - 16GB | CPU | 1.20 Tokens/s | 13.0 W |
Pi 5 - 16GB (AMD Pro W7700<sup>1</sup>) | GPU | 19.90 Tokens/s | 164 W |
GMKtec G3 Plus (Intel N150) - 16GB | CPU | 2.13 Tokens/s | 30.3 W |
Radxa Orion O6 - 16GB | CPU | 4.33 Tokens/s | 34.7 W |
Radxa Orion O6 - 16GB (Nvidia RTX 3080 Ti) | GPU | 64.58 Tokens/s | 465 W |
M1 Ultra (48 GPU Core) 64GB | GPU | 35.89 Tokens/s | N/A |
Framework Mainboard (128GB) | CPU | 11.37 Tokens/s | 140 W |
System | CPU/GPU | Eval Rate | Power (Peak) |
---|---|---|---|
AmpereOne A192-32X - 512GB | CPU | 4.18 Tokens/s | 477 W |
System | CPU/GPU | Eval Rate | Power (Peak) |
---|---|---|---|
Pi 400 - 4GB | CPU | 1.60 Tokens/s | 6 W |
Pi 5 - 8GB | CPU | 4.61 Tokens/s | 13.9 W |
Pi 5 - 16GB | CPU | 4.88 Tokens/s | 11.9 W |
GMKtec G3 Plus (Intel N150) - 16GB | CPU | 9.06 Tokens/s | 26.4 W |
Pi 5 - 8GB (AMD RX 6500 XT<sup>1</sup>) | GPU | 39.82 Tokens/s | 88 W |
Pi 5 - 8GB (AMD RX 6700 XT<sup>1</sup>) 12GB | GPU | 49.01 Tokens/s | 94 W |
Pi 5 - 8GB (AMD RX 7600<sup>1</sup>) | GPU | 48.47 Tokens/s | 156 W |
Pi 5 - 8GB (AMD Pro W7700<sup>1</sup>) | GPU | 56.14 Tokens/s | 145 W |
Pi 5 - 16GB (AMD RX 7900 XT<sup>1</sup>) | GPU | 108.58 Tokens/s | 315 W |
M4 Mac mini (10 core - 32GB) | GPU | 41.31 Tokens/s | 30.1 W |
M1 Max Mac Studio (10 core - 64GB) | GPU | 59.38 Tokens/s | N/A |
M1 Ultra (48 GPU Core) 64GB | GPU | 108.67 Tokens/s | N/A |
HiFive Premier P550 (4-core RISC-V) | CPU | 0.24 Tokens/s | 13.5 W |
Ryzen 9 7900X (Nvidia 4090) | GPU | 237.05 Tokens/s | N/A |
Intel 13900K (Nvidia 5090) | GPU | 271.40 Tokens/s | N/A |
Intel 13900K (Nvidia 4090) | GPU | 216.48 Tokens/s | N/A |
Ryzen 9 9950X (AMD 7900 XT) | GPU | 131.2 Tokens/s | N/A |
Ryzen 9 7950X (Nvidia 4080) | GPU | 204.45 Tokens/s | N/A |
Ryzen 9 7950X (Nvidia 4070 Ti Super) | GPU | 198.95 Tokens/s | N/A |
Ryzen 9 5950X (Nvidia 4070) | GPU | 160.72 Tokens/s | N/A |
System76 Thelio Astra (Nvidia A400) | GPU | 35.51 Tokens/s | 167 W |
System76 Thelio Astra (Nvidia A4000) | GPU | 90.92 Tokens/s | 244 W |
System76 Thelio Astra (AMD Pro W7700<sup>1</sup>) | GPU | 89.31 Tokens/s | 261 W |
AmpereOne A192-32X (512GB) | CPU | 23.52 Tokens/s | N/A |
Framework Mainboard (128GB) | GPU | 88.14 Tokens/s | 133 W |
System | CPU/GPU | Eval Rate | Power (Peak) |
---|---|---|---|
M1 Max Mac Studio (10 core - 64GB) | GPU | 7.25 Tokens/s | N/A |
Ryzen 9 7900X (Nvidia 4090) | GPU/CPU | 3.10 Tokens/s | N/A |
AmpereOne A192-32X (512GB) | CPU | 3.86 Tokens/s | N/A |
Framework Mainboard (128GB) | GPU | 4.47 Tokens/s | 139 W |
Raspberry Pi CM5 Cluster (10x 16GB) | CPU | 0.85 Tokens/s | 70 W |
<sup>1</sup> These GPUs were tested using llama.cpp with Vulkan support.
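For the footnoted GPUs, a minimal sketch of a llama.cpp Vulkan run looks like the following (the model path is a placeholder, and the flags assume a recent llama.cpp checkout):

```
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Benchmark a GGUF model with all layers offloaded to the GPU
./build/bin/llama-bench -m /path/to/model.gguf -ngl 99
```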
System | CPU/GPU | Eval Rate | Power (Peak) |
---|---|---|---|
AmpereOne A192-32X (512GB) | CPU | 0.90 Tokens/s | N/A |
Framework Mainboard Cluster (512GB) | GPU | 0.71 Tokens/s | N/A |
This script is just a quick way of comparing one aspect of generative AI performance. There are many other aspects that are just as important, or more so, which this script does not cover.
See *All about Timing: A quick look at metrics for LLM serving* for a good overview of other metrics you may want to compare when running Ollama.
This benchmark is based on the upstream project tabletuser-blogspot/ollama-benchmark, and is maintained by Jeff Geerling.