Releases: ai-dock/llama.cpp-cuda
llama.cpp b6992 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6992
Commit: aa3b7a90b407c556778a7e13a4b0d28cf964fd1c
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference (compute capability: example GPUs)
- 6.1: Titan Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX PRO Blackwell series, RTX 50xx series
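To pick a build, you need your GPU's compute capability. The sketch below is a hypothetical helper (`builds_for_cap` is not shipped with this release). It takes a compute capability such as `8.6` and prints every CUDA build from the list above whose architectures include it. The table inside the helper is copied from this release's CUDA Versions list. It matches exact values only, so a capability that is not listed (for example 8.7) produces no output. On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` reports the capability. The `CUDA Version` in the plain `nvidia-smi` header is the newest CUDA version the installed driver supports.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the CUDA builds of this release whose
# architecture list contains the given compute capability (exact match).
# The table is copied from the "CUDA Versions" list above.
builds_for_cap() {
  local cap="$1" ver archs
  while IFS='|' read -r ver archs; do
    # Pad with spaces so "2.0" cannot match inside "12.0", etc.
    case " $archs " in
      *" $cap "*) printf '%s\n' "$ver" ;;
    esac
  done <<'EOF'
12.4|6.1 7.0 7.5 8.0 8.6 8.9 9.0
12.6|6.1 7.0 7.5 8.0 8.6 8.9 9.0
12.8|6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0
12.9|6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0
13.0|7.5 8.0 8.6 8.9 9.0 10.0 12.0
EOF
}

# Example (needs an NVIDIA GPU and driver):
#   builds_for_cap "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
```

For example, `builds_for_cap 6.1` prints 12.4, 12.6, 12.8 and 12.9, one per line. The 13.0 build drops compute capability 6.1.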
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6992-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6980 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6980
Commit: 299f5d782c8ffd7195a1ed6a6d5561f759beb07e
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6980-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6970 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6970
Commit: 7f09a680af6e0ef612de81018e1d19c19b8651e8
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6970-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6962 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6962
Commit: 230d1169e5bfe04a013b2e20f4662ee56c2454b0
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6962-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6949 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6949
Commit: a5c07dcd7b49916c7c770f2da9583e6b82717678
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6949-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6940 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6940
Commit: c5023daf607c578d6344c628eb7da18ac3d92d32
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6940-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6929 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6929
Commit: a2054e3a8ff0da3978a4acc18c349ff58554d336
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6929-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6920 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6920
Commit: d38d9f0877a5872daa3c5f06fb9a86376bf15d50
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6920-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6907 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6907
Commit: bea04522ff1a0d8559ccfd353aa018dcfbb608cc
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6907-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6891 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6891
Commit: 16724b5b6836a2d4b8936a5824d2ff27c52b4517
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6891-cuda-12.8.tar.gz
./llama-cli --help
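The download-and-extract step can be scripted. This is a minimal sketch under several assumptions:

- The assets follow the `llama.cpp-<build>-cuda-<version>.tar.gz` naming used in the Usage examples.
- They are served from GitHub's standard `releases/download/<tag>/<asset>` path.
- The release tag equals the build name (for example `b6891`). Check this against the actual release page.
- The tarball holds `llama-cli` at its top level, as the Usage example implies.

`tarball_name` and `fetch_llama_cuda` are hypothetical helpers. The script extracts into its own directory with `tar -C` instead of the current directory. It then runs `llama-cli --list-devices`, which recent llama.cpp builds accept, to show which devices the binary can see.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the asset name used in the Usage examples,
# e.g. tarball_name b6891 12.8 -> llama.cpp-b6891-cuda-12.8.tar.gz
tarball_name() {
  printf 'llama.cpp-%s-cuda-%s.tar.gz\n' "$1" "$2"
}

# Hypothetical helper: download one build, extract it into its own
# directory, and list the devices the binary can see.
# ASSUMPTION: the release tag equals the build name (e.g. b6891).
# ASSUMPTION: llama-cli sits at the top level of the tarball.
fetch_llama_cuda() {
  local build="$1" cuda="$2"
  local file dir
  file="$(tarball_name "$build" "$cuda")"
  dir="llama.cpp-${build}-cuda-${cuda}"
  # -f: fail on HTTP errors; -L: follow GitHub's redirect to the asset host
  curl -fL -o "$file" \
    "https://github.com/ai-dock/llama.cpp-cuda/releases/download/${build}/${file}"
  mkdir -p "$dir"
  tar -xzf "$file" -C "$dir"
  "$dir/llama-cli" --list-devices
}
```

For example, `fetch_llama_cuda b6891 12.8` would fetch the CUDA 12.8 build of b6891 into `llama.cpp-b6891-cuda-12.8/`, provided the tag assumption holds.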