Releases: ai-dock/llama.cpp-cuda
llama.cpp b7097 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7097
Commit: 1920345c3bcec451421bb6abc4981678cc721154
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
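To decide which build to download, check the CUDA version your installed driver supports; nvidia-smi reports it in its header line (a quick sanity check using only standard nvidia-smi options):
# The header reports the driver version and the highest CUDA version it supports
nvidia-smi
# Or query the driver version alone
nvidia-smi --query-gpu=driver_version --format=csv,noheader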
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
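To map an installed GPU onto the table above, query its compute capability directly; reasonably recent drivers expose it through nvidia-smi's compute_cap field:
# Prints name and compute capability per GPU, e.g. "NVIDIA GeForce RTX 3090, 8.6"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader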
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7097-cuda-12.8.tar.gz
./llama-cli --help
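A minimal end-to-end sketch, assuming an asset URL copied from this release page and a tarball that unpacks binaries and bundled shared libraries into the working directory (verify the layout with tar -tzf first):
# The asset URL below is illustrative -- copy the real link from the release assets
curl -LO https://github.com/ai-dock/llama.cpp-cuda/releases/download/b7097/llama.cpp-b7097-cuda-12.8.tar.gz
tar -xzf llama.cpp-b7097-cuda-12.8.tar.gz
# If libllama/libggml shared objects sit next to the binaries, help the loader find them
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
./llama-cli --help
The other upstream tools (llama-server, llama-bench, and so on) typically ship in the same archive and are invoked the same way.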
llama.cpp b7087 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7087
Commit: cb623de3fc61011e5062522b4d05721a22f2e916
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7087-cuda-12.8.tar.gz
./llama-cli --help
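To confirm an extracted binary resolves its CUDA libraries on your system, standard tooling is enough; nothing below is specific to this release:
# Check that the dynamic linker finds every shared library, including the CUDA runtime
ldd ./llama-cli | grep -iE 'cuda|not found'
# llama.cpp binaries report their build number and enabled backends
./llama-cli --version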
llama.cpp b7083 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7083
Commit: 2376b7758c58b0ede05de382bf72bb538f11ef9a
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7083-cuda-12.8.tar.gz
./llama-cli --help
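Beyond --help, a typical first run offloads all layers to the GPU; the model path below is a placeholder for any local GGUF file:
# -ngl 99 offloads all model layers to the GPU; the model path is hypothetical
./llama-cli -m ./models/your-model.gguf -ngl 99 -p "Hello"
# llama-server (typically in the same archive) serves the model over HTTP
./llama-server -m ./models/your-model.gguf -ngl 99 --port 8080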
llama.cpp b7075 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7075
Commit: 72bd7321a7d7465d371eb2ae46cd5518842c8f44
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7075-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7062 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7062
Commit: 9b17d74ab7d31cb7d15ee7eec1616c3d825a84c0
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7062-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7054 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7054
Commit: becc4816dd6e601d2e0beb7b9c7e6767c8688b12
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7054-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7042 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7042
Commit: ffb6f3d921bbc64d559164e23671a710a4dd9de5
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7042-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7027 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7027
Commit: 7d019cff744b73084b15ca81ba9916f3efab1223
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7027-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7017 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7017
Commit: 7bef684118cc44f9ab8b82df102d68db94a6d9f4
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7017-cuda-12.8.tar.gz
./llama-cli --help
llama.cpp b7003 with CUDA Support
Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7003
Commit: b8595b16e69e3029e06be3b8f6635f9812b2bc3f
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
Architecture Reference
- 6.1: TITAN Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b7003-cuda-12.8.tar.gz
./llama-cli --help