Releases: ai-dock/llama.cpp-cuda

llama.cpp b7342 with CUDA

10 Dec 03:01

llama.cpp b7342 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7342
Commit: 2fbe3b7bb7814621477d658a81f0d0cca08761cf

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
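
If you are unsure which tarball to pick, a quick check of what your system supports helps; this is a minimal sketch assuming an NVIDIA driver is installed (and, optionally, a local CUDA toolkit):

# Driver version and the maximum CUDA version it supports (shown in the header)
nvidia-smi
# Version of the locally installed CUDA toolkit, if any
nvcc --version

As a rule of thumb, pick a tarball whose CUDA version is no newer than the one nvidia-smi reports, since the driver has to support the runtime these binaries were built against.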

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series
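
To map your own card onto this table, you can query its compute capability directly; the compute_cap query field below is supported by recent NVIDIA drivers, though older drivers may not recognize it:

# Prints the device name and compute capability, e.g. "NVIDIA L4, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader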

Usage

Download the tarball that matches your CUDA version and extract it:

# Extract the binaries into the current directory
tar -xzf llama.cpp-b7342-cuda-12.8.tar.gz
# Verify the build runs and list available options
./llama-cli --help
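
Beyond --help, a short generation makes a good smoke test. The model path below is a placeholder for any GGUF file you have locally; -ngl 99 asks llama.cpp to offload as many layers as possible to the GPU:

# Placeholder model path; substitute your own GGUF file
./llama-cli -m ./models/your-model.gguf -p "Hello" -n 32 -ngl 99
# Or expose an OpenAI-compatible HTTP server on port 8080
./llama-server -m ./models/your-model.gguf -ngl 99 --port 8080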

llama.cpp b7327 with CUDA

09 Dec 02:59

llama.cpp b7327 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7327
Commit: c8554b66e0ed397f7457ed5f3ce3b466dd508d5c

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7327-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7315 with CUDA

08 Dec 02:59

llama.cpp b7315 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7315
Commit: 4d3726278bfd3f42ad9a5b5db2ae056a8eef2ee2

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7315-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7310 with CUDA

07 Dec 03:09

llama.cpp b7310 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7310
Commit: db97837385edfbc772230debbd49e5efae843a71

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7310-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7285 with CUDA

06 Dec 02:52

llama.cpp b7285 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7285
Commit: 6016d0bd414d0512a78aebccded6af50bc6d71aa

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7285-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7276 with CUDA

05 Dec 03:00

llama.cpp b7276 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7276
Commit: 96fe9badfc5235ff0a049aca647bff8c448055aa

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7276-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7261 with CUDA

04 Dec 02:58

llama.cpp b7261 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7261
Commit: dea9ba27cbcbdeaf516a252b77860e5eb9579892

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7261-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7237 with CUDA

03 Dec 02:50

llama.cpp b7237 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7237
Commit: e148380c7cb92f10dabff939ebfa567dcdadbde0

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7237-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7222 with CUDA

02 Dec 02:55

llama.cpp b7222 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7222
Commit: 746f9ee88941c2f259268c484fe8278375387081

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7222-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7211 with CUDA

01 Dec 03:09

llama.cpp b7211 with CUDA Support

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7211
Commit: 0a4aeb927db9ac28a845ceeaf1bd5e78442f0acc

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO series, RTX 50xx series

Usage

Download the tarball that matches your CUDA version and extract it:

tar -xzf llama.cpp-b7211-cuda-12.8.tar.gz
./llama-cli --help