
Releases: ai-dock/llama.cpp-cuda

llama.cpp b6992 with CUDA

09 Nov 02:54


llama.cpp b6992 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6992
Commit: aa3b7a90b407c556778a7e13a4b0d28cf964fd1c

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
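To pick a build, match the tarball to the CUDA version available on your machine. A quick check with standard NVIDIA tooling: nvidia-smi prints the maximum CUDA version the installed driver supports in its header, and nvcc (if the toolkit is installed) reports the toolkit version:

# maximum CUDA version supported by the driver (shown in the header line)
nvidia-smi
# installed CUDA toolkit version, if the toolkit is present
nvcc --version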

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series
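To see which line above applies to your GPU, you can query its compute capability directly; a minimal check, assuming a driver recent enough for nvidia-smi to support the compute_cap query field:

# prints e.g. "NVIDIA GeForce RTX 4090, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader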

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6992-cuda-12.8.tar.gz
./llama-cli --help
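Once extracted, the binaries run like any upstream llama.cpp build; a minimal sketch, assuming a GGUF model already downloaded to ./model.gguf (the path and layer count are placeholders):

# -m selects the model, -ngl offloads up to 99 layers to the GPU, -p sets the prompt
./llama-cli -m ./model.gguf -ngl 99 -p "Hello"

If the startup log lists your CUDA device, the build matches your driver.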

llama.cpp b6980 with CUDA

08 Nov 02:48


llama.cpp b6980 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6980
Commit: 299f5d782c8ffd7195a1ed6a6d5561f759beb07e

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6980-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6970 with CUDA

07 Nov 02:48


llama.cpp b6970 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6970
Commit: 7f09a680af6e0ef612de81018e1d19c19b8651e8

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6970-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6962 with CUDA

06 Nov 02:48


llama.cpp b6962 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6962
Commit: 230d1169e5bfe04a013b2e20f4662ee56c2454b0

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6962-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6949 with CUDA

05 Nov 02:52


llama.cpp b6949 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6949
Commit: a5c07dcd7b49916c7c770f2da9583e6b82717678

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6949-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6940 with CUDA

04 Nov 02:51


llama.cpp b6940 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6940
Commit: c5023daf607c578d6344c628eb7da18ac3d92d32

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6940-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6929 with CUDA

03 Nov 03:01


llama.cpp b6929 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6929
Commit: a2054e3a8ff0da3978a4acc18c349ff58554d336

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6929-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6920 with CUDA

02 Nov 02:52


llama.cpp b6920 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6920
Commit: d38d9f0877a5872daa3c5f06fb9a86376bf15d50

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6920-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6907 with CUDA

01 Nov 02:53


llama.cpp b6907 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6907
Commit: bea04522ff1a0d8559ccfd353aa018dcfbb608cc

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6907-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b6891 with CUDA

31 Oct 02:44


llama.cpp b6891 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6891
Commit: 16724b5b6836a2d4b8936a5824d2ff27c52b4517

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: TITAN Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX PRO Blackwell series, RTX 50xx series

Usage

Download the tarball matching your CUDA version and extract it (the CUDA 12.8 build is shown as an example):

tar -xzf llama.cpp-b6891-cuda-12.8.tar.gz
./llama-cli --help