Releases: ai-dock/llama.cpp-cuda

llama.cpp b7475 with CUDA

19 Dec 03:02

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7475
Commit: 8ea958d4d9d5bd43f9f244950bf057b5227ce8d9

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
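
The tarballs target a specific CUDA toolkit version, so pick one no newer than what your driver supports. As a quick check (assuming the NVIDIA driver and nvidia-smi are installed), the nvidia-smi header reports the highest CUDA version the driver can run:

# Report the highest CUDA version the installed driver supports
nvidia-smi | grep -o "CUDA Version: [0-9.]*"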

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx
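
To find which entry applies to your GPU, recent NVIDIA drivers can report the compute capability directly through nvidia-smi (the compute_cap query field is only available on newer drivers):

# Print the name and compute capability (e.g. 8.9) of each visible GPU
nvidia-smi --query-gpu=name,compute_cap --format=csv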

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7475-cuda-12.8.tar.gz
./llama-cli --help
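
From there, a typical first run loads a GGUF model with all layers offloaded to the GPU. A minimal sketch, assuming a local model file at model.gguf (the path and prompt are placeholders):

# Offload all layers to the GPU (-ngl 99) and run a short prompt
./llama-cli -m model.gguf -ngl 99 -p "Hello"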

llama.cpp b7446 with CUDA

18 Dec 02:56

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7446
Commit: 5c0d18881e0e9794c96b2602736b758bac9d9388

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7446-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7438 with CUDA

17 Dec 02:56

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7438
Commit: ef83fb8601229ff650d952985be47e82d644bfaa

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7438-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7414 with CUDA

16 Dec 03:03

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7414
Commit: 9d52f17ae33e8df958e20f3f1b13bfec53ab5a1d

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7414-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7404 with CUDA

15 Dec 03:06

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7404
Commit: 52392291b2f3a24b5d3fef4fc0b56f10db358dc1

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7404-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7388 with CUDA

14 Dec 03:06

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7388
Commit: 4ed2bae50d64dcff7f99cb2b28f737fda314abf5

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7388-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7371 with CUDA

13 Dec 02:54

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7371
Commit: 7bed317f5351eba037c2e0aa3dce617e277be1c4

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7371-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7360 with CUDA

12 Dec 03:01

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7360
Commit: 53ecd4fdb923dcb53d311ed42798cae7198aa742

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7360-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7349 with CUDA

11 Dec 03:04

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7349
Commit: 4df6e859e92dac52536f735785bb9e0a3bc63e2b

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7349-cuda-12.8.tar.gz
./llama-cli --help

llama.cpp b7342 with CUDA

10 Dec 03:01

Pre-built llama.cpp binaries with CUDA support, packaged for multiple CUDA toolkit versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b7342
Commit: 2fbe3b7bb7814621477d658a81f0d0cca08761cf

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0

Architecture Reference

  • 6.1: Titan XP, Tesla P40, GTX 10xx
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX Pro series, RTX 50xx

Usage

Download the tarball matching your CUDA version and extract it:

tar -xzf llama.cpp-b7342-cuda-12.8.tar.gz
./llama-cli --help