llama.cpp b6838 with CUDA

Released by github-actions on 26 Oct 02:55

llama.cpp b6838 with CUDA Support

Pre-built binaries of llama.cpp with CUDA support for multiple CUDA versions.

Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6838
Commit: 226f295f4dd92ad714533adc5497afed5fa88bb8

CUDA Versions

  • CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
  • CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
  • CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
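
To pick the right tarball, you can check which CUDA runtime your installed driver supports. One way, as a rough sketch, is to look at the "CUDA Version" field in the nvidia-smi header, which shows the newest CUDA runtime the driver can load (the exact output layout varies by driver release):

# The header reports driver version and the highest supported CUDA version
nvidia-smi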

Architecture Reference

  • 6.1: Titan Xp, Tesla P40, GTX 10xx series
  • 7.0: Tesla V100
  • 7.5: Tesla T4, RTX 20xx series, Quadro RTX
  • 8.0: A100
  • 8.6: RTX 30xx series
  • 8.9: RTX 40xx series, L4, L40
  • 9.0: H100, H200
  • 10.0: B200
  • 12.0: RTX 50xx series, RTX PRO (Blackwell) series
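
To map an installed GPU to one of the compute capabilities above, recent nvidia-smi versions can report it directly. This is a hedged example, not part of the release; the compute_cap query field requires a reasonably new driver:

# Print GPU name and compute capability (e.g. "8.9" for an RTX 40xx card)
nvidia-smi --query-gpu=name,compute_cap --format=csv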

Usage

Download the appropriate tarball for your CUDA version and extract it:

tar -xzf llama.cpp-b6838-cuda-12.8.tar.gz
./llama-cli --help
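
As a minimal smoke test of GPU offload, llama-cli can be pointed at a GGUF model with all layers offloaded to the GPU. The model path below is a placeholder and is not included in this release:

# Offload all layers to the GPU (-ngl 99) and generate a short completion
./llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello" -n 32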