llama.cpp b6838 with CUDA Support
Pre-built llama.cpp binaries with CUDA support, packaged for several CUDA toolkit versions.
Source: https://github.com/ggml-org/llama.cpp/releases/tag/b6838
Commit: 226f295f4dd92ad714533adc5497afed5fa88bb8
CUDA Versions
- CUDA 12.4 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.6 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0
- CUDA 12.8 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 12.9 - Architectures: 6.1, 7.0, 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
- CUDA 13.0 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
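Pick the build whose CUDA version your installed driver supports. A quick way to check (assuming nvidia-smi is on the PATH) is to read the driver's reported CUDA version:
# The "CUDA Version" field in the nvidia-smi header is the highest CUDA runtime
# version the installed driver supports; choose a tarball at or below that version.
nvidia-smi | grep "CUDA Version"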
Architecture Reference
- 6.1: Titan Xp, Tesla P40, GTX 10xx series
- 7.0: Tesla V100
- 7.5: Tesla T4, RTX 20xx series, Quadro RTX
- 8.0: A100
- 8.6: RTX 30xx series
- 8.9: RTX 40xx series, L4, L40
- 9.0: H100, H200
- 10.0: B200
- 12.0: RTX Pro series, RTX 50xx series
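If you are unsure which compute capability your GPU has, recent nvidia-smi releases can report it directly (this assumes your driver supports the compute_cap query field):
# Prints the GPU name and its compute capability, e.g. "NVIDIA GeForce RTX 4090, 8.9"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader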
Usage
Download the appropriate tarball for your CUDA version and extract:
tar -xzf llama.cpp-b6838-cuda-12.8.tar.gz
./llama-cli --help
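A minimal end-to-end run might look like the following sketch; the model path is a placeholder, and the LD_LIBRARY_PATH line is only needed if the tarball ships shared libraries (libllama.so, libggml*.so) next to the binaries:
# Make bundled shared libraries visible to the loader (skip if the build is fully static).
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
# Run a single prompt with layers offloaded to the GPU (-ngl 99 offloads everything that fits).
./llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello"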