`llama-cpp-python` provides Python bindings for `llama.cpp`, enabling efficient inference with LLaMA-family large language models and supporting a range of hardware acceleration backends. This guide provides detailed instructions for setting up `llama-cpp-python` on systems with Python 3.8 or later, covering OpenBLAS, Vulkan, CUDA, ROCm, and Metal.
Before proceeding, ensure your system meets the following requirements:
- Python 3.8 or later installed.
- `pip` updated to the latest version to avoid compatibility issues.
Install `llama-cpp-python` directly from PyPI. By default the package is compiled from source during installation, so the resulting build includes optimizations specific to your system:

```bash
pip install llama-cpp-python
```
For upgrading or rebuilding with different options:
```bash
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
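Once installed, a quick smoke test confirms the package imports and can run inference. The snippet below is a minimal sketch; the GGUF model path is a placeholder you should replace with a model you have downloaded:

```python
from llama_cpp import Llama

# Load a quantized GGUF model (placeholder path -- replace with your own file)
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")

# Run a short completion to verify the installation works
output = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(output["choices"][0]["text"])
```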
Follow the corresponding section below based on your preferred or available hardware acceleration backend.
For accelerated CPU computation with OpenBLAS:

```bash
export CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
```

This builds `llama-cpp-python` against OpenBLAS, offering faster matrix computation where supported.
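To confirm which backend a build actually picked up, you can load a model with `verbose=True` and inspect the system info that `llama.cpp` prints at startup (for a BLAS build, look for `BLAS = 1`). A minimal sketch, again with a placeholder model path:

```python
from llama_cpp import Llama

# verbose=True makes llama.cpp print its system_info line at load time,
# which lists compiled-in backends (e.g. "BLAS = 1" for an OpenBLAS build)
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", verbose=True)
```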
Vulkan offers cross-vendor GPU acceleration, useful for graphics cards not covered by the CUDA, ROCm, or Metal backends:

```bash
export CMAKE_ARGS="-DLLAMA_VULKAN=on"
pip install llama-cpp-python
```
For NVIDIA GPUs, enable CUDA acceleration:

```bash
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
pip install llama-cpp-python
```
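A GPU-enabled build only uses the GPU when you request it via the `n_gpu_layers` parameter. A minimal sketch (the model path is a placeholder; `-1` offloads all layers):

```python
from llama_cpp import Llama

# n_gpu_layers controls how many transformer layers run on the GPU;
# -1 offloads all of them (reduce this if you run out of VRAM)
llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,
)
print(llm("The capital of France is", max_tokens=8)["choices"][0]["text"])
```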
Apple Silicon users can leverage Metal for acceleration:
export CMAKE_ARGS="-DLLAMA_METAL=on"
pip install llama-cpp-python
Ensure your Python installation supports the `arm64` architecture. For Miniforge on Apple Silicon:
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```
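You can verify that the interpreter itself is an `arm64` build (and not an x86_64 one running under Rosetta) with a quick check:

```python
import platform

# Should print "arm64" on Apple Silicon; "x86_64" means the interpreter
# is running under Rosetta and Metal acceleration will not be available
print(platform.machine())
```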
For AMD GPU acceleration with ROCm:
export CMAKE_ARGS="-DLLAMA_HIPBLAS=on"
pip install llama-cpp-python
Specify additional ROCm configuration as needed. Note that compile-time options such as `LLAMA_CUDA_DMMV_X` and `LLAMA_CUDA_MMV_Y` must be passed through `CMAKE_ARGS` to take effect, while device selection and GFX-version overrides are runtime environment variables:

```bash
# Build with extra compile-time tuning options
export CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DLLAMA_CUDA_DMMV_X=64 -DLLAMA_CUDA_MMV_Y=2"
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

# Runtime: restrict to the first GPU and, if your card is not officially
# supported by ROCm, override the GFX version (10.3.0 targets RDNA2 cards)
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```
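The runtime variables can also be set from Python, which is convenient in notebooks. This is a sketch under the assumption that the ROCm runtime reads these variables when it is first loaded, so they must be set before `llama_cpp` is imported:

```python
import os

# Runtime GPU selection must happen before llama_cpp (and the ROCm
# runtime) is loaded, so set these at the very top of your script
os.environ["HIP_VISIBLE_DEVICES"] = "0"
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the AMD GPU
)
```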
If you encounter any issues, consider the following solutions:
- Verify that your Python and `pip` versions are up to date, as `llama-cpp-python` requires Python 3.8 or higher.
- Ensure your GPU drivers and any required toolkits (CUDA/ROCm) are current and correctly set up.
- Check that environment variables are properly configured in your system.
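A quick way to check the interpreter and package versions from the first item above (assuming the installed `llama_cpp` release exposes `__version__`, as recent releases do):

```python
import sys
import llama_cpp

# Confirm the interpreter meets the minimum supported version
print("Python:", sys.version)
assert sys.version_info >= (3, 8), "llama-cpp-python requires Python 3.8+"

# Report the installed binding version
print("llama-cpp-python:", llama_cpp.__version__)
```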
For more detailed troubleshooting and for background on the specific technologies and configurations mentioned here, refer to the following resources:
- OpenBLAS Documentation
- Vulkan Guide
- CUDA Toolkit Documentation
- ROCm Installation
- Metal for Developers
Your participation and feedback are highly valued in our community. If you have questions, feedback, or require assistance, you're encouraged to join the community forum or file an issue related to `llama.cpp` and `llama-cpp-python` development on GitHub. This collaborative effort not only helps you but also enhances the tool for others.
For `llama.cpp`-specific inquiries or contributions, the `llama.cpp` GitHub repository is the go-to resource. Engaging with the community through forums and discussions, or by contributing directly, can offer valuable insights and support for your projects.