Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
- GGML v3 models should load just fine. As per this post, this type of error should have been resolved.
- In particular, models like LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin or llama-13b.ggmlv3.q6_K.bin should load.
Current Behavior
- Instead, loading fails with this error:
ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer
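For scale, the two byte counts in that message can be converted to GiB. This is only arithmetic on the numbers printed in the log; the variable names are mine and nothing here calls into llama.cpp or Metal.

```python
GIB = 1024 ** 3  # bytes per GiB

# Values copied verbatim from the ggml_metal_add_buffer message.
requested = 9_763_717_120  # size of the 'data' buffer
limit = 8_589_934_592      # reported buffer maximum


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


excess = requested - limit

print(f"requested: {to_gib(requested):.2f} GiB")
print(f"limit:     {to_gib(limit):.2f} GiB")
print(f"excess:    {to_gib(excess):.2f} GiB")
```

So the reported maximum is exactly 8 GiB, and the 'data' buffer is about 1.09 GiB over it.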
Environment and Context
llama-cpp-python 0.1.67
M2 Pro
16 GB
macOS
Darwin UAVALOS-M-NR30 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:23 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6020 arm64
Python 3.10.10
GNU Make 3.81
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Failure Information (for bugs)
See above
Steps to Reproduce
!pip uninstall llama-cpp-python -y
!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
!pip install 'llama-cpp-python[server]'
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
model_path = "/Users/uavalos/Documents/LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin"
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
n_gpu_layers = 1
n_batch = 512
llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=1100,
)
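To check whether a given model file is likely to hit the same limit before loading it, a rough size check might help. This is a minimal sketch that assumes the 'data' buffer is roughly the size of the model file on disk, which I have not verified. `METAL_BUFFER_MAX` and `exceeds_buffer_max` are hypothetical names of my own, not part of llama-cpp-python, and the limit is just the number printed in the log above.

```python
import os

# Buffer maximum as printed by ggml_metal_add_buffer in the log (8 GiB).
# Assumption: other machines may report a different maximum.
METAL_BUFFER_MAX = 8_589_934_592


def exceeds_buffer_max(path: str, limit: int = METAL_BUFFER_MAX) -> bool:
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

Calling `exceeds_buffer_max(model_path)` with the path used above would then indicate whether the file alone is already larger than the reported maximum.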
Failure Logs
ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer