
buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592 #438

Open
@frankandrobot

Description


Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

  • GGML v3 models should load just fine. As per this post, this type of error should have been resolved.
  • In particular, models like LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin or llama-13b.ggmlv3.q6_K.bin should load.

Current Behavior

  • Instead, I'm getting this error:
ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer
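
For what it's worth, converting the two numbers from the log into GiB (my own arithmetic, not output from the library) makes the failure clearer: the model's weight buffer is about 9.09 GiB, while the per-buffer maximum Metal reports on this machine is exactly 8 GiB.

# Rough conversion of the sizes in the error message (assumed to be bytes)
buffer_size = 9763717120   # 'data' buffer holding the q5_1 model weights
metal_max = 8589934592     # per-buffer maximum reported for this device

print(f"model buffer: {buffer_size / 2**30:.2f} GiB")                # ~9.09 GiB
print(f"Metal limit:  {metal_max / 2**30:.2f} GiB")                  # 8.00 GiB
print(f"over by:      {(buffer_size - metal_max) / 2**30:.2f} GiB")  # ~1.09 GiB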

Environment and Context

llama-cpp-python 0.1.67
Apple M2 Pro, 16 GB RAM
macOS

Darwin UAVALOS-M-NR30 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:23 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6020 arm64

Python 3.10.10
GNU Make 3.81
Apple clang version 14.0.3 (clang-1403.0.22.14.1)

Failure Information (for bugs)

See above

Steps to Reproduce

!pip uninstall llama-cpp-python -y
!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
!pip install 'llama-cpp-python[server]'

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

model_path = "/Users/uavalos/Documents/LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin"
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # number of layers to offload to the GPU (Metal)
n_batch = 512     # tokens processed in parallel per batch

llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
    n_ctx=1100,
)
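
For completeness, this is roughly how I'd exercise the model afterwards; it never gets that far because initialization fails, and the prompt below is just a placeholder:

template = """Question: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("What is the capital of France?")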

Failure Logs

ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer

Labels: llama.cpp (Problem with llama.cpp shared lib)
