buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior

* GGML v3 models should load just fine. As per this [post](https://github.com/ggerganov/llama.cpp/issues/1815), this type of error should have been resolved.
* In particular, models like `LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin` or `llama-13b.ggmlv3.q6_K.bin` should load.

# Current Behavior

* Instead, loading the model fails with this error:

```
ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer
```
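To put the two numbers in the error into perspective, here is a minimal sketch that parses the log line and converts the sizes to GiB. The helper name `parse_buffer_error` is hypothetical and only exists for illustration; the log line is copied verbatim from the output above.

```python
import re

# Error line copied verbatim from the output above.
LOG_LINE = (
    "ggml_metal_add_buffer: buffer 'data' size 9763717120 "
    "is larger than buffer maximum of 8589934592"
)

GIB = 1024 ** 3


def parse_buffer_error(line):
    """Return (requested_bytes, max_bytes) from a buffer-size error line.

    Hypothetical helper for illustration; not part of llama-cpp-python.
    """
    match = re.search(r"size (\d+) is larger than buffer maximum of (\d+)", line)
    if match is None:
        raise ValueError("line does not look like a buffer-size error")
    return int(match.group(1)), int(match.group(2))


requested, maximum = parse_buffer_error(LOG_LINE)
overage = requested - maximum

print(f"requested: {requested / GIB:.2f} GiB")  # 9.09 GiB
print(f"maximum:   {maximum / GIB:.2f} GiB")    # 8.00 GiB
print(f"overage:   {overage / GIB:.2f} GiB")    # 1.09 GiB
```

The reported maximum is exactly 2^33 bytes (8 GiB), and the requested `data` buffer exceeds it by roughly 1.09 GiB.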

# Environment and Context

```
llama-cpp-python         0.1.67
M2 Pro
16 GB
Macosx

Darwin UAVALOS-M-NR30 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:22:23 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T6020 arm64

Python 3.10.10
GNU Make 3.81
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
```

# Failure Information (for bugs)

See above

# Steps to Reproduce

```
!pip uninstall llama-cpp-python -y
!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
!pip install 'llama-cpp-python[server]'

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

model_path = "/Users/uavalos/Documents/LLaMa-13B-GGML/llama-13b.ggmlv3.q5_1.bin"
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # number of layers to offload to the GPU (Metal)
n_batch = 512  # number of tokens processed per batch

llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=n_gpu_layers, n_batch=n_batch,
    callback_manager=callback_manager, 
    verbose=True,
    n_ctx=1100,
)
```
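As a rough pre-flight check, the model file size can be compared against the limit reported in the error before attempting to load. This is only a sketch under two assumptions: that the `data` buffer is roughly the size of the model file on disk, and that the 8589934592-byte limit from the log applies to this machine. `exceeds_metal_buffer_limit` and `METAL_BUFFER_MAX_BYTES` are hypothetical names, not part of llama-cpp-python or LangChain.

```python
import os

# Assumption: limit copied from the error message for this machine (8 GiB).
METAL_BUFFER_MAX_BYTES = 8589934592


def exceeds_metal_buffer_limit(model_path, limit=METAL_BUFFER_MAX_BYTES):
    """Return True if the model file is larger than the given buffer limit.

    Assumes the Metal 'data' buffer is about as large as the file on disk.
    """
    return os.path.getsize(model_path) > limit
```

For example, `exceeds_metal_buffer_limit(model_path)` can be called before constructing `LlamaCpp` to tell whether a given quantization is likely to hit this error.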

# Failure Logs


```
ggml_metal_add_buffer: buffer 'data' size 9763717120 is larger than buffer maximum of 8589934592
llama_init_from_file: failed to add buffer
```

