create_chat_completion is stuck in versions 0.2.84 and 0.2.85 for Mac Silicon

# Prerequisites

Version 0.2.84 or 0.2.85 and using create_chat_completion method.
Tried different GGUF models.

Please answer the following questions for yourself before submitting an issue.

- [ X ] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [ X ] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [ X ] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [ X ] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior

Provide a result as described in the documentation. 

# Current Behavior

Inference is stuck (I let it run for 5 minutes).
After downgrading to version 0.2.83 everything runs without a single change in the code.

# Environment and Context

Mac M1 MAX, 32GB RAM, MacOS 14.5, Python 3.12, llama-cpp-python 0.2.84/5.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

create_chat_completion is stuck in versions 0.2.84 and 0.2.85 for Mac Silicon #1648

Prerequisites

Expected Behavior

Current Behavior

Environment and Context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

create_chat_completion is stuck in versions 0.2.84 and 0.2.85 for Mac Silicon #1648

Description

Prerequisites

Expected Behavior

Current Behavior

Environment and Context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions