
Llama.generate: prefix-match hit is very slow. #1437

Open
ndy200 opened this issue May 7, 2024 · 3 comments

Comments


ndy200 commented May 7, 2024

I upgraded from an older version and am now seeing a disturbingly long delay before the response starts (the "prefix-match hit" phase).
The load on my machine is about the same (a bit higher with Python, which is understandable).
I tried to replicate the same environment, using an NVIDIA card but with n_gpu_layers=0.
With the Python binding it can take several seconds before the response starts. Token generation itself runs at a similar speed, but with llama.cpp the response begins immediately, while with the Python binding it takes seconds.
Am I the only one experiencing this?
I am using a Llama 3 model.

So, the original binary's timings are:
llama_print_timings: sample time = 92.31 ms / 1160 runs ( 0.08 ms per token, 12565.67 tokens per second)
The llama-cpp-python timings are:
llama_print_timings: sample time = 99.82 ms / 144 runs ( 0.69 ms per token, 1442.57 tokens per second)

This seems like a big difference.
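To make the gap concrete, the per-token sampling cost implied by the two timing lines above can be computed directly (a quick illustrative calculation, not part of either library):

```python
def per_token_ms(total_ms: float, runs: int) -> float:
    """Average sampling cost per token in milliseconds."""
    return total_ms / runs

binary_ms = per_token_ms(92.31, 1160)   # llama.cpp binary
python_ms = per_token_ms(99.82, 144)    # llama-cpp-python

print(f"binary: {binary_ms:.2f} ms/token ({1000 / binary_ms:.0f} tok/s)")
print(f"python: {python_ms:.2f} ms/token ({1000 / python_ms:.0f} tok/s)")
print(f"ratio:  {python_ms / binary_ms:.1f}x slower sampling per token")
```

That works out to roughly 0.08 ms vs 0.69 ms per sampled token, an ~8.7x difference in the sampling phase alone. Note also that the two runs sampled very different token counts (1160 vs 144), so the comparison is indicative rather than a controlled benchmark.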

@woheller69

Maximilian-Winter/llama-cpp-agent#54
This is probably related to my finding that llama-cpp-python with llama-cpp-agent is slower than gpt4all on follow-up prompts.
The first prompt is fast.
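Follow-up prompts are exactly where the "prefix-match hit" path in the issue title comes into play: the generator reuses the KV cache for tokens shared with the previous context and only re-evaluates the new suffix. A minimal sketch of that prefix-matching idea (illustrative token IDs and function name, not the library's actual internals):

```python
def common_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Length of the longest shared token prefix between the
    already-evaluated context and a new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n

# A follow-up prompt typically extends the previous context, so most
# tokens are already in the KV cache and only the suffix is evaluated.
cached = [1, 15, 22, 8, 99, 4]           # tokens from the first turn
prompt = [1, 15, 22, 8, 99, 4, 31, 7]    # previous context + new turn

hit = common_prefix_len(cached, prompt)
print(f"prefix-match hit: reuse {hit} tokens, evaluate {len(prompt) - hit}")
```

If this reuse step is cheap, follow-up prompts should start generating almost immediately; the multi-second delay reported here suggests the hit path itself (or the bookkeeping around it) is where the time goes.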

@woheller69

Related to #1369?

@aoom

aoom commented Sep 10, 2024

I encountered a similar problem: model loading is abnormally slow on GPU (unified memory on ARM platforms), and only a single core / single thread is used on the CPU. This problem only exists in the last few releases; it worked fine before.
