Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to https://github.com/abetlen/llama-cpp-python #9

Closed
ericcurtin opened this issue Jul 30, 2024 · 10 comments
Closed

Switch to https://github.com/abetlen/llama-cpp-python #9

ericcurtin opened this issue Jul 30, 2024 · 10 comments

Comments

@ericcurtin
Copy link
Collaborator

ericcurtin commented Jul 30, 2024

Right now we call llama.cpp directly, long-term we should go with either llama.cpp directly or llama-cpp-python. Because maintaining two different llama.cpp backends isn't ideal, they will never be in sync from a version perspective etc. More maintenance.

The API's of llama-cpp-python seem to be more stable, if we can get it to behave the same as the current implementation, we should consider switching.

Tagging @MichaelClifford as he suggested the idea and may be interested.

@rhatdan
Copy link
Member

rhatdan commented Jul 30, 2024

I would prefer to go with the python route.

@ericcurtin
Copy link
Collaborator Author

ericcurtin commented Jul 31, 2024

I would prefer to go with the python route.

I agree, the main problem we have right now, is this "--instruct" option in llama.cpp direct was very useful for creating daemonless interactive terminal-based chatbots:

llama-main -m model --log-disable --instruct

they have actually since removed this --instruct option in llama.cpp in the last month.

I briefly tried to to do the same with llama-cpp-python, I couldn't get something working that worked well on a wide array of models like --instruct. But I only tried for an hour maybe so, I'm sure someone would figure this out, it's something several projects have done already in one form or another.

@ericcurtin
Copy link
Collaborator Author

ericcurtin commented Aug 1, 2024

Tagging @abetlen , we also sent an email with more details to [email protected]

@MichaelClifford
Copy link
Contributor

Hi @ericcurtin 👋 I agree, we should go with only one and not try to support both backends. That said, I don't have a very strong opinion as to which. We currently use llama-cpp-python in the recipes as well as the extensions' playground. So if we want to keep things consistent, it probably does make the most sense to stick with llama-cpp-python here too.

My only hesitation with llama-cpp-python is, it is another layer of abstraction between us and llama-cpp that we will need to rely on. And there have been a few instances in the past (getting the granite models working for example) where llama-cpp-python lagged a bit behind llama-cpp.

So, really I'm open to either approach. Let's figure out what ramalam's requirements are and pick the tool that works best for us 😄

@Ben-Epstein
Copy link
Contributor

For what it's worth, running on MacOS sequoia (M3), llama-cpp-python consistently fails on my machine, but ramalama in its current form works. It might be worth testing to see if that holds true among more Apple silicon machines before switching

@ericcurtin
Copy link
Collaborator Author

Yeah... To be honest at this point, if we do add this, it will probably be just another --runtime, like --runtime llama-cpp-python

@rhatdan
Copy link
Member

rhatdan commented Nov 6, 2024

No one is working on this, is this something we should still consider?

@ericcurtin
Copy link
Collaborator Author

llama-cppy-python does appear as though it implements a more feature complete OpenAI Compatible Server to the direct llama.cpp one, but I don't know for sure:

https://llama-cpp-python.readthedocs.io/en/latest/server/

it also implements multi-model server support.

I'm unsure, maybe we should consider it. This is one of those python things that we could probably run fine in a container, we'd probably only want read-only model access for it.

@ericcurtin
Copy link
Collaborator Author

I don't mind either way, leaving this open or closing

@ericcurtin
Copy link
Collaborator Author

Closing, not doing for now, llama-serve is ok for us for now, maybe we will re-open in future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants