-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to https://github.com/abetlen/llama-cpp-python #9
Comments
I would prefer to go with the python route. |
I agree, the main problem we have right now, is this "--instruct" option in llama.cpp direct was very useful for creating daemonless interactive terminal-based chatbots:
they have actually since removed this I briefly tried to to do the same with llama-cpp-python, I couldn't get something working that worked well on a wide array of models like |
Tagging @abetlen , we also sent an email with more details to [email protected] |
Hi @ericcurtin 👋 I agree, we should go with only one and not try to support both backends. That said, I don't have a very strong opinion as to which. We currently use llama-cpp-python in the recipes as well as the extensions' playground. So if we want to keep things consistent, it probably does make the most sense to stick with llama-cpp-python here too. My only hesitation with llama-cpp-python is, it is another layer of abstraction between us and llama-cpp that we will need to rely on. And there have been a few instances in the past (getting the granite models working for example) where llama-cpp-python lagged a bit behind llama-cpp. So, really I'm open to either approach. Let's figure out what ramalam's requirements are and pick the tool that works best for us 😄 |
For what it's worth, running on MacOS sequoia (M3), llama-cpp-python consistently fails on my machine, but ramalama in its current form works. It might be worth testing to see if that holds true among more Apple silicon machines before switching |
Yeah... To be honest at this point, if we do add this, it will probably be just another --runtime, like --runtime llama-cpp-python |
No one is working on this, is this something we should still consider? |
llama-cppy-python does appear as though it implements a more feature complete OpenAI Compatible Server to the direct llama.cpp one, but I don't know for sure: https://llama-cpp-python.readthedocs.io/en/latest/server/ it also implements multi-model server support. I'm unsure, maybe we should consider it. This is one of those python things that we could probably run fine in a container, we'd probably only want read-only model access for it. |
I don't mind either way, leaving this open or closing |
Closing, not doing for now, llama-serve is ok for us for now, maybe we will re-open in future |
Right now we call llama.cpp directly, long-term we should go with either llama.cpp directly or llama-cpp-python. Because maintaining two different llama.cpp backends isn't ideal, they will never be in sync from a version perspective etc. More maintenance.
The API's of llama-cpp-python seem to be more stable, if we can get it to behave the same as the current implementation, we should consider switching.
Tagging @MichaelClifford as he suggested the idea and may be interested.
The text was updated successfully, but these errors were encountered: