initial REST API for ramalama #726
base: main
Conversation
Just initial code; we can always improve down the road. Resolves: containers#725 Signed-off-by: Douglas Schilling Landgraf <[email protected]>
Reviewer's Guide by Sourcery

This pull request introduces an initial REST API for interacting with ramalama. It uses FastAPI to define endpoints for retrieving information, listing models, pulling models, running models, and stopping models. The API interacts with the ramalama CLI using subprocess (a minimal sketch of this pattern follows the diagrams below).

Sequence diagram for model management operations

```mermaid
sequenceDiagram
    participant Client
    participant API as FastAPI Server
    participant CLI as RamaLama CLI
    Client->>API: GET /models
    API->>CLI: ramalama list
    CLI-->>API: model list
    API-->>Client: JSON response
    Client->>API: POST /pull/{model}
    API->>CLI: ramalama pull {model}
    CLI-->>API: success/error
    API-->>Client: JSON response
    Client->>API: POST /run/{model}
    API->>CLI: ramalama run {model}
    CLI-->>API: success/error
    API-->>Client: JSON response
    Client->>API: POST /stop/{model}
    API->>CLI: ramalama stop {model}
    CLI-->>API: success/error
    API-->>Client: JSON response
```
Class diagram for FastAPI endpoints

```mermaid
classDiagram
    class FastAPIApp {
        +get_root()
        +get_info()
        +get_ps()
        +list_models()
        +pull_model(model_name: str)
        +run_model(model_name: str)
        +stop_model(model_name: str)
    }
    note for FastAPIApp "All methods return JSON responses"
    note for FastAPIApp "Uses subprocess to interact with CLI"
```
Hey @dougsland - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- The model_name parameter is passed directly to subprocess.run() without validation, creating a potential command injection vulnerability.
Overall Comments:
- SECURITY: The API executes shell commands directly from HTTP requests without input validation or sanitization. This is a significant security vulnerability. Consider using a safer interface to the ramalama functionality or implementing strict input validation (see the sketch after these comments).
- The API lacks any authentication mechanism. Given that this service can control AI model execution, authentication and authorization should be implemented before this goes into production.
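One way to address the blocking issue, sketched below: validate model_name against an allowlist pattern before it ever reaches subprocess, rejecting anything that could be parsed as a CLI flag. The pattern here is hypothetical and would need to match whatever names ramalama actually accepts.

```python
# Hypothetical allowlist validation for model_name; the pattern is an
# assumption, not ramalama's real naming rules.
import re

from fastapi import HTTPException

# The first character must be alphanumeric, so "-"-prefixed strings that the
# CLI would treat as flags are rejected outright.
MODEL_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:/-]*$")


def validate_model_name(model_name: str) -> str:
    if not MODEL_NAME_RE.fullmatch(model_name):
        raise HTTPException(status_code=400, detail="invalid model name")
    return model_name
```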
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🔴 Security: 1 blocking issue, 2 other issues
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
We can merge this tool because it's self-contained. Note, though, that we have a differently designed tool planned that does similar things, called ramalama-server. It won't have any Python dependencies outside of stdlib and it will be called by "ramalama serve". We have to ensure the dependencies of the tool in this PR don't leak into the RamaLama packages outside of the container. If people install this separately, it's fine.
It's described here:
I'd prefer we write up this API first before implementing it, and make sure the Podman Desktop team is happy with it. The service needs to be installable and usable on macOS and Windows platforms as well as Linux. If it is going to be a separate tool from standard RamaLama, and Python pulls in too much stuff, then the tool should probably be built in Go rather than Python.
All good, I can write it in Go if you guys prefer.
My intention is to write something aligned with all the projects. I just wrote it as REST since it's easy to run under macOS or Windows; it's a simple curl/HTTP call.
I think Golang makes as much sense as Python; both work for me. The main thing is that we don't need many dependencies. It would be nice to not have any; that can be ideal for some platforms like native macOS, Ubuntu, etc. We've had good joy up to now writing HTTP stuff without any dependencies in Python 3; @swarajpande5 wrote a fair bit. And we've had good feedback in the community about that (not having a large Python dependency stack outside the containers). Golang means we have to worry about building more, but I'm not against it if we get some advantages from Golang.
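To illustrate the no-dependency route mentioned above, here is a rough sketch of the same /models endpoint using only the Python standard library; the path, port, and JSON shape are assumptions.

```python
# Stdlib-only sketch (no FastAPI): one GET endpoint that shells out to the
# ramalama CLI. Port and response shape are illustrative assumptions.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/models":
            self.send_error(404)
            return
        result = subprocess.run(["ramalama", "list"],
                                capture_output=True, text=True)
        body = json.dumps({"stdout": result.stdout,
                           "stderr": result.stderr}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```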
For Go, the tool would need to be built for all platforms to guarantee a smooth experience, just like with ramalama. On the other hand, Python is slower, although I think this is negligible in our case. @dougsland Have you thought about using OpenAPI? That way users can generate a client from the specification without any hassle. We can either generate it from code (see the OpenAPI Generator) or write the spec directly and use it ourselves to generate our server stubs from it.
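Worth noting on the spec-from-code option: since the PR already uses FastAPI, an OpenAPI document is generated automatically for the declared routes and served at /openapi.json by default. A sketch of dumping it to a file, assuming `app` is the FastAPI instance from the sketch earlier:

```python
# Sketch: export FastAPI's auto-generated OpenAPI document. Assumes `app` is
# the FastAPI instance from the earlier sketch.
import json

with open("openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```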
Could be a good idea, @engelmi.
With this API, does the caller get back the stdout and stderr?
Correct. Example: Linux http server:
MacOS + curl:
Error:
info:
I would like to make more adjustments, but it pretty much mimics the output from server to client on any OS using HTTP. However, if we want a different approach, it's worth stopping and starting with OpenAPI, or Go, or pure-stdlib Python (which can be done quickly), but feedback from the Podman Desktop team is preferable so we are all in sync. Finally, we can also implement authentication over HTTP.
The main thing with Golang is that we have to start doing things like macOS builds properly, but we already do that for Podman, so 🤷♂️
Or else we say it's just not a macOS-native feature and have the Golang binary launched inside the container.
If we can just add I don't see the need for the
I would +1 any solution not using Python (or any other runtime dependency). On the API: for example, for 'pull', what would be the data sent for the progress, etc.? Also, would it be the same kind of pulling as the Hugging Face libraries?
Open WebUI would be a great test tool for this, btw (Open WebUI was also a requested feature at FOSDEM). It's one of the highest-requested features and exposes a lot of the APIs missing from llama.cpp and vllm (some of them are probably Ollama-specific, but if they are easy to implement, that's no biggie).
The full solution will NOT be Python-free. The REST API tool will be executing RamaLama commands, or at least calling RamaLama functions. RamaLama is not going to be ported to a different language at this time.