
Can RamaLama support Kubernetes-based inference clustering on a Mac Mini M4? #700

Open
hotwa opened this issue Feb 1, 2025 · 3 comments


hotwa commented Feb 1, 2025

I’m really impressed with the work being done on RamaLama and its ability to handle various models for inference. I have been exploring the possibility of leveraging Kubernetes (K8s) for distributed model inference in my local environment, specifically on a Mac Mini M4.

Is there currently support within RamaLama to deploy and manage model inference workloads using Kubernetes on a Mac Mini?
If not natively supported, are there any recommendations or best practices for integrating RamaLama with Kubernetes for local development/testing purposes?
Are there any known limitations or considerations when running Kubernetes-based inference clusters on ARM-based hardware (like the M4 chip) using RamaLama?

ericcurtin (Collaborator) commented Feb 1, 2025

It can be done. The first step is to install podman machine with krunkit (see https://podman.io/). There are also Kubernetes YAML generators via:

ramalama serve --generate

Most of the bits are there, just need someone to tie it all together.
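
For that first step on macOS, a minimal sketch assuming Podman 5.x, where the libkrun provider runs the Podman machine VM under krunkit; the provider name and environment variable come from the Podman documentation, not this thread:

# Select the libkrun provider so the Podman machine VM runs under krunkit
# (GPU-capable virtualization on Apple silicon)
export CONTAINERS_MACHINE_PROVIDER=libkrun
# Create and start the VM that the containerized workloads will run in
podman machine init
podman machine start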

ericcurtin (Collaborator)

Would make a great blog post!

rhatdan (Member) commented Feb 1, 2025

ramalama serve --generate kube MODEL

Will generate a Kubernetes deployment for the containerized AI model.
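
A hedged end-to-end sketch of how that generated YAML could be used; the MODEL placeholder and the MODEL.yaml filename are illustrative, and the generator's actual output location may differ:

# Generate Kubernetes YAML for the containerized model
ramalama serve --generate kube MODEL
# Apply it to a cluster with kubectl, or replay it locally with Podman
kubectl apply -f MODEL.yaml
podman kube play MODEL.yaml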
