#gpu-inference

Here are 35 public repositories matching this topic...

A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes pre-configured components for 🚀 AI Gateway (LiteLLM), 🤖 LLM Serving (vLLM, SGLang, Ollama), 📊 Vector Databases, 🔍 Embedding Models (TEI), 📈 Observability (Langfuse, Phoenix), and more. Fast-track your GenAI deployment with Kubernetes.

  • Updated May 26, 2026
  • JavaScript

🚀 ClipServe: A fast API server for embedding text and images and for performing zero-shot classification using OpenAI’s CLIP model. Powered by FastAPI, Redis, and CUDA for fast, scalable AI applications. Transform texts and images into embeddings or classify images with custom labels, all through easy-to-use endpoints. 🌐📊

  • Updated Sep 29, 2024
  • Python

ModelSpec is an open, declarative specification for describing how AI models, especially LLMs, are deployed, served, and operated in production. It captures execution, serving, and orchestration intent to enable validation, reasoning, and automation across modern AI infrastructure.

  • Updated Apr 27, 2026
  • Python

A project to build GPU acceleration for LLaMA models on local computers and AWS, leveraging GPU resources for efficient inference and training.

  • Updated May 19, 2026
  • Python
