A hyper-efficient, lightweight AI Gateway that provides a unified interface to access various AI model providers through a single endpoint. Built for edge deployment using Cloudflare Workers, it offers seamless integration with popular AI providers while maintaining high performance and low latency.
- Edge-Optimized Performance: Built on Cloudflare Workers for minimal latency
- Universal Interface: Single endpoint for multiple AI providers
- Provider Agnostic: Easily switch between different AI providers
- Streaming Support: Real-time streaming responses for all supported providers
- Extensible Middleware: Customizable request/response pipeline
- Built-in Validation: Automatic request validation and error handling
- Auto-Transform: Automatic request/response transformation
- Detailed Metrics: Comprehensive request metrics and cost tracking
- Comprehensive Logging: Detailed logging for monitoring and debugging
- Type-Safe: Built with TypeScript for robust type safety
- OpenAI Compatible: Drop-in replacement for OpenAI's API
| Provider  | Streaming | OpenAI Compatible |
|-----------|-----------|-------------------|
| OpenAI    | ✅        | Native            |
| Anthropic | ✅        | ✅                |
| GROQ      | ✅        | ✅                |
| Fireworks | ✅        | ✅                |
| Together  | ✅        | ✅                |
```bash
# Install Wrangler CLI
npm install -g wrangler

# Clone and Setup
git clone https://github.com/Noveum/ai-gateway.git
cd ai-gateway
npm install

# Login to Cloudflare
wrangler login

# Development
npm run dev  # Server starts at http://localhost:3000

# Deploy
npm run deploy
```
Alternatively, run the gateway with Docker:

```bash
docker pull noveum/ai-gateway:latest
docker run -p 3000:3000 noveum/ai-gateway:latest
```
The gateway is a drop-in replacement for OpenAI's API: you can keep your existing OpenAI client libraries and simply change the base URL:
```typescript
// TypeScript/JavaScript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:3000/v1',
  apiKey: 'your-provider-api-key',
  defaultHeaders: { 'x-provider': 'openai' },
});

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
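Streaming works with the same client. A minimal sketch, assuming the openai npm package v4+, where `stream: true` returns an async iterable:

```typescript
// Streaming sketch: reuses the `openai` client configured above.
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a story' }],
  stream: true, // the gateway streams responses for all supported providers
});

for await (const chunk of stream) {
  // Each chunk carries an incremental delta, as in OpenAI's API.
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```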
Using curl with Anthropic:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-provider: anthropic" \
  -H "Authorization: Bearer your-anthropic-api-key" \
  -d '{
    "model": "claude-3-sonnet-20240229",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```
Using curl with GROQ:

```bash
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-provider: groq" \
  -H "Authorization: Bearer your-groq-api-key" \
  -d '{
    "model": "mixtral-8x7b-32768",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```
Streaming with the Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="your-provider-api-key",
    default_headers={"x-provider": "anthropic"},  # or any other provider
)

stream = client.chat.completions.create(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
Responses follow OpenAI's format, with an additional gateway-specific `metrics` block:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709312768,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  },
  "system_fingerprint": "fp_1234",
  "metrics": {
    "latency_ms": 450,
    "tokens_per_second": 42.2,
    "cost": {
      "input_cost": 0.0003,
      "output_cost": 0.0006,
      "total_cost": 0.0009
    }
  }
}
```
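Because `metrics` is a gateway extension rather than part of OpenAI's response shape, typed SDK clients won't expose it directly. One way to read it, sketched against the field names shown above and the `openai` client from the Quick Start:

```typescript
// `metrics` is not in the OpenAI SDK's response types, so cast before reading it.
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});

const metrics = (completion as any).metrics;
if (metrics) {
  console.log(`latency: ${metrics.latency_ms} ms`);
  console.log(`throughput: ${metrics.tokens_per_second} tokens/s`);
  console.log(`total cost: $${metrics.cost.total_cost}`);
}
```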
We welcome contributions! Here are some tasks we're actively looking for help with:
- AWS Bedrock Integration
  - Add support for AWS Bedrock models
  - Implement authentication and cost tracking
  - Get Started →
- Testing Framework
  - Set up unit and integration tests
  - Add provider-specific test cases
  - Get Started →
- Performance Benchmarks
  - Create benchmarking suite
  - Compare with other AI gateways
  - Get Started →
- Prometheus Integration
  - Add metrics exporter
  - Create Grafana dashboards
  - Get Started →
- Response Caching
  - Implement caching layer
  - Add cache invalidation
  - Get Started →
- Rate Limiting
  - Add per-user rate limits
  - Implement a token bucket algorithm (see the sketch after this list)
  - Get Started →
- Provider Guides
  - Create setup guides for each provider
  - Add troubleshooting sections
  - Get Started →
- Deployment Examples
  - Add Docker Compose examples
  - Create cloud deployment guides
  - Get Started →
Want to contribute?

1. Pick a task from above
2. Open an issue to discuss your approach
3. Submit a pull request
Need help? Join our Discord or check existing issues.
The gateway collects detailed metrics for every request, providing insights into:
- Real-time performance tracking
- Token usage and cost calculation
- Streaming metrics support
- Provider-specific metadata
- Latency and TTFB (time to first byte) monitoring
- Detailed debugging information
For detailed metrics documentation, see METRICS.md
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- GitHub Issues: https://github.com/Noveum/ai-gateway/issues
- Twitter: @NoveumAI
Copyright 2024 Noveum AI
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
- Built with Hono
- Deployed on Cloudflare Workers