181 changes: 181 additions & 0 deletions RATE_LIMITING_IMPLEMENTATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
# Rate Limiting Implementation Summary

This document summarizes the API rate limiting implementation for the RTAC project.

## Implementation Status

All acceptance criteria have been successfully implemented:

### 1. Rate limiting works correctly
- **Sliding window algorithm** implemented for accurate rate limiting
- **Configurable limits** for different endpoints
- **Memory-efficient** with automatic cleanup of old requests
- **Concurrency-safe** within a single event loop, using async locks to serialize concurrent requests
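
The sliding-window behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app/middleware/rate_limit.py`: the class name `SlidingWindowStore`, the `hit` method, and the injectable `clock` are all assumptions made for the example.

```python
import asyncio
import time
from collections import defaultdict, deque
from typing import Tuple


class SlidingWindowStore:
    """In-memory sliding-window counter keyed by client (hypothetical name)."""

    def __init__(self, clock=time.monotonic):
        self._hits = defaultdict(deque)  # client key -> timestamps of recent requests
        self._lock = asyncio.Lock()      # serializes coroutines, not OS threads
        self._clock = clock              # injectable so tests can control time

    async def hit(self, key: str, limit: int, window: float) -> Tuple[bool, int]:
        """Record one request and return (allowed, remaining)."""
        async with self._lock:
            now = self._clock()
            hits = self._hits[key]
            # Cleanup: drop timestamps that have slid out of the window.
            while hits and hits[0] <= now - window:
                hits.popleft()
            if len(hits) >= limit:
                return False, 0
            hits.append(now)
            return True, limit - len(hits)
```

Keys for clients that never return stay in the dictionary in this sketch, so a real store would likely also need a periodic sweep of empty entries.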

### 2. Headers are included
- `X-RateLimit-Limit` - Maximum requests allowed in window
- `X-RateLimit-Remaining` - Remaining requests in current window
- `X-RateLimit-Reset` - Unix timestamp when window resets
- `X-RateLimit-Window` - Window size in seconds
- `Retry-After` - Seconds to wait before retrying (when rate limited)
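
As a rough illustration of how these headers might be assembled, the helper below builds them from the limiter's state. The function name `rate_limit_headers` and its parameters are assumptions for this sketch, and the rounding choices (round up, at least one second) are one reasonable option rather than the project's confirmed behaviour.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, window: int,
                       reset_at: float, now: float,
                       limited: bool = False) -> Dict[str, str]:
    """Build the response headers listed above (hypothetical helper)."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        # Unix timestamp, rounded up so clients never retry too early.
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
        "X-RateLimit-Window": str(window),
    }
    if limited:
        # Only sent with a 429; whole seconds, never less than 1.
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
    return headers
```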

### 3. Errors are handled
- **429 Too Many Requests** status code when rate limit exceeded
- **Structured error responses** with clear messages
- **Retry guidance** included in error responses
- **Custom exception handler** for rate limit errors
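
From the client side, the retry guidance could be consumed along these lines. The helper `retry_delay` is hypothetical, and it assumes `Retry-After` carries a number of seconds, as described in the header list above, rather than an HTTP date.

```python
from typing import Mapping, Optional


def retry_delay(status_code: int, headers: Mapping[str, str],
                default: float = 1.0, cap: float = 300.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate limited."""
    if status_code != 429:
        return None
    try:
        delay = float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to a small default.
        delay = default
    # Clamp so a bad value cannot stall the client indefinitely.
    return min(max(delay, 0.0), cap)
```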

### 4. Configuration is flexible
- **Environment variable configuration** for all settings
- **Different limits** for different endpoints (global vs chat)
- **Enable/disable** rate limiting via configuration
- **Exempted endpoints** for health checks and documentation

## Files Modified

### Required Files
1. **`app/api/v1/agents_router.py`**
   - Added rate limit exception handler
   - Added `/rate-limits` endpoint for configuration info

2. **`app/middleware/rate_limit.py`** (Created)
   - Complete rate limiting middleware implementation
   - Sliding window algorithm
   - Client identification logic
   - Rate limit headers management

### Additional Files
3. **`app/config/settings.py`**
   - Added rate limiting configuration variables
   - Environment variable definitions

4. **`main.py`**
   - Integrated rate limiting middleware
   - Conditional enablement based on configuration

5. **`env.example`**
   - Added rate limiting environment variables
   - Default configuration values

## Configuration

### Environment Variables Added
```env
# Rate Limiting Configuration
RATE_LIMIT_ENABLED=true # Enable/disable rate limiting
RATE_LIMIT_REQUESTS=100 # Global requests per hour
RATE_LIMIT_WINDOW=3600 # Global window (1 hour)
RATE_LIMIT_CHAT_REQUESTS=30 # Chat requests per 5 minutes
RATE_LIMIT_CHAT_WINDOW=300 # Chat window (5 minutes)
```
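
For illustration only, the variables above could be read with the standard library as shown below. The project itself defines these fields on its pydantic `Settings` class; `RateLimitConfig` and `load_rate_limit_config` are hypothetical names for this sketch, and the set of accepted truthy strings is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings of "enabled"


@dataclass(frozen=True)
class RateLimitConfig:
    enabled: bool = True
    requests: int = 100
    window: int = 3600
    chat_requests: int = 30
    chat_window: int = 300


def load_rate_limit_config(env: Mapping[str, str] = os.environ) -> RateLimitConfig:
    """Read the RATE_LIMIT_* variables, falling back to the documented defaults."""
    def positive_int(name: str, default: int) -> int:
        value = int(env.get(name, default))
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        return value

    return RateLimitConfig(
        enabled=env.get("RATE_LIMIT_ENABLED", "true").strip().lower() in _TRUTHY,
        requests=positive_int("RATE_LIMIT_REQUESTS", 100),
        window=positive_int("RATE_LIMIT_WINDOW", 3600),
        chat_requests=positive_int("RATE_LIMIT_CHAT_REQUESTS", 30),
        chat_window=positive_int("RATE_LIMIT_CHAT_WINDOW", 300),
    )
```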

### Default Rate Limits
- **Global endpoints**: 100 requests per hour
- **Chat endpoint**: 30 requests per 5 minutes
- **Exempted endpoints**: No limits (health, docs, static files)

## Key Features

### Smart Client Identification
- Handles requests behind proxies (`X-Forwarded-For`)
- Supports load balancers (`X-Real-IP`)
- Fallback to direct client IP
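
The lookup order above might be expressed as in the following sketch. The function `client_ip`, the `trust_proxy` flag, and the assumption that header names arrive lower-cased are all illustrative. Because both headers can be set by any caller, they should probably only be honoured when the service actually sits behind a proxy that overwrites them.

```python
from typing import Mapping, Optional


def client_ip(headers: Mapping[str, str], peer_ip: Optional[str],
              trust_proxy: bool = True) -> str:
    """Pick the address to rate limit on (hypothetical helper)."""
    if trust_proxy:
        # X-Forwarded-For may hold a chain "client, proxy1, proxy2";
        # the left-most entry is the original client.
        first = headers.get("x-forwarded-for", "").split(",")[0].strip()
        if first:
            return first
        real_ip = headers.get("x-real-ip", "").strip()
        if real_ip:
            return real_ip
    # Fall back to the direct TCP peer address.
    return peer_ip or "unknown"
```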

### Endpoint-Specific Limits
- Stricter limits for resource-intensive chat endpoint
- Relaxed limits for status and info endpoints
- Complete exemption for health checks and documentation
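
One way to express this per-endpoint selection is sketched below. The exempt prefixes and the rule that any path ending in `/chat` is the chat endpoint are assumptions for the example; the actual paths used by the middleware are not shown in this summary.

```python
from typing import Optional, Tuple

# Assumed exempt paths; adjust to the application's real routes.
EXEMPT_PREFIXES = ("/health", "/docs", "/redoc", "/openapi.json", "/static")


def select_limit(path: str,
                 global_limit: Tuple[int, int] = (100, 3600),
                 chat_limit: Tuple[int, int] = (30, 300)) -> Optional[Tuple[int, int]]:
    """Return (max_requests, window_seconds) for a path, or None if exempt."""
    if any(path == p or path.startswith(p + "/") for p in EXEMPT_PREFIXES):
        return None
    if path.rstrip("/").endswith("/chat"):
        return chat_limit  # stricter limit for the expensive endpoint
    return global_limit
```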

### Comprehensive Error Handling
- Graceful degradation if rate limiting fails
- Structured error responses with retry guidance
- Proper HTTP status codes and headers

### Performance Optimized
- In-memory storage with automatic cleanup
- Async-safe with proper locking
- Efficient sliding window algorithm

## Testing

### Test Files Created
1. **`tests/test_rate_limit.py`**
   - Unit tests for rate limit store
   - Integration tests for middleware
   - Performance and concurrency tests

2. **`scripts/test_rate_limiting.py`**
   - Manual testing script
   - Tests all endpoints and scenarios
   - Validates headers and responses

### Test Coverage
- Rate limit enforcement
- Header inclusion
- Error responses
- Exempted endpoints
- Concurrent requests
- Configuration validation

## Documentation

### Documentation Created
1. **`docs/rate-limiting.md`**
   - Complete implementation guide
   - Configuration reference
   - API examples and responses
   - Best practices for clients
   - Troubleshooting guide

## Deployment Checklist

### Before Deployment
- [ ] Set appropriate rate limits for production
- [ ] Configure environment variables
- [ ] Test with expected traffic patterns
- [ ] Monitor memory usage in production

### Production Configuration
```env
# Recommended production settings
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=3600
RATE_LIMIT_CHAT_REQUESTS=30
RATE_LIMIT_CHAT_WINDOW=300
```

## Monitoring

### Log Messages
Rate limiting activities are logged with appropriate levels:
- Rate limit exceeded events (WARNING)
- System errors (ERROR)
- Normal operations (DEBUG)
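
A minimal sketch of this level mapping, assuming a logger named `rate_limit` and a hypothetical `log_decision` helper (neither is confirmed by the implementation):

```python
import logging

logger = logging.getLogger("rate_limit")  # assumed logger name


def log_decision(client: str, path: str, allowed: bool, remaining: int) -> int:
    """Log one rate limit decision and return the level that was used."""
    if allowed:
        level = logging.DEBUG
        logger.log(level, "rate limit ok client=%s path=%s remaining=%d",
                   client, path, remaining)
    else:
        level = logging.WARNING
        logger.log(level, "rate limit exceeded client=%s path=%s", client, path)
    return level
```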

### Metrics to Monitor
- Rate limit hit rates by endpoint
- Client distribution and patterns
- Memory usage of rate limit store
- Response times with middleware

## Benefits Achieved

1. **Abuse Prevention**: Protects against excessive API usage
2. **Fair Usage**: Ensures all users get equitable access
3. **System Stability**: Prevents overload of backend services
4. **Cost Control**: Reduces infrastructure costs from abuse
5. **Better UX**: Provides clear feedback to legitimate users

## Future Enhancements

Consider these improvements for advanced use cases:
- Redis-based distributed rate limiting
- User-based rate limiting (after authentication)
- Dynamic rate limits based on system load
- Rate limiting analytics dashboard
- IP whitelisting for trusted clients

---

**Implementation Complete**: All acceptance criteria met with comprehensive testing and documentation.
59 changes: 59 additions & 0 deletions app/api/v1/agents_router.py
@@ -19,6 +19,21 @@
# Track agent startup time
startup_time = time.time()


# NOTE: APIRouter has no `exception_handler` decorator. Exception handlers are
# registered on the FastAPI app instead, e.g. in main.py:
#     app.add_exception_handler(429, rate_limit_handler)
async def rate_limit_handler(request: Request, exc: HTTPException):
    """Handle rate limit exceeded errors."""
    return JSONResponse(
        status_code=429,
        content={
            "success": False,
            "error": "Rate limit exceeded",
            "message": "Too many requests. Please try again later.",
            "support_contact": settings.support_phone
        },
        headers=getattr(exc, "headers", {})
    )

@router.post(
"/chat",
response_model=SuccessResponseSchema[AgentResponse],
@@ -186,4 +201,48 @@ async def get_conference_info():
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Failed to get conference information"
        )


@router.get(
    "/rate-limits",
    response_model=SuccessResponseSchema[dict],
    status_code=status.HTTP_200_OK,
    summary="Get rate limit information",
    description="Get current rate limiting configuration and status"
)
async def get_rate_limits():
    """Get rate limiting information."""
    try:
        rate_limit_info = {
            "enabled": settings.rate_limit_enabled,
            "global_limits": {
                "requests": settings.rate_limit_requests,
                "window_seconds": settings.rate_limit_window,
                "window_description": f"{settings.rate_limit_window // 60} minutes"
            },
            "chat_limits": {
                "requests": settings.rate_limit_chat_requests,
                "window_seconds": settings.rate_limit_chat_window,
                "window_description": f"{settings.rate_limit_chat_window // 60} minutes"
            },
            "headers_included": [
                "X-RateLimit-Limit",
                "X-RateLimit-Remaining",
                "X-RateLimit-Reset",
                "X-RateLimit-Window",
                "Retry-After (when limit exceeded)"
            ]
        }

        return SuccessResponseSchema(
            data=rate_limit_info,
            message="Rate limit information retrieved successfully"
        )

    except Exception as e:
        logger.error(f"Error getting rate limit info: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Failed to get rate limit information"
        )
7 changes: 7 additions & 0 deletions app/config/settings.py
@@ -50,6 +50,13 @@ class Settings(BaseSettings):
    secret_key: str = Field(..., env="SECRET_KEY")
    access_token_expire_minutes: int = Field(30, env="ACCESS_TOKEN_EXPIRE_MINUTES")

    # Rate Limiting Configuration
    rate_limit_requests: int = Field(100, env="RATE_LIMIT_REQUESTS")
    rate_limit_window: int = Field(3600, env="RATE_LIMIT_WINDOW")  # 1 hour in seconds
    rate_limit_chat_requests: int = Field(30, env="RATE_LIMIT_CHAT_REQUESTS")
    rate_limit_chat_window: int = Field(300, env="RATE_LIMIT_CHAT_WINDOW")  # 5 minutes in seconds
    rate_limit_enabled: bool = Field(True, env="RATE_LIMIT_ENABLED")

    @field_validator("cors_origins", mode="before")
    @classmethod
    def parse_cors_origins(cls, v):
3 changes: 3 additions & 0 deletions app/middleware/__init__.py
@@ -0,0 +1,3 @@
"""
Middleware package for request processing.
"""