feat: RAG API optimizations with intelligent ha/backup embedding #197
Conversation
Is it possible to add each of the changes piecemeal, to focus on what they do individually?
Also, I'm hesitant to add something generated by Claude without some validation of what it's doing, some examples of manual testing, etc.
Branch updated from 68c4554 to 19c5058 (commit message below):
…eddings

This comprehensive update adds intelligent backup embedding providers, performance optimizations, and comprehensive error handling:

## 🚀 New Features

### Intelligent Backup Embedding System
- **Ultra-fast failover**: Socket check detects dead ports in 0.5 seconds
- **Immediate failover**: Primary failure triggers instant backup attempt (no retries)
- **Smart cooldown**: 1-minute cooldown after primary provider failure
- **Automatic recovery detection**: Tests primary recovery when both providers fail
- **Seamless switching**: LibreChat receives 200 status when backup succeeds
- **Fast recovery**: Optimized retry logic prevents cascading failures
- **Clear logging**: Prominent failure messages and accurate provider tracking

### Custom NVIDIA Embeddings Provider
- **Full NVIDIA API compatibility** for LLaMA embedding models
- **Fast port detection**: Socket check fails immediately if nothing listening
- **Optimized timeouts**: 0.5s socket check, 2s connection, 3s read timeout
- **Configurable parameters**: batch size, retries, timeout, input types
- **Fast failover mode**: Reduced retries when backup provider configured
- **Proper error handling** for NVIDIA-specific API responses

### Enhanced AWS Bedrock Support
- **Titan V2 embeddings** with configurable dimensions (256/512/1024)
- **Optimized timeouts**: 5s connection, 30s read (reduced from 60s default)
- **Reactive rate limiting** - only activates when AWS throttles requests
- **Graceful error handling** with user-friendly configuration messages
- **Backward compatibility** with Titan V1 models

### Database & Performance Optimizations
- **Graceful PostgreSQL error handling** - 503 responses for connection issues
- **Optimized chunking strategy** - adaptive batch sizes based on chunk size
- **Request throttling middleware** - prevents LibreChat overload (configurable)
- **Improved UTF-8 file processing** with proper cleanup and null checks
- **Enhanced connection pooling** with optimized timeout settings

## 🧪 Comprehensive Testing Suite
- **59 passing unit tests** covering all functionality
- **Automated failover testing** with service interruption simulation
- **JWT authentication integration** matching LibreChat's auth flow
- **Automatic document cleanup** after testing
- **Configurable test environments** via environment variables

## 📋 Configuration

### Backup Provider Setup
```env
# Primary Provider
EMBEDDINGS_PROVIDER=nvidia
EMBEDDINGS_MODEL=nvidia/llama-3.2-nemoretriever-300m-embed-v1
NVIDIA_TIMEOUT=3  # Fast failover - 3 second read timeout

# Backup Provider
EMBEDDINGS_PROVIDER_BACKUP=bedrock
EMBEDDINGS_MODEL_BACKUP=amazon.titan-embed-text-v2:0
PRIMARY_FAILOVER_COOLDOWN_MINUTES=1

# Performance Tuning
EMBED_CONCURRENCY_LIMIT=3
```

### Bedrock Titan V2 Configuration
```env
BEDROCK_EMBEDDING_DIMENSIONS=512  # 256, 512, or 1024
BEDROCK_EMBEDDING_NORMALIZE=true
BEDROCK_MAX_BATCH_SIZE=15
```

## 🛠️ Technical Improvements
- **Ultra-fast port detection** - Socket check with 0.5s timeout before connection
- **Immediate failover logic** - no retry delays when backup is available
- **Triple-layer timeout strategy** - socket (0.5s), connection (2s), read (3s)
- **Automatic recovery detection** - checks primary when both providers fail
- **Optimized Bedrock timeouts** - 5s connection, 30s read for faster failover
- **Conditional AWS credential loading** - only when Bedrock is configured
- **Thread-safe state management** with proper locking
- **Pydantic v2 compatibility** with proper field declarations
- **Comprehensive error categorization** and user-friendly messages

## 📚 Documentation
- **Complete environment variable documentation** in README.md
- **High availability configuration examples** with NVIDIA + Bedrock setup
- **Detailed provider configuration guides** for all supported embedding services
- **Timeout optimization documentation** for production deployments
- **Comprehensive testing documentation** with automated and manual testing procedures

This update ensures robust, production-ready embedding operations with lightning-fast failover (0.5-35 seconds), optimal performance, and excellent user experience.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
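To make the failover behaviour described above concrete, here is a minimal sketch of the socket pre-check plus primary/backup switching with a cooldown. It is illustrative only: the class, method, and constant names are assumptions and not this PR's actual code; it simply mirrors the `embed_documents` interface used by LangChain-style embedding providers.

```python
import socket
import time
from urllib.parse import urlparse

# Mirrors the PRIMARY_FAILOVER_COOLDOWN_MINUTES setting described above.
PRIMARY_FAILOVER_COOLDOWN_MINUTES = 1


def port_is_open(url: str, timeout: float = 0.5) -> bool:
    """Ultra-fast liveness probe: raw TCP connect before any HTTP request."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverEmbeddings:
    """Primary/backup switching with a cooldown after a primary failure."""

    def __init__(self, primary, backup, primary_url: str):
        self.primary = primary
        self.backup = backup
        self.primary_url = primary_url
        self._primary_failed_at = 0.0

    def _primary_in_cooldown(self) -> bool:
        cooldown = PRIMARY_FAILOVER_COOLDOWN_MINUTES * 60
        return (time.monotonic() - self._primary_failed_at) < cooldown

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        if not self._primary_in_cooldown() and port_is_open(self.primary_url):
            try:
                # No retries on the primary: fail fast so the caller never waits.
                return self.primary.embed_documents(texts)
            except Exception:
                self._primary_failed_at = time.monotonic()
        # Primary is down or cooling off: go straight to the backup provider.
        return self.backup.embed_documents(texts)
```

The 0.5 s socket probe is what keeps the worst case short: if nothing is listening on the primary's port, the request never waits for an HTTP connection or read timeout before switching to the backup.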
Branch updated from 19c5058 to 02f0a83.
Yes, that caution is understandable. I changed the one test case that made GitHub Actions fail, and there is now a new script that lets me automatically upload a batch of PDFs and test the failover using iptables, which then shows up in the logs.
log output:
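For context on what such a script can look like, here is a rough sketch of an iptables-based failover test. The endpoint path, port numbers, file_id, and token handling are placeholders (assumptions), not the actual script or configuration from this PR, and the iptables calls require root.

```python
import subprocess

import requests

RAG_API = "http://localhost:8000"       # assumption: local rag_api instance
PRIMARY_EMBEDDER_PORT = "8001"          # assumption: port of the local NIM embedder
JWT = "eyJ..."                          # assumption: token issued the same way LibreChat does


def block_primary():
    # Drop outbound traffic to the primary embedder to simulate an outage.
    subprocess.run(["sudo", "iptables", "-A", "OUTPUT", "-p", "tcp",
                    "--dport", PRIMARY_EMBEDDER_PORT, "-j", "DROP"], check=True)


def unblock_primary():
    subprocess.run(["sudo", "iptables", "-D", "OUTPUT", "-p", "tcp",
                    "--dport", PRIMARY_EMBEDDER_PORT, "-j", "DROP"], check=True)


def upload(pdf_path: str) -> int:
    with open(pdf_path, "rb") as f:
        resp = requests.post(
            f"{RAG_API}/embed",                     # assumption: upload/embed endpoint
            headers={"Authorization": f"Bearer {JWT}"},
            files={"file": f},
            data={"file_id": "failover-test-001"},  # assumption: arbitrary test id
            timeout=120,
        )
    return resp.status_code


block_primary()
try:
    # With the primary blocked, uploads should still return 200 via the backup.
    assert upload("sample.pdf") == 200
finally:
    unblock_primary()
```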
@danny-avila, these are the individual changes by file; they are a lot of small changes that improve robustness and performance. I installed this service as a venv under user-level systemd and let Claude parse the output and code along all day while I was testing a number of failure scenarios. This is meant to work with on-premises services on an AI/HPC cluster that are only running when nobody is using the hardware for other purposes; here is a breakdown by file. Should I add more extensive code comments?
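For anyone reproducing that setup, a rough example of a user-level systemd unit for a venv install follows; every path, the uvicorn entry point, and the port are placeholders, not taken from this PR.

```ini
# ~/.config/systemd/user/rag-api.service  (paths and entry point are illustrative)
[Unit]
Description=RAG API with HA/backup embeddings

[Service]
WorkingDirectory=%h/rag_api
EnvironmentFile=%h/rag_api/.env
ExecStart=%h/rag_api/venv/bin/uvicorn main:app --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user daemon-reload && systemctl --user enable --now rag-api`, and use `loginctl enable-linger <user>` if it should keep running without an active login session.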
@danny-avila, did you want multiple commits or multiple pull requests for this change? It's been running well in production for about 3 weeks now.
This comprehensive update adds intelligent backup embedding providers, performance optimizations, and comprehensive error handling:
After trying to upload more than 200 PDF docs in one go, I ran into some limitations, for example rate limits on AWS and Azure for their embedding models (#187). If LibreChat is installed on-premises, the cloud-based embedders can add significant latency. The solution for us was to install an NVIDIA NIM with one of their embedders on a local GPU and, if there is any error or that server is not reachable, fall back to one of the cloud providers (AWS Bedrock example provided).
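As an illustration of what the local primary looks like, below is a minimal request against an NVIDIA NIM embedding microservice, assuming the NIM's OpenAI-compatible `/v1/embeddings` endpoint hosted locally; the URL, port, and short timeouts are assumptions for the sketch, not values taken from this PR.

```python
import requests

NIM_URL = "http://localhost:8001/v1/embeddings"   # assumption: locally hosted NIM
MODEL = "nvidia/llama-3.2-nemoretriever-300m-embed-v1"


def nim_embed(texts: list[str], input_type: str = "passage") -> list[list[float]]:
    """Call a local NIM embedder; short timeouts so a dead server fails fast."""
    resp = requests.post(
        NIM_URL,
        json={
            "model": MODEL,
            "input": texts,
            "input_type": input_type,   # "passage" for documents, "query" for queries
        },
        timeout=(2, 3),                 # 2s connect, 3s read, as described in the PR
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]
```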
🚀 New Features
Intelligent Backup Embedding System
Custom NVIDIA Embeddings Provider
Enhanced AWS Bedrock Support
Database & Performance Optimizations
📋 Configuration
Backup Provider Setup
Bedrock Titan V2 Configuration
🧪 Testing
🛠️ Technical Improvements
📚 Documentation
This update ensures robust, production-ready embedding operations with lightning-fast failover (0.5-3 seconds), optimal performance, and excellent user experience.
🤖 Generated with Claude Code
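For completeness, here is a minimal, illustrative boto3 call showing how the Bedrock Titan V2 options listed above (dimensions, normalize) and the tightened 5 s connect / 30 s read timeouts map onto a request. It is a sketch under those assumptions, not the provider implementation from this PR, and the region is a placeholder.

```python
import json

import boto3
from botocore.config import Config

# Tightened timeouts matching the values described above instead of botocore defaults.
bedrock = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",  # assumption: use your AWS_REGION
    config=Config(connect_timeout=5, read_timeout=30),
)


def titan_v2_embed(text: str, dimensions: int = 512, normalize: bool = True) -> list[float]:
    """Embed a single string with amazon.titan-embed-text-v2:0."""
    body = json.dumps({
        "inputText": text,
        "dimensions": dimensions,   # 256, 512, or 1024
        "normalize": normalize,
    })
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(resp["body"].read())["embedding"]
```

Titan V2 returns the vector under the `embedding` key, so the same call shape works for any of the three configurable dimensions.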