This server provides an OpenAI-compatible API for local LLMs running through Ollama. It includes vector storage, conversation history, RAG (Retrieval-Augmented Generation), and tools such as web search, note management, and long-term memory. The server can be customized to work with different Ollama models and deployment configurations.
- OpenAI API Compatibility: Drop-in replacement for OpenAI API endpoints
- Local LLM Integration: Run models locally through Ollama
- Vector Storage: Store and retrieve conversation history using Redis
- RAG Support: Enhance responses with relevant document retrieval
- Conversation Memory: Long-term memory for personalized interactions
- Tool Integration: Web search, notes management, and more
- Streaming Responses: Support for streaming completions
- Model Mapping: Map OpenAI model names to local models
- Python 3.8+
- Docker (for Redis)
- Ollama with required models installed
- Clone the repository:
git clone https://github.com/yourusername/openai-compatible-server.git
cd openai-compatible-server
- Install dependencies:
pip install -r requirements.txt
- Start Redis using Docker:
docker run --name redis-vector -p 6379:6379 -d redis/redis-stack:latest
- Ensure Ollama is installed with your required models:
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh
# Pull required models
ollama pull llama3.3
ollama pull mistral-small
ollama pull llama3.2
ollama pull llama3.1
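To confirm that Ollama is reachable and the models were pulled, you can query its tags endpoint. This is a minimal sketch against the standard Ollama API on localhost:11434; adjust the host if your setup differs:

```python
import requests

# List the models currently available to the local Ollama instance
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Installed models:", models)

# Warn about any model referenced above that is missing
for needed in ("llama3.3", "mistral-small", "llama3.2", "llama3.1"):
    if not any(name.startswith(needed) for name in models):
        print(f"Missing model: {needed} (run `ollama pull {needed}`)")
```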
pip install -r requirements.txt
# Assuming you're in the project root directory
mkdir -p ./data/notes
touch ./data/system_prompt.txt
touch ./data/tool_system_prompt.txt
touch ./data/core_memories.txt
Edit ./data/system_prompt.txt with your preferred system prompt. This sets the personality and capabilities of the assistant.
Edit ./data/tool_system_prompt.txt with your preferred tool system prompt template. This defines how tool results are formatted.
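For example, ./data/system_prompt.txt might contain something as simple as "You are Sara, a helpful local assistant. Answer concisely and use your tools when they help." This is only an illustration; tailor it to your own assistant, and keep the tool prompt template consistent with the format the code expects.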
Modify the model mappings in the code to match your setup:
# Model mapping from OpenAI to local models
MODEL_MAPPING = {
"gpt-4": "llama3.3",
"gpt-3.5-turbo": "mistral-small",
"gpt-3.5-turbo-0125": "llama3.1",
"gpt-3.5-turbo-1106": "llama3.2",
# Add more mappings as needed
"default": "llama3.3"
}
# Available local models
AVAILABLE_MODELS = [
"llama3.3:latest",
"mistral-small:latest",
"llama3.2:latest",
"llama3.1:latest"
# Add or remove models as needed
]
# URLs for different models
MODEL_URLS = {
"llama3.3": "http://x.x.x.x:11434/api/chat",
"llama3.2": "http://localhost:11434/api/chat",
"llama3.1": "http://localhost:11434/api/chat",
"mistral-small": "http://localhost:11434/api/chat",
"default": "http://localhost:11434/api/chat"
}
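These three dictionaries work together when a request arrives: the requested OpenAI model name is mapped to a local model, and that model's URL is looked up. The helper below is an illustrative sketch of that flow (resolve_model is a hypothetical name, not necessarily the server's own function):

```python
def resolve_model(requested):
    """Map an OpenAI-style model name to a local model and its Ollama URL."""
    local = MODEL_MAPPING.get(requested, MODEL_MAPPING["default"])
    url = MODEL_URLS.get(local, MODEL_URLS["default"])
    return local, url

# Example: a client asking for "gpt-4" is served by llama3.3
print(resolve_model("gpt-4"))        # ("llama3.3", "http://x.x.x.x:11434/api/chat")
print(resolve_model("gpt-4o-mini"))  # unknown names fall back to the "default" entries
```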
You need to update the hardcoded paths in the code to match your environment. The main file paths to update are:
# Logging path (line ~45-46)
logging.FileHandler("/home/david/sara-jarvis/Test/openai_server.log")
# Change to:
logging.FileHandler("./logs/openai_server.log")
# Notes directory (line ~92)
NOTES_DIRECTORY = "/home/david/Sara/notes"
# Change to:
NOTES_DIRECTORY = "./data/notes"
# Core memory file (line ~99)
CORE_MEMORY_FILE = "/home/david/Sara/core_memories.txt"
# Change to:
CORE_MEMORY_FILE = "./data/core_memories.txt"
# System prompt file path (line ~509)
prompt_file = "/home/david/Sara/system_prompt.txt"
# Change to:
prompt_file = "./data/system_prompt.txt"
# Tool system prompt file path (line ~533)
prompt_file = "/home/david/Sara/tool_system_prompt.txt"
# Change to:
prompt_file = "./data/tool_system_prompt.txt"
Make sure to create the logs directory in your project folder:
mkdir -p ./logs
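Rather than editing each path individually, one option is to derive them all from a single base directory. This is a suggested pattern rather than how the code is currently written, and the SARA_BASE_DIR environment variable is an assumption:

```python
import os
from pathlib import Path

# Resolve everything relative to one configurable base directory
BASE_DIR = Path(os.environ.get("SARA_BASE_DIR", "."))

LOG_FILE = BASE_DIR / "logs" / "openai_server.log"
NOTES_DIRECTORY = str(BASE_DIR / "data" / "notes")
CORE_MEMORY_FILE = str(BASE_DIR / "data" / "core_memories.txt")
SYSTEM_PROMPT_FILE = str(BASE_DIR / "data" / "system_prompt.txt")
TOOL_SYSTEM_PROMPT_FILE = str(BASE_DIR / "data" / "tool_system_prompt.txt")

# Make sure the directories exist before the server starts writing to them
LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
Path(NOTES_DIRECTORY).mkdir(parents=True, exist_ok=True)
```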
To change which local models map to OpenAI model names, edit the MODEL_MAPPING dictionary:
MODEL_MAPPING = {
"gpt-4": "your-preferred-model",
"gpt-3.5-turbo": "your-other-model",
# Add more mappings as needed
"default": "your-default-model"
}
Update the AVAILABLE_MODELS list to include the models you have installed through Ollama:
AVAILABLE_MODELS = [
"your-model-1:latest",
"your-model-2:latest",
# Add more models as needed
]
If you're running Ollama on different machines or ports, update the MODEL_URLS dictionary:
MODEL_URLS = {
"your-model-1": "http://ip-address-1:11434/api/chat",
"your-model-2": "http://ip-address-2:11434/api/chat",
"default": "http://localhost:11434/api/chat"
}
Update the file paths throughout the code to match your directory structure:
# Update these paths to match your environment
NOTES_DIRECTORY = "./data/notes"
CORE_MEMORY_FILE = "./data/core_memories.txt"
logging.FileHandler("./logs/openai_server.log")
Run the server with:
uvicorn server:app --host 0.0.0.0 --port 7009 --reload
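Once it is up, you can confirm the server is healthy and see which models it advertises. A quick check, assuming the /health and /v1/models endpoints return JSON:

```python
import requests

base = "http://localhost:7009"

# Health check endpoint
print(requests.get(f"{base}/health", timeout=5).json())

# OpenAI-compatible model listing
print(requests.get(f"{base}/v1/models", timeout=5).json())
```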
The server provides these main endpoints:
- /v1/chat/completions - OpenAI-compatible chat completions
- /v1/embeddings - Generate embeddings
- /v1/models - List available models
- /api/chat - Legacy chat endpoint
- /health - Health check endpoint
- /v1/conversations - Manage conversations
- /rag/* - RAG API endpoints
import requests
import json
url = "http://localhost:7009/v1/chat/completions"
headers = {
"Content-Type": "application/json"
}
data = {
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"stream": False
}
response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json())
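Streaming works the same way with "stream": True. This sketch assumes the server follows the standard OpenAI server-sent-events format (lines prefixed with "data: " and terminated by "data: [DONE]"):

```python
import json
import requests

url = "http://localhost:7009/v1/chat/completions"
data = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": True,
}

with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if text.startswith("data: "):
            text = text[len("data: "):]
        if text.strip() == "[DONE]":
            break
        chunk = json.loads(text)
        # Each chunk carries an incremental delta, as in the OpenAI streaming API
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()
```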
The server includes several built-in tools:
- send_message: Formulate response thinking
- search_perplexica: Web search integration
- append_core_memory: Add important information to memory
- rewrite_core_memories: Update the entire memory set
- create_note, read_note, append_note, delete_note, list_notes: Note management
- RAG integration for document retrieval
- Conversations are stored in Redis
- Access conversation history via the /v1/conversations endpoints
- Embeddings for semantic search
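For example, listing stored conversations could look like this; the exact response shape depends on the server, so treat the structure here as an assumption:

```python
import requests

base = "http://localhost:7009"

# List stored conversations (response structure is server-specific)
conversations = requests.get(f"{base}/v1/conversations", timeout=5).json()
print(conversations)
```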
- Redis Connection Errors:
- Verify Redis Docker container is running:
docker ps | grep redis-vector
- Check Redis connection parameters in the code (see the Python connectivity check after this list)
- Ollama Model Issues:
- Verify models are installed:
ollama list
- Check Ollama is running:
curl http://localhost:11434/api/version
- API Endpoint Errors:
- Check server logs for detailed error messages
- Verify request format matches OpenAI API specifications
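For the Redis case, a quick way to confirm connectivity from Python (assuming the redis package is installed, with host and port matching the Docker command above):

```python
import redis

# Ping the Redis Stack container started earlier on the default port
client = redis.Redis(host="localhost", port=6379)
print("Redis reachable:", client.ping())
```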
Check the server logs for detailed information:
tail -f ./logs/openai_server.log
To change all hardcoded paths in the codebase, you should search for and replace these patterns:
- Find all instances of /home/david/sara-jarvis/Test/ and replace with ./logs/
- Find all instances of /home/david/Sara/ and replace with ./data/
Here's a bash command to find all paths that might need changing:
grep -r "/home/" . --include="*.py"
You can make these replacements using your code editor's search and replace feature with regex support.
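If you prefer to script it, a small one-off Python pass over the source files can apply the same replacements. This is an illustrative helper, so review the resulting diff before keeping it:

```python
from pathlib import Path

REPLACEMENTS = {
    "/home/david/sara-jarvis/Test/": "./logs/",
    "/home/david/Sara/": "./data/",
}

for py_file in Path(".").rglob("*.py"):
    text = py_file.read_text()
    updated = text
    for old, new in REPLACEMENTS.items():
        updated = updated.replace(old, new)
    if updated != text:
        py_file.write_text(updated)
        print(f"Updated {py_file}")
```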
- Create a backup of your original code:
cp server.py server.py.backup
- Update import paths if needed:
# Look for lines like:
from modules.perplexica_module import PerplexicaClient
# If module locations have changed, update accordingly
- Update file handler locations:
# Find logging setup (around line 41-47)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout),
        logging.FileHandler("./logs/openai_server.log")  # Updated path
    ]
)
- Update data directories:
# Find NOTES_DIRECTORY definition (around line 92)
NOTES_DIRECTORY = "./data/notes"  # Updated path

# Find CORE_MEMORY_FILE definition (around line 99)
CORE_MEMORY_FILE = "./data/core_memories.txt"  # Updated path
- Update system prompt loading functions:
# Find load_system_prompt function (around line 509)
def load_system_prompt():
    """Load the system prompt from a file"""
    prompt_file = "./data/system_prompt.txt"  # Updated path
    # ... rest of function ...

# Find load_tool_system_prompt function (around line 533)
def load_tool_system_prompt():
    """Load the tool system prompt template from a file"""
    prompt_file = "./data/tool_system_prompt.txt"  # Updated path
    # ... rest of function ...
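If you need a starting point for those loaders, a minimal version with a fallback could look like the following. The function name matches the one referenced above, but the body is an illustrative sketch rather than the project's actual implementation (load_tool_system_prompt would follow the same pattern):

```python
def load_system_prompt():
    """Load the system prompt from a file, with a safe fallback."""
    prompt_file = "./data/system_prompt.txt"
    try:
        with open(prompt_file, "r", encoding="utf-8") as f:
            return f.read().strip()
    except FileNotFoundError:
        # Fall back to a neutral default if the file has not been created yet
        return "You are a helpful assistant."
```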