A modern web application for comparing responses from different Large Language Models (LLMs) side-by-side. Compare OpenAI GPT models with Anthropic Claude, analyze performance metrics, and visualize differences with highlighting.
- 🔀 Side-by-Side Comparison: Compare responses from any two LLM models
- ⚡ Real-Time Metrics: Track response time, token usage, and performance
- 🎨 Intelligent Highlighting: Visual diff highlighting to spot differences at a glance
- 🌐 Multi-Provider Support: Works with OpenAI, Anthropic, and any OpenAI-compatible APIs
- 📱 Responsive Design: Beautiful, modern UI that works on desktop and mobile
- 🔒 Secure: API keys are never stored or transmitted to external servers
- ⚙️ Configurable: Flexible endpoint and model configuration
Simply open the `llm-diff-tool.html` file in your web browser - no installation required!
```shell
# Clone the repository
git clone https://github.com/yourusername/llm-diff-tool.git
cd llm-diff-tool

# Open in your browser
open llm-diff-tool.html
# or
python -m http.server 8000  # Then visit http://localhost:8000
```
- **Configure Your Models**
  - Enter API endpoints for both models
  - Add your API keys (stored locally only)
  - Specify model names (e.g., `gpt-4`, `claude-3-sonnet-20240229`)
- **Enter Your Prompt**
  - Type or paste the prompt you want both models to respond to
- **Compare**
  - Click "Compare Responses" to get results from both models
  - View side-by-side responses with difference highlighting
  - Analyze performance metrics and token usage
- **Toggle Features**
  - Enable/disable difference highlighting as needed
  - Scroll through longer responses easily
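Conceptually, the compare step sends the same prompt to both configured endpoints and times each call. A minimal sketch in Python (the tool itself runs in the browser; `call_model` below is a hypothetical stand-in for the actual HTTP request):

```python
import time

def call_model(endpoint, api_key, model, prompt):
    # Hypothetical stand-in for the tool's HTTP call; a real implementation
    # would POST an OpenAI-style chat completions payload to `endpoint`.
    return {"content": f"[{model} reply]", "usage": {"total_tokens": 42}}

def compare(prompt, configs):
    results = []
    for cfg in configs:
        start = time.perf_counter()
        response = call_model(cfg["endpoint"], cfg["key"], cfg["model"], prompt)
        elapsed = time.perf_counter() - start
        results.append({"model": cfg["model"],
                        "response": response,
                        "seconds": elapsed})
    return results

results = compare("Explain CORS in one sentence.", [
    {"endpoint": "https://api.openai.com/v1/chat/completions",
     "key": "sk-...", "model": "gpt-4"},
    {"endpoint": "https://api.anthropic.com/v1/messages",
     "key": "sk-ant-...", "model": "claude-3-sonnet-20240229"},
])
```

Both calls use the same prompt, so any difference in the responses reflects the models, not the input.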
**OpenAI**
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Models: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, etc.

**Anthropic**
- Endpoint: `https://api.anthropic.com/v1/messages`
- Models: `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, etc.

**Local Models**

Any API that follows the OpenAI chat completions format:
- Endpoint: `http://localhost:8000/v1/chat/completions`
- Models: `llama-2-7b`, `mistral-7b`, etc.
- OpenAI: Get your API key from OpenAI Platform
- Anthropic: Get your API key from Anthropic Console
- Local Models: Configure according to your local setup
The tool sends requests with these default parameters:
- `max_tokens`: 1000
- `temperature`: 0.7
- Message format: OpenAI chat completions style
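Put together, the request body sent to an OpenAI-compatible endpoint looks roughly like this (a sketch of the chat completions format with the defaults above; field names follow the public OpenAI API):

```python
import json

def build_chat_payload(model, prompt, max_tokens=1000, temperature=0.7):
    # OpenAI chat-completions-style request body with the tool's defaults.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_payload("gpt-4", "Summarize CORS in one sentence.")
print(json.dumps(payload, indent=2))
```

The Anthropic messages endpoint accepts a similar but not identical shape, which is why each model slot has its own endpoint setting.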
- Response Time: How long each model took to respond
- Prompt Tokens: Number of tokens in your input
- Completion Tokens: Number of tokens in the model's response
- Total Tokens: Combined token usage
- Model Names: For easy identification
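The token counts come straight from each provider's usage object, and total tokens is simply their sum. A sketch of how an OpenAI-style `usage` block maps onto the metrics above:

```python
def summarize_metrics(model, usage, seconds):
    # `usage` follows the OpenAI-style shape:
    # {"prompt_tokens": ..., "completion_tokens": ...}
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    return {
        "model": model,
        "seconds": round(seconds, 2),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }

m = summarize_metrics("gpt-4",
                      {"prompt_tokens": 12, "completion_tokens": 88},
                      1.234)
# m["total_tokens"] is 100
```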
The tool uses intelligent word-level comparison to highlight:
- 🔴 Removed content: Text present in Model 1 but not Model 2
- 🟢 Added content: Text present in Model 2 but not Model 1
- ⚪ Unchanged content: Text that's identical in both responses
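Word-level diffing of this kind can be built on a longest-common-subsequence matcher. A sketch using Python's `difflib` (the tool itself implements the equivalent logic in browser JavaScript):

```python
from difflib import SequenceMatcher

def word_diff(text_a, text_b):
    """Tag each word run as 'removed', 'added', or 'unchanged'."""
    a, b = text_a.split(), text_b.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag in ("delete", "replace"):
            ops.append(("removed", " ".join(a[i1:i2])))    # red highlight
        if tag in ("insert", "replace"):
            ops.append(("added", " ".join(b[j1:j2])))      # green highlight
        if tag == "equal":
            ops.append(("unchanged", " ".join(a[i1:i2])))  # no highlight
    return ops

diff = word_diff("the cat sat on the mat", "the dog sat on a mat")
```

Each tagged run maps directly onto one of the three highlight colors above.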
Track and compare:
- Response latency
- Token efficiency
- Output length
- Model behavior differences
- No Data Storage: All comparisons happen locally in your browser
- No External Requests: API keys and responses never leave your device
- Direct API Calls: Connects directly to LLM providers, no intermediary servers
**API Key Errors**
- Ensure your API keys are valid and have sufficient credits
- Check that you're using the correct endpoint for each provider

**CORS Errors**
- Browsers may block direct API calls, especially when the page is opened from a `file://` URL
- Serve the page from a local server (e.g., `python -m http.server`) if needed

**Response Format Issues**
- Verify your model names are correct
- Ensure the API endpoint supports the chat completions format

**Slow Performance**
- Check your internet connection
- Some models may have longer response times
- Initial release
- OpenAI and Anthropic support
- Real-time difference highlighting
- Performance metrics tracking
- Responsive design
This project is licensed under the MIT License - see the LICENSE file for details.