A comprehensive performance comparison of MySQL, MongoDB, and Neo4j for large-scale graph and network data operations. This project evaluates three different database paradigmsβrelational, document-oriented, and graphβusing a synthetic road network dataset with 500 cities and 250,000 connections.
- Overview
- Key Features
- Results Summary
- Quick Start
- Query Descriptions
- Performance Metrics
- Key Findings
- Technologies Used
- Project Structure
- Contributing
- Authors
- License
This project implements a rigorous benchmarking framework to compare database performance across different architectural paradigms. Using identical datasets and semantically equivalent queries, we measure:
- Execution Time - Query response latency
- CPU Usage - Processor utilization during query execution
- Memory Consumption - RAM usage patterns
The benchmark focuses on common network analytics operations: filtering, aggregation, multi-hop traversal, and shortest path algorithms.
- Automated Data Generation: Synthetic road network with 500 cities and 250,000 directed edges
- Multi-Database Population: Batch insertion optimized for each database type
- Unified Query Suite: 4 equivalent queries implemented across all platforms
- Real-Time Metrics: CPU, memory, and execution time monitoring using
psutil - Comprehensive Visualizations: 15+ comparative performance charts
- Production-Ready Code: Error handling, batch processing, and connection management
| Query Type | Best Performer | Performance Gain |
|---|---|---|
| Simple Filtering | MySQL | 8.6x faster than Neo4j |
| Aggregations | MongoDB | 2.6x faster than Neo4j |
| Multi-Hop Traversal | MySQL | 5.4x faster than MongoDB |
| Shortest Path | Neo4j | 19.2x faster than MongoDB |
- β MySQL: Best all-rounder with consistent performance
- β MongoDB: Dominant for aggregations, poor for graph queries
- β Neo4j: Specialized for pathfinding and deep graph traversals
# 1. Install dependencies
pip install -r requirements.txt
# 2. Generate dataset
python scripts/generator.py
# 3. Populate databases
python scripts/populator.py
# 4. Run benchmarks
python scripts/queries.pyπ For detailed installation instructions, database configuration, and troubleshooting, see INSTALLATION.md
.
βββ data/
β βββ final_road_network.csv # Generated dataset (500 cities, 250K roads)
β
βββ docs/
β βββ 6102 Group project paper-- jup.docx
β βββ 6102 Queries Ideas.docx
β βββ GROUP 4 PROJECT PROPOSAL.docx
β
βββ outputs/
β βββ Code Outputs/ # Query execution screenshots
β βββ Performace Visualizations/ # Performance comparison charts
β
βββ scripts/
β βββ generator.py # Synthetic data generation
β βββ populator.py # Multi-database population script
β βββ queries.py # Unified benchmark queries with metrics
β βββ hybrid_challenges.py # Additional database challenge queries
β
βββ database_queries.txt # Query reference documentation
βββ INSTALLATION.md # Detailed setup guide
βββ LICENSE # MIT License
βββ README.md # Project overview and findings
βββ requirements.txt # Python dependencies
Purpose: Find roads from Richmond to Atlanta with distance β₯ 500 miles
Use Case: Direct route lookup with conditions
Purpose: Calculate top 5 city pairs by average distance
Use Case: Statistical analysis of network connectivity
Purpose: Find all cities reachable within 2 hops from Richmond
Use Case: Neighborhood discovery, reachability analysis
Implementation:
- MySQL: Recursive CTE
- MongoDB:
$lookupaggregation pipeline - Neo4j: Variable-length pattern matching
[:ROAD*..2]
Purpose: Find shortest route from Richmond to Amman (max 2 hops)
Use Case: Navigation, route optimization
Implementation:
- MySQL: Recursive CTE with path tracking
- MongoDB: Custom BFS algorithm in Python
- Neo4j: Native
shortestPath()function
| Query | MySQL | MongoDB | Neo4j | Winner |
|---|---|---|---|---|
| Q1 | 0.25 | 0.60 | 2.15 | MySQL |
| Q2 | 2.25 | 1.50 | 3.90 | MongoDB |
| Q3 | 13.5 | 73.0 | 25.5 | MySQL |
| Q4 | 11.5 | 48.0 | 2.5 | Neo4j |
| Query | MySQL | MongoDB | Neo4j |
|---|---|---|---|
| Q1 | 10.5 | 0.8 | 0.8 |
| Q2 | 0.5 | 10.0 | 5.5 |
| Q3 | 11.5 | 11.7 | 9.3 |
| Q4 | 10.8 | 10.5 | 8.1 |
| Query | MySQL | MongoDB | Neo4j |
|---|---|---|---|
| Q1 | 31 | 42 | 30 |
| Q2 | 8 | 9 | 135 |
| Q3 | 462 | 15 | 185 |
| Q4 | 25 | 456 | 28 |
Different queries favor different architectures. No single database wins across all categories.
Neo4j's 19x advantage in shortest path queries demonstrates the value of specialized databases for specific workloads.
Document stores require custom application logic for graph operations, resulting in 5-73 second penalties.
Modern SQL databases can handle graph queries competently, making them viable for moderate graph workloads.
MongoDB's aggregation framework outperforms traditional SQL GROUP BY operations.
MySQL uses significantly more memory for complex traversals (462 MB) but delivers faster results than MongoDB's memory-efficient approach.
The 19.2x performance gain of Neo4j for shortest path algorithms justifies specialized databases for graph-heavy workloads.
Databases:
- MySQL 8.0 - Relational database with InnoDB engine and recursive CTEs
- MongoDB 5.0 - Document-oriented NoSQL with aggregation pipeline
- Neo4j 5.0 - Native graph database with Cypher query language
Tech Stack:
- Python 3.8+ with drivers for MySQL, MongoDB, and Neo4j
- Faker - Realistic synthetic data generation
- psutil - Real-time system resource monitoring
- matplotlib - Performance visualization charts
Contributions welcome! Potential improvements:
- Additional query types (3+ hop traversals, weighted shortest paths)
- More databases (PostgreSQL, ArangoDB, Redis Graph)
- Advanced optimizations (indexing strategies, query tuning)
- Extended metrics (network I/O, disk usage)
Please open an issue or submit a pull request.
Group 4 - Section 10
Data Warehousing (6102)
The George Washington University, 2025
This project is licensed under the MIT License - see LICENSE file for details.
- MySQL Recursive CTEs Documentation
- MongoDB Aggregation Pipeline
- Neo4j Cypher Query Language
- Graph Database Performance Patterns
β If you find this project helpful, please star the repository!
