DB25 is a modern Hybrid Transactional/Analytical Processing (HTAP) database system built on libpg_query, the PostgreSQL query-parsing library. It provides a comprehensive C++17 implementation that unifies OLTP and OLAP workloads, featuring SQL query parsing, logical/physical planning, vectorized execution, and computational storage integration.
- SQL Query Parsing: Parse and validate SQL queries using libpg_query
- Query Normalization: Normalize queries and generate fingerprints
- Schema Management: Define and manage complete database schemas
- Query Validation: Validate queries against defined schemas
- Query Analysis: Analyze query structure and dependencies
- Index Suggestions: Get optimization recommendations
- Logical Planning: Advanced query planning with cost-based optimization
- Physical Planning: Convert logical plans to executable physical plans
- Physical Execution Engine: Iterator-based execution with vectorized processing
- Workload Classification: Intelligent OLTP/OLAP query routing
- Dual Storage Support: Row-oriented (OLTP) and column-oriented (OLAP) processing
- Scan Operators: Sequential scan, index scan with filter pushdown
- Join Algorithms: Nested loop join, hash join with build/probe phases
- Sort Operations: In-memory and external sort algorithms
- Aggregation: Hash-based GROUP BY processing
- Parallel Execution: Multi-threaded operator execution
- Memory Management: Work memory limits and spill-to-disk strategies
- Plan Visualization: PostgreSQL-style execution plan display
- Cost Estimation: Sophisticated cost modeling for query optimization
- Plan Optimization: Predicate pushdown, join reordering, and other optimizations
- Vectorized Execution: Batch-based tuple processing for analytical workloads
- Computational Storage: Framework for near-data processing integration
- Complete E-commerce Schema: Ready-to-use database schema example
- C++17 compatible compiler (GCC 7+, Clang 6+, MSVC 2017+)
- CMake 3.15+
- libpg_query development library
- pkg-config
# Install dependencies
sudo apt-get update
sudo apt-get install build-essential cmake pkg-config
# Build and install libpg_query from source
git clone https://github.com/pganalyze/libpg_query.git
cd libpg_query
make
sudo make install
# Using Homebrew
brew install libpq
brew install pkg-config
# Build libpg_query from source
git clone https://github.com/pganalyze/libpg_query.git
cd libpg_query
make
sudo make install
# Install dependencies
sudo yum install gcc gcc-c++ cmake pkg-config
# or for newer versions:
sudo dnf install gcc gcc-c++ cmake pkg-config
# Build libpg_query from source
git clone https://github.com/pganalyze/libpg_query.git
cd libpg_query
make
sudo make install
# Build library and example
make all
# Build only the library
make library
# Build only the example
make example
# Run the example
make test
# Clean build files
make clean
# Create build directory
mkdir build && cd build
# Configure and build
cmake ..
make
# Run the example
./example
# Use the make target for CMake
make cmake
lib_pg_cpp/
├── include/ # Header files
│ ├── pg_query_wrapper.hpp # libpg_query wrapper
│ ├── database.hpp # Database schema classes
│ ├── simple_schema.hpp # Simple schema example
│ ├── query_executor.hpp # Query validation and analysis
│ ├── logical_plan.hpp # Logical plan node definitions
│ ├── query_planner.hpp # Advanced query planning
│ ├── physical_plan.hpp # Physical plan node definitions
│ ├── physical_planner.hpp # Logical to physical plan conversion
│ └── execution_engine.hpp # Physical execution engine
├── src/ # Source files
│ ├── pg_query_wrapper.cpp
│ ├── database.cpp
│ ├── simple_schema.cpp
│ ├── query_executor.cpp
│ ├── logical_plan.cpp
│ ├── query_planner.cpp
│ ├── physical_plan.cpp # Physical plan implementation
│ ├── physical_planner.cpp # Physical plan generation
│ └── execution_engine.cpp # Execution engine implementation
├── examples/ # Example applications
│ ├── main.cpp # Basic functionality demo
│ ├── planning_demo.cpp # Logical planning demo
│ └── execution_demo.cpp # Physical execution demo
├── docs/ # Documentation
│ ├── query_engine_architecture.tex # Comprehensive technical documentation
│ ├── build_pdf.sh # Documentation build script
│ └── README.md # Documentation guide
├── CMakeLists.txt # CMake configuration
├── Makefile # Make configuration
└── README.md # This file
#include <iostream>
#include "pg_query_wrapper.hpp"
db25::QueryParser parser;
// Parse a query
auto result = parser.parse("SELECT * FROM users WHERE id = 1");
if (result.is_valid) {
std::cout << "Valid SQL query" << std::endl;
}
// Get query fingerprint
auto fingerprint = parser.get_query_fingerprint("SELECT * FROM users WHERE id = ?");
if (fingerprint) {
std::cout << "Fingerprint: " << *fingerprint << std::endl;
}
// Normalize query
auto normalized = parser.normalize("SELECT * FROM users WHERE id = 123");
if (normalized.is_valid) {
std::cout << "Normalized: " << normalized.normalized_query << std::endl;
}
#include <iostream>
#include "database.hpp"
#include "sample_schema.hpp"
// Create a complete e-commerce schema
auto schema = db25::ECommerceSchema::create_schema();
// Generate SQL for creating tables
std::string create_sql = schema.generate_create_sql();
std::cout << create_sql << std::endl;
// Check if table exists
bool has_users = schema.get_table("users").has_value();
// Get table details
auto users_table = schema.get_table("users");
if (users_table) {
for (const auto& column : users_table->columns) {
std::cout << column.name << " - ";
if (column.primary_key) std::cout << "Primary Key";
std::cout << std::endl;
}
}
#include <iostream>
#include "query_executor.hpp"
auto schema = std::make_shared<db25::DatabaseSchema>(db25::ECommerceSchema::create_schema());
db25::QueryExecutor executor(schema);
// Validate query against schema
std::string query = "SELECT name, email FROM users WHERE active = true";
auto validation = executor.validate_query(query);
if (validation.is_valid) {
std::cout << "Query is valid!" << std::endl;
} else {
for (const auto& error : validation.errors) {
std::cout << "Error: " << error << std::endl;
}
}
// Analyze query structure
auto analysis = executor.analyze_query(query);
std::cout << "Query type: " << analysis.query_type << std::endl;
std::cout << "Modifies data: " << (analysis.modifies_data ? "Yes" : "No") << std::endl;
// Get optimization suggestions
auto suggestions = executor.suggest_indexes(query);
for (const auto& suggestion : suggestions) {
std::cout << "Suggested index: " << suggestion << std::endl;
}
#include <iostream>
#include "query_planner.hpp"
auto schema = std::make_shared<db25::DatabaseSchema>(db25::create_simple_schema());
db25::QueryPlanner planner(schema);
// Configure table statistics for better cost estimation
db25::TableStats user_stats;
user_stats.row_count = 10000;
user_stats.avg_row_size = 120.0;
user_stats.column_selectivity["email"] = 0.8;
planner.set_table_stats("users", user_stats);
// Generate logical plan
std::string query = "SELECT u.name, p.name FROM users u JOIN products p ON u.id = p.id WHERE u.name LIKE 'John%'";
auto plan = planner.create_plan(query);
// Display execution plan
std::cout << plan.to_string() << std::endl;
// Optimize the plan
auto optimized_plan = planner.optimize_plan(plan);
std::cout << "Optimized Plan:" << std::endl;
std::cout << optimized_plan.to_string() << std::endl;
// Generate alternative plans
auto alternatives = planner.generate_alternative_plans(query);
for (size_t i = 0; i < alternatives.size(); ++i) {
std::cout << "Alternative " << i << ":" << std::endl;
std::cout << alternatives[i].to_string() << std::endl;
}
// Configure planner options
db25::PlannerConfig config;
config.enable_hash_joins = true;
config.enable_merge_joins = true;
config.work_mem = 1024 * 1024; // 1MB
planner.set_config(config);
#include <iostream>
#include "physical_planner.hpp"
#include "execution_engine.hpp"
auto schema = std::make_shared<db25::DatabaseSchema>(db25::create_simple_schema());
db25::QueryPlanner logical_planner(schema);
db25::PhysicalPlanner physical_planner(schema);
// Create logical plan
std::string query = "SELECT u.name, p.price FROM users u JOIN products p ON u.id = p.user_id WHERE p.price > 100";
auto logical_plan = logical_planner.create_plan(query);
// Convert to physical plan
auto physical_plan = physical_planner.create_physical_plan(logical_plan);
// Display physical execution plan
std::cout << "Physical Execution Plan:" << std::endl;
std::cout << physical_plan.to_string() << std::endl;
// Execute the plan
db25::ExecutionEngine executor(schema);
auto execution_context = executor.create_context();
// Execute and get results
auto result_iterator = executor.execute(physical_plan, execution_context);
while (auto batch = result_iterator.next()) {
for (const auto& row : batch->rows) {
for (const auto& value : row) {
std::cout << value << "\t";
}
std::cout << std::endl;
}
}
// Get execution statistics
auto stats = execution_context.get_statistics();
std::cout << "Execution Statistics:" << std::endl;
std::cout << "Rows processed: " << stats.rows_processed << std::endl;
std::cout << "Memory used: " << stats.memory_used_mb << " MB" << std::endl;
std::cout << "Execution time: " << stats.execution_time_ms << " ms" << std::endl;
#include "htap_engine.hpp"
// Create HTAP engine with workload classification
db25::HTAPEngine htap_engine(schema);
// Configure for mixed workloads
db25::HTAPConfig config;
config.oltp_priority = 0.7; // Prioritize transactional workloads
config.analytical_timeout_ms = 5000;
config.enable_real_time_analytics = true;
htap_engine.set_config(config);
// Execute transactional query (high priority)
std::string oltp_query = "INSERT INTO orders (user_id, total) VALUES (123, 99.99)";
auto oltp_result = htap_engine.execute_transactional(oltp_query);
// Execute analytical query (processed in parallel)
std::string olap_query = "SELECT category, AVG(price) FROM products GROUP BY category";
auto olap_result = htap_engine.execute_analytical(olap_query);
// Real-time analytics on live data
auto analytics_stream = htap_engine.create_analytics_stream("orders");
analytics_stream.add_aggregation("total_revenue", "SUM(total)");
analytics_stream.add_aggregation("order_count", "COUNT(*)");
auto live_metrics = analytics_stream.get_current_metrics();
The library includes a complete e-commerce database schema with the following tables:
- users: User accounts and profiles
- categories: Product categories
- products: Product catalog with JSON attributes
- orders: Customer orders
- order_items: Items within orders
- shopping_cart: User shopping cart items
- reviews: Product reviews and ratings
- addresses: User addresses for shipping and billing
- payments: Order payment records
- inventory: Product inventory tracking
Each table includes, as appropriate:
- Primary keys (UUID or auto-increment)
- Foreign key relationships
- Indexes for performance
- Constraints and defaults
- JSON fields for flexible data
- `QueryResult parse(const std::string& query)`: Parse a SQL query
- `NormalizedQuery normalize(const std::string& query)`: Normalize a query
- `std::optional<std::string> get_query_fingerprint(const std::string& query)`: Get a query fingerprint
- `bool is_valid_sql(const std::string& query)`: Check whether SQL is valid
- `void add_table(const Table& table)`: Add a table to the schema
- `void add_index(const std::string& table_name, const Index& index)`: Add an index
- `void add_foreign_key(...)`: Add a foreign key relationship
- `std::string generate_create_sql()`: Generate CREATE SQL
- `std::vector<std::string> get_table_names()`: Get all table names
- `std::optional<Table> get_table(const std::string& name)`: Get a table by name
- `QueryValidationResult validate_query(const std::string& query)`: Validate a query
- `QueryAnalysis analyze_query(const std::string& query)`: Analyze query structure
- `std::string optimize_query(const std::string& query)`: Get an optimized query
- `std::vector<std::string> suggest_indexes(const std::string& query)`: Get index suggestions
- `bool check_table_exists(const std::string& table_name)`: Check table existence
- `bool check_column_exists(...)`: Check column existence
- `LogicalPlan create_plan(const std::string& query)`: Generate a logical execution plan
- `LogicalPlan optimize_plan(const LogicalPlan& plan)`: Apply optimization rules
- `std::vector<LogicalPlan> generate_alternative_plans(const std::string& query)`: Generate plan alternatives
- `void estimate_costs(LogicalPlanNodePtr node)`: Estimate execution costs
- `void set_table_stats(const std::string& table_name, const TableStats& stats)`: Set table statistics
- `void set_config(const PlannerConfig& config)`: Configure planner options
- `std::string to_string()`: Display the plan in PostgreSQL-style format
- `void calculate_costs()`: Calculate total plan cost
- `LogicalPlan copy()`: Create a deep copy of the plan
- TableScanNode: Sequential table scan
- IndexScanNode: Index-based scan with conditions
- NestedLoopJoinNode: Nested loop join algorithm
- HashJoinNode: Hash join algorithm with build/probe phases
- ProjectionNode: Column projection and expression evaluation
- SelectionNode: Filter conditions (WHERE clause)
- AggregationNode: GROUP BY and aggregate functions
- SortNode: ORDER BY sorting
- LimitNode: LIMIT and OFFSET operations
- InsertNode: INSERT operations
- UpdateNode: UPDATE operations
- DeleteNode: DELETE operations
- `PhysicalPlan create_physical_plan(const LogicalPlan& logical_plan)`: Convert a logical plan to a physical plan
- `void set_execution_config(const ExecutionConfig& config)`: Configure execution parameters
- `std::vector<PhysicalPlan> generate_alternatives(const LogicalPlan& plan)`: Generate alternative physical plans
- `void estimate_physical_costs(PhysicalPlanNodePtr node)`: Estimate physical execution costs
- `ExecutionContext create_context()`: Create an execution context
- `ResultIterator execute(const PhysicalPlan& plan, ExecutionContext& context)`: Execute a physical plan
- `ExecutionStatistics get_statistics(const ExecutionContext& context)`: Get execution statistics
- `void set_memory_limit(size_t memory_mb)`: Set execution memory limits
- `void enable_parallel_execution(int worker_threads)`: Configure parallel execution
- PhysicalSequentialScanNode: Table scan with filter pushdown
- PhysicalIndexScanNode: B+ tree index scan
- PhysicalNestedLoopJoinNode: Iterator-based nested loop join
- PhysicalHashJoinNode: Hash join with build/probe phases
- PhysicalSortNode: In-memory and external sort
- PhysicalAggregationNode: Hash-based GROUP BY aggregation
- PhysicalParallelScanNode: Multi-threaded table scan
- PhysicalMaterializeNode: Materialize intermediate results
- HTAPEngine: Main HTAP query processing engine
- WorkloadClassifier: Automatic OLTP/OLAP query classification
- ResourceManager: Dynamic resource allocation between workloads
- ComputationalStorageInterface: Near-data processing integration
The project includes three comprehensive example programs:
# Build and run basic example
make example
./build/example
The basic example demonstrates:
- SQL query parsing and validation
- Database schema generation
- Query validation against schema
- Schema inspection capabilities
- Query optimization suggestions
# Build and run planning demo
make planning_demo
./build/planning_demo
The planning demo demonstrates:
- Basic Logical Planning: Generate execution plans for simple queries
- Plan Optimization: Apply predicate pushdown and other optimization rules
- Alternative Plans: Generate multiple execution strategies
- DML Planning: Plan INSERT, UPDATE, and DELETE operations
- Cost Estimation: Analyze costs for different table sizes and join algorithms
- Join Algorithm Selection: Compare nested loop vs hash joins
# Build and run execution demo
make execution_demo
./build/execution_demo
The execution demo demonstrates:
- Physical Plan Generation: Convert logical plans to executable physical plans
- Iterator-Based Execution: Volcano-style execution model with physical operators
- Vectorized Processing: Batch-based tuple processing for analytical workloads
- Parallel Execution: Multi-threaded query execution with worker coordination
- Memory Management: Work memory limits and spill-to-disk strategies
- HTAP Workload Processing: Mixed transactional and analytical query execution
- Performance Monitoring: Execution statistics and resource utilization tracking
# Build and run all examples
make test
To install the library system-wide:
make install
This copies the static library to /usr/local/lib/ and the headers to /usr/local/include/.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License. See LICENSE file for details.
- Ensure libpg_query is properly installed
- Check that pkg-config can find libpg_query: `pkg-config --cflags --libs libpg_query`
- You may need to update your `PKG_CONFIG_PATH`
- Ensure you're using a C++17 compatible compiler
- Check that all required headers are included
- Verify libpg_query development headers are installed
- Ensure libpg_query shared library is in your library path
- You may need to run `ldconfig` after installing libpg_query