- Required:
python3
, andstatistics
andscreen
packages. - Each script includes instructions how to run with input arguments. Each script can be run using true and estimated cardinalities. In addition, each script can be run with
screen
which its instructions are provided within each python script. - Installed PostgreSQL docker container. Follow the instructions from here.
- Installed PG_HINT.
- Create indexes. The list of indexes are provided, PK-indexes and FK-indexes.
- Part 1. Generate query plans
- Part 2. Execute query plans
- Part 3. Parsing runtime results
- Part 4. Generate Ensemble MST runtime
Script file: scripts_runtime/generate_physical_plans.py
Input file:
input_data/job/workload_queries
output_data/job/costs
Output file:input_data/job/PHYSICAL_PLANS/TRUE_PLANS
input_data/job/PHYSICAL_PLANS/EST_PLANS
Query plans, obtained from the enumeration algorithms, are generated in .sql files at input_data/job/PHYSICAL_PLANS/
. The SQL files can be generated from query plans selected when using true and estimated cardinalities. Their corresponding SQL files are generated in TRUE_PLANS
and EST_PLANS
. All queries with EXPLAIN ANALYZE
and their physical operators and join orders are forced using pg_hint.
Enumeration algorithms:
Exhaustive
GOO
Prim's
andPrim's
from each nodeKruskal's
andKruskal's
from each node
Script file: scripts_runtime/benchmark_psql_runtime.sh
Input file:
input_data/job/PHYSICAL_PLANS/TRUE_PLANS
input_data/job/PHYSICAL_PLANS/EST_PLANS
Output file:output_data/job/PHYSICAL_PLANS/TRUE_PLANS
output_data/job/PHYSICAL_PLANS/EST_PLANS
Create corresponding output folders. Each query is run 5 times
and median is considered in the results. Timing
and ph_hint
are enabled. The docker container name is mst_pg_docker
and database name is imdb
. The script can be run with screen
which its instructions are provided within the script.
Script file: scripts_runtime/parse_runtime.py
Input file:
-
input_data/job/workload_queries
-
output_data/job/PHYSICAL_PLANS/TRUE_PLANS
-
output_data/job/PHYSICAL_PLANS/EST_PLANS
Output file: -
output_data/job/runtime
-
The output of each enumeration algorithm is a csv file stored in here that includes, for each query, PostgreSQL optimization time, execution time, and total runtime. The script parses each query logs that run 5 times. Their corresponding log files are stored in
TRUE_PLANS
andEST_PLANS
.
Enumeration algorithms:
Exhaustive
GOO
Prim's
andPrim's
from each nodeKruskal's
andKruskal's
from each node
Script file: scripts_runtime/parse_ensemble_runtime.py
Input file:
-
input_data/job/workload_queries
-
output_data/job/costs/kruskal_ensemble_opt_plans.csv
-
output_data/job/costs/prim_ensemble_opt_plans.csv
-
output_data/job/costs/kruskal_ensemble_opt_plans_psql.csv
-
output_data/job/costs/prim_ensemble_opt_plans_psql.csv
-
output_data/job/runtime/enum_run_job_kruskal_ensemble.csv
-
output_data/job/runtime/enum_run_job_prim_ensemble.csv
-
output_data/job/runtime/enum_run_job_kruskal_ensemble_psql.csv
-
output_data/job/runtime/enum_run_job_prim_ensemble_psql.csv
Output file: -
output_data/job/runtime/enum_run_job_ensemble.csv
-
output_data/job/runtime/enum_run_job_ensemble_psql.csv
This script does not parse the workload queries logs. It uses the output generated in kruskal_ensemble
and prim_ensemble
. Ensemble MST chooses the plan with minimum cost and and its corresponding execution time of Prim's and Kruskal's run from each node. No need to run this script with screen
, it is quick.