Skip to content

Latest commit

 

History

History
82 lines (65 loc) · 4.24 KB

MST_runtime_README.md

File metadata and controls

82 lines (65 loc) · 4.24 KB

Instructions to execute query plans selected by the enumeration algorithms.

  • Required: python3, and statistics and screen packages.
  • Each script includes instructions how to run with input arguments. Each script can be run using true and estimated cardinalities. In addition, each script can be run with screen which its instructions are provided within each python script.
  • Installed PostgreSQL docker container. Follow the instructions from here.
  • Installed PG_HINT.
  • Create indexes. The list of indexes are provided, PK-indexes and FK-indexes.

Table of Contents

  1. Part 1. Generate query plans
  2. Part 2. Execute query plans
  3. Part 3. Parsing runtime results
  4. Part 4. Generate Ensemble MST runtime

1. Generate query plans

Script file: scripts_runtime/generate_physical_plans.py Input file:

  • input_data/job/workload_queries
  • output_data/job/costs Output file:
  • input_data/job/PHYSICAL_PLANS/TRUE_PLANS
  • input_data/job/PHYSICAL_PLANS/EST_PLANS

Query plans, obtained from the enumeration algorithms, are generated in .sql files at input_data/job/PHYSICAL_PLANS/. The SQL files can be generated from query plans selected when using true and estimated cardinalities. Their corresponding SQL files are generated in TRUE_PLANS and EST_PLANS. All queries with EXPLAIN ANALYZE and their physical operators and join orders are forced using pg_hint.

Enumeration algorithms:

  • Exhaustive
  • GOO
  • Prim's and Prim's from each node
  • Kruskal's and Kruskal's from each node

2. Execute query plans

Script file: scripts_runtime/benchmark_psql_runtime.sh Input file:

  • input_data/job/PHYSICAL_PLANS/TRUE_PLANS
  • input_data/job/PHYSICAL_PLANS/EST_PLANS Output file:
  • output_data/job/PHYSICAL_PLANS/TRUE_PLANS
  • output_data/job/PHYSICAL_PLANS/EST_PLANS

Create corresponding output folders. Each query is run 5 times and median is considered in the results. Timing and ph_hint are enabled. The docker container name is mst_pg_docker and database name is imdb. The script can be run with screen which its instructions are provided within the script.

3. Parsing runtime results

Script file: scripts_runtime/parse_runtime.py Input file:

  • input_data/job/workload_queries

  • output_data/job/PHYSICAL_PLANS/TRUE_PLANS

  • output_data/job/PHYSICAL_PLANS/EST_PLANS Output file:

  • output_data/job/runtime

  • The output of each enumeration algorithm is a csv file stored in here that includes, for each query, PostgreSQL optimization time, execution time, and total runtime. The script parses each query logs that run 5 times. Their corresponding log files are stored in TRUE_PLANS and EST_PLANS.

Enumeration algorithms:

  • Exhaustive
  • GOO
  • Prim's and Prim's from each node
  • Kruskal's and Kruskal's from each node

4. Generate Ensemble MST runtime

Script file: scripts_runtime/parse_ensemble_runtime.py Input file:

  • input_data/job/workload_queries

  • output_data/job/costs/kruskal_ensemble_opt_plans.csv

  • output_data/job/costs/prim_ensemble_opt_plans.csv

  • output_data/job/costs/kruskal_ensemble_opt_plans_psql.csv

  • output_data/job/costs/prim_ensemble_opt_plans_psql.csv

  • output_data/job/runtime/enum_run_job_kruskal_ensemble.csv

  • output_data/job/runtime/enum_run_job_prim_ensemble.csv

  • output_data/job/runtime/enum_run_job_kruskal_ensemble_psql.csv

  • output_data/job/runtime/enum_run_job_prim_ensemble_psql.csv Output file:

  • output_data/job/runtime/enum_run_job_ensemble.csv

  • output_data/job/runtime/enum_run_job_ensemble_psql.csv

This script does not parse the workload queries logs. It uses the output generated in kruskal_ensemble and prim_ensemble. Ensemble MST chooses the plan with minimum cost and and its corresponding execution time of Prim's and Kruskal's run from each node. No need to run this script with screen, it is quick.