LIMAO: A Framework for Lifelong Modular Learned Query Optimization

This repository contains the implementation of LIMAO on Balsa, a Lifelong Modular Reinforcement Learning framework designed for database query optimization in dynamic environments. The codebase currently supports experiments on IMDB and TPC-H workloads, but you are welcome to extend it to additional workloads. Due to time constraints, the repository might still have some inconsistencies. Feel free to reach out to the authors for assistance!

Introduction

LIMAO integrates reinforcement learning to optimize database queries dynamically under workload, data volume, and both workload with data volume switching scenarios. This repository enables users to reproduce the experiments conducted in the associated paper.

System Requirements

Operating System: Linux (CloudLab c240g5 machine recommended, if you use your own mahcines, please make some customizations to the setup)
Python: 3.8
PostgreSQL: 12.5
Anaconda: Anaconda3-2023.09

Note: Additional configurations might be needed for using a different system.

Setup

1. CloudLab Setup (Recommended)

Prepare the storage environment:

sudo /usr/local/etc/emulab/mkextrafs.pl -f /mydata
sudo chmod 777 /mydata

Install Anaconda:

cd /mydata
curl -O https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
bash ./Anaconda3-2023.09-0-Linux-x86_64.sh
rm ./Anaconda3-2023.09-0-Linux-x86_64.sh

Clone the repository:

git clone [email protected]:Tsihan/LIMAOLifeLongRLDB.git -b final_switching_workload

Note: This branch aims to replay the result in workload switching situation using LIMAO, if you want to see other dynamic situations, you may use other branches like final_switching_dbvolume, final_switching_workload_dbvolume, .etc. The prefix of each branch like "final", "original", "treedecompose" reflect different variants of the prototype evolving from Balsa to the final form of LIMAO implemented in Balsa.

Set up the Python environment:

conda create -n limao python=3.8 -y
conda activate limao
pip install -e .
pip install -e pg_executor
pip install -r requirements.txt

Install PostgreSQL 12.5:

wget https://ftp.postgresql.org/pub/source/v12.5/postgresql-12.5.tar.gz
tar xzvf postgresql-12.5.tar.gz
cd postgresql-12.5
./configure --prefix=/mydata/postgresql-12.5 --without-readline
sudo make -j
sudo make install
echo 'export PATH=/mydata/postgresql-12.5/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

Install pg_hint_plan extension:

git clone https://github.com/ossc-db/pg_hint_plan.git -b REL12_1_3_7
cd pg_hint_plan
# Modify Makefile: Set PG_CONFIG to /mydata/postgresql-12.5/bin/pg_config
make
sudo make install

2. Prepare Datasets

Download IMDB dataset:

mkdir -p /mydata/datasets/job && cd /mydata/datasets/job
wget -c http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -xvzf imdb.tgz

Prepend headers to CSV files:

Activate the LIMAO environment:
```
conda activate limao
```

Run the script:

python3 /mydata/LIMAOLifeLongRLDB/scripts/prepend_imdb_headers.py

Modify the script to set the correct directory:

flags.DEFINE_string('csv_dir', '/mydata/datasets/job', 'Directory to IMDB CSVs.')

Initialize and start PostgreSQL:

pg_ctl -D /mydata/databases initdb
cp /mydata/LIMAOLifeLongRLDB/conf/balsa-postgresql.conf /mydata/databases/postgresql.conf
pg_ctl -D /mydata/databases start -l logfile

Load IMDB dataset:

bash load-postgres/load_job_postgres.sh /mydata/datasets/job

Note: For setting up the TPC-H workload, please refer to the following Google Drive link.

Usage

1. Run Experiments on IMDB Workloads

IMDB Set 1:

python3 run.py --run Balsa_JOBRandSplit_IMDB_assorted_3 --local

IMDB Set 2:

python3 run.py --run Balsa_JOBRandSplit_IMDB_assorted_4 --local

Switching Workload:

python3 run.py --run Balsa_JOBRandSplit_IMDB_assorted_Replay_2 --local

2. Run Experiments on TPCH Workloads

TPCH Set 1:

python3 run.py --run Balsa_JOBRandSplit_TPCH10_assorted --local

TPCH Set 2:

python3 run.py --run Balsa_JOBRandSplit_TPCH10_assorted_2 --local

Switching Workload:

python3 run.py --run Balsa_JOBRandSplit_TPCH10_assorted_Replay --local

Customization

Modify parameters in the following scripts as needed:
- balsa/optimizer.py (class initialization)
- balsa/util/postgres.py (PostgreSQL connection)
- run.py:
  - _MakeWorkload(self, is_origin=False)
  - RunBaseline(self)
- sim.py:
  - Params(cls)
  - _SimulationDataPath(self)
  - _FeaturizedDataPath(self)
Update timezones in balsa-postgresql.conf if required.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Note: If any steps are unclear or additional details are required, please feel free to raise an issue in this repository.

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
assets		assets
balsa		balsa
checkpoints		checkpoints
conf		conf
load-postgres		load-postgres
pg_executor		pg_executor
queries		queries
scripts		scripts
.gitignore		.gitignore
.style.yapf		.style.yapf
LICENSE		LICENSE
README.md		README.md
experiments.py		experiments.py
format.sh		format.sh
requirements.txt		requirements.txt
run.py		run.py
setup.py		setup.py
setup.sh		setup.sh
sim.py		sim.py
train_utils.py		train_utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LIMAO: A Framework for Lifelong Modular Learned Query Optimization

Table of Contents

Introduction

System Requirements

Setup

1. CloudLab Setup (Recommended)

2. Prepare Datasets

Usage

1. Run Experiments on IMDB Workloads

2. Run Experiments on TPCH Workloads

Customization

License

About

Releases

Packages

Languages

License

Tsihan/LIMAOLifeLongRLDB

Folders and files

Latest commit

History

Repository files navigation

LIMAO: A Framework for Lifelong Modular Learned Query Optimization

Table of Contents

Introduction

System Requirements

Setup

1. CloudLab Setup (Recommended)

2. Prepare Datasets

Usage

1. Run Experiments on IMDB Workloads

2. Run Experiments on TPCH Workloads

Customization

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages