# CheckMate

This repository contains the code for the paper "CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows", published at ICDE 2024.

## Preliminaries

This project requires an environment with Python 3.11 installed. Please install the universalis-package and the requirements of the coordinator and worker modules, as well as pandas, numpy, and matplotlib.

You can use the following commands:

```sh
pip install universalis-package/.
pip install -r coordinator/requirements.txt
pip install -r worker/requirements.txt
pip install pandas numpy matplotlib
```
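
As a quick sanity check of the environment, you can verify the Python version and the analysis dependencies (a minimal sketch; expect Python 3.11.x):

```sh
# Verify the interpreter version and that the plotting/analysis packages import.
python --version                             # expect Python 3.11.x
python -c "import pandas, numpy, matplotlib"
```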

## Running experiments

In the scripts directory, we provide a number of scripts that can be used to run the CheckMate experiments. The easiest way is to create a CSV file formatted as follows:

```csv
# experiment_name,query,protocol,checkpoint_interval,num_of_workers,input_rate,failure,hot_item_ratio
example-q1-unc,q1,UNC,5,4,4000,true,0.0
example-q1-cor,q1,COR,5,4,4000,true,0.0
```

The CSV file must not include the header line shown above, and a trailing newline is required after the last configuration line.
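
For example, the configuration above can be written to a file with a shell heredoc (the file name experiments.csv is just an illustration):

```sh
# The heredoc emits each line with a trailing newline,
# satisfying the trailing-newline requirement above.
cat > experiments.csv <<'EOF'
example-q1-unc,q1,UNC,5,4,4000,true,0.0
example-q1-cor,q1,COR,5,4,4000,true,0.0
EOF
```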

Each parameter can take the following values:

| Parameter | Values |
| --- | --- |
| experiment_name | Any name allowed by your OS. It is used to name the folder where all results of the experiment are stored and to prefix the created files. |
| query | q1, q3, q8-running, q12-running, cyclic |
| protocol | NOC, UNC, COR, CIC |
| checkpoint_interval | Any value > 0 |
| num_of_workers | Any integer > 0. Every worker requires 2 CPUs. |
| input_rate\* | Any integer > 0 |
| failure | true / false |
| hot_item_ratio | 0 ≤ x ≤ 1 (applicable only to NexMark queries) |

\* In the case of the cyclic query, the generator uses 3 threads, so the value should be one third of the desired total input rate (e.g., set input_rate to 2000 for a total rate of 6000 events per second).

We provide a CSV file containing a sample of exemplary configurations. A CSV file containing all configurations used in our experiments will follow.

Using either the provided CSV file or your own, you can run the experiments with the following script from the root of the repository:

```sh
./scripts/run_batch_experiments.sh location_of_the_csv_file directory_to_save_results
```
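
For instance, assuming the sample configurations are in experiments.csv and the results should go to ./results (both names are illustrative):

```sh
./scripts/run_batch_experiments.sh experiments.csv ./results
```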

Note: To run experiments with the NexMark queries, you must first build the generator by running `mvn clean package` from the nexmark directory. Java 11 and Maven are required.
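
A minimal build sequence, assuming you start from the repository root:

```sh
cd nexmark
mvn clean package   # requires Java 11 and Maven
cd ..
```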

## Alternative way of execution

Alternatively, you can manage the individual components of the pipeline yourself, as follows. First, deploy the Kafka cluster and the MinIO storage.

### Kafka

To run Kafka: `docker compose -f docker-compose-kafka.yml up`

To clear Kafka: `docker compose -f docker-compose-kafka.yml down --volumes`


### MinIO

To run MinIO: `docker compose -f docker-compose-simple-minio.yml up`

To clear MinIO: `docker compose -f docker-compose-simple-minio.yml down --volumes`


Then, you can start the stream processing engine and specify the desired scale.

### Stateflow Engine

To run the SPE: `docker compose up --build --scale worker=4`

To clear the SPE: `docker compose down --volumes`
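
Putting it all together, a typical session for this manual setup might look like the following sketch (the `-d` flag runs the Kafka and MinIO stacks detached; drop it if you prefer to watch their logs):

```sh
docker compose -f docker-compose-kafka.yml up -d         # message broker
docker compose -f docker-compose-simple-minio.yml up -d  # checkpoint storage
docker compose up --build --scale worker=4               # the SPE with 4 workers
```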
