To run the Swift REPL in a Docker container (the `SYS_PTRACE` capability and the unconfined seccomp profile are required because the REPL runs on LLDB, which needs `ptrace`):

```bash
docker run --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined swift:5.10.1 swift repl
```
To run the project in a container:

```bash
make run CONFIG_FILE_PATH=config-files/base_config.json SIMULATOR_ARGS=[...]
```
If the environment already has Swift installed (e.g., when developing with the VS Code devcontainer feature):

```bash
make run IS_DEVCONTAINER=true CONFIG_FILE_PATH=config-files/base_config.json SIMULATOR_ARGS=[...]
```
The number of simulations is determined by the execution parameters. An execution with

$$nodes = 5, \quad services = 6, \quad maxWindowSize = 4$$

includes the following number of samplings:

$$
\begin{aligned}
winSize = 1 &\to samplings = 6^{1} \cdot 5 + 5 \\
winSize = 2 &\to samplings = 6^{2} \cdot 4 + 5 \\
winSize = 3 &\to samplings = 6^{3} \cdot 3 + 5 \\
winSize = 4 &\to samplings = 6^{4} \cdot 2 + 5
\end{aligned}
$$

In general, for a given window size $w$: $samplings(w) = services^{w} \cdot (nodes - w + 1) + nodes$.
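As a quick sanity check, the formula can be evaluated directly in the shell (plain bash arithmetic, no project code involved):

```bash
# samplings(w) = services^w * (nodes - w + 1) + nodes, with nodes=5, services=6
for w in 1 2 3 4; do
  echo "winSize=$w -> samplings=$(( 6 ** w * (5 - w + 1) + 5 ))"
done
# winSize=1 -> samplings=35
# winSize=2 -> samplings=149
# winSize=3 -> samplings=653
# winSize=4 -> samplings=2597
```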
Datasets are located in the `datasets` folder. The following table describes the characteristics of each dataset:
| Dataset | Mean column entropy (bits) | Variance of column entropy | Std. dev. of column entropy |
|---|---|---|---|
| high_variability | 11.80 | 0.24 | 0.49 |
| low_variability | 1.70 | 0.20 | 0.45 |
| inmates_enriched_10k | 5.35 | 13.09 | 3.62 |
| IBM_HR_Analytics_employee_attrition | 3.13 | 8.56 | 2.93 |
| red_wine_quality | 5.61 | 2.01 | 1.42 |
| avocado | 9.36 | 22.13 | 4.70 |
To compute the entropy of each column (Shannon entropy in bits, since the log is base 2):

```python
import numpy as np
import pandas as pd

def get_column_probability(column: pd.Series) -> pd.Series:
    # Relative frequency (empirical probability) of each distinct value.
    return column.value_counts(normalize=True)

def get_column_entropy(column: pd.Series) -> float:
    # Shannon entropy: H = -sum(p * log2(p)).
    column_probability = get_column_probability(column)
    return float(-np.sum(column_probability * np.log2(column_probability)))

dataset_name = "high_variability"  # any dataset from the datasets/ folder
dataset = pd.read_csv(dataset_name + ".csv")

entropies = [get_column_entropy(dataset[column]) for column in dataset.columns]
# Prints the mean, variance, and std dev of the column entropies,
# i.e. one row of the table above.
print(f"{round(np.mean(entropies), 2)}, {round(np.var(entropies), 2)}, {round(np.std(entropies), 2)}")
```
To set the logger level, create an environment variable called `LOGGER_LEVEL` with one of the following values: `trace`, `debug`, `info`, `notice`, `warning`, `error`, `critical` (the default is `info`). Alternatively, pass the variable to `make run`.
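For example, to get verbose logs for a single run (arguments other than `LOGGER_LEVEL` are the same as shown earlier):

```bash
make run LOGGER_LEVEL=debug CONFIG_FILE_PATH=config-files/base_config.json SIMULATOR_ARGS=[...]
```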
For DB migrations, run `make migrate-db SQL_CODE="your_migration_sql"`.
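For example (the table and column names here are purely illustrative, not part of the actual schema):

```bash
make migrate-db SQL_CODE="ALTER TABLE simulations ADD COLUMN notes TEXT;"
```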
To run queries on the DB, run `make run-query SQL_CODE="your_plain_sql"`.
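For example (again assuming a hypothetical `simulations` table):

```bash
make run-query SQL_CODE="SELECT COUNT(*) FROM simulations;"
```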
The `k8s/` folder contains all the resources needed to run a simulation on Kubernetes. After `cd k8s`, the following Makefile recipes are available (see the end-to-end example after this list):
- `install`: deploy the setup resources
- `uninstall`: uninstall the setup resources
- `run-simulation`: run a simulation. Example: `make run-simulation NAME=test-sim VALUES_FILE=./run-simulation/files/base-params.yaml`
- `delete-simulation`: uninstall the resources created for the simulation
- `copy-dataset`: copy a dataset into the volume used when running a simulation. Example: `make copy-dataset FILE_PATH=path/to/dataset.csv`
- `query-db`: open a SQLite connection to the specified DB (defaults to `simulations.db`). Example: `make query-db DB_PATH=simulations.db`
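A typical end-to-end session might look like this (the simulation name, the dataset path, and the `NAME` argument to `delete-simulation` are illustrative assumptions):

```bash
cd k8s
make install                                         # deploy the setup resources
make copy-dataset FILE_PATH=../datasets/avocado.csv  # stage a dataset into the simulation volume
make run-simulation NAME=test-sim VALUES_FILE=./run-simulation/files/base-params.yaml
make query-db DB_PATH=simulations.db                 # inspect results once the run finishes
make delete-simulation NAME=test-sim                 # clean up the simulation resources
```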