# Benchmarking #96

Merged (34 commits, Jul 3, 2024)

## Commits
- 4cc7d38 Add first version of benchmarking script (Bronzila, Apr 15, 2024)
- 9ce8d23 [WIP] Add initial HPOBench benchmark (Bronzila, Apr 17, 2024)
- a876520 Add OD and surrogate benchmark (Bronzila, Apr 17, 2024)
- 9469123 [WIP] Add proper min/max fidelity loading (Bronzila, Apr 17, 2024)
- 7f7778f Adjust args (Bronzila, Apr 18, 2024)
- 0100e3a Add benchmarking setup script (Bronzila, Apr 23, 2024)
- 5253f3e Sort out dependencies and clean up benchmarking script (Bronzila, Apr 29, 2024)
- 23c3274 Adjust folder structure and implement mfpbench benchmark (Bronzila, May 7, 2024)
- 3f3c67b Add setup guide (Bronzila, May 8, 2024)
- 4097ab4 Add virtual env creation section to benchmarking script (Bronzila, May 10, 2024)
- f0ef67f Adjust benchmarking setup (Bronzila, May 13, 2024)
- 59ffbc3 Fix version for MFPbench and commit for HPOBench (Bronzila, May 15, 2024)
- 2b08333 Merge branch 'benchmarking' of https://github.com/automl/DEHB into be… (Bronzila, May 15, 2024)
- 3b9ab0b Integrate more MFPBench benchmarks + [WIP] MFPBench benchmarking setup (Bronzila, May 21, 2024)
- 11662aa Add PD1, MFH benchmarks and result table/plot generation (Bronzila, May 22, 2024)
- f240ef0 Bump up mfpbench version (Bronzila, May 23, 2024)
- ef88faa Make summary generation parametrizable (Bronzila, May 28, 2024)
- 8e510eb Adjust seeding and add CountingOnes benchmark (Bronzila, May 30, 2024)
- c00946b Add version sorting for result table (Bronzila, May 31, 2024)
- 6ece84f Add benchmarking script for cluster (Bronzila, Jun 5, 2024)
- 21d8633 Adjust script to fit benchmarking doc names (Bronzila, Jun 5, 2024)
- 83a7896 Minor adjustments (Bronzila, Jun 5, 2024)
- ec64e5e Add benchmarking files to gitignore and adjust benchmarking script (Bronzila, Jun 5, 2024)
- 69cdc88 Remove jahs from benchmarking script (Bronzila, Jun 18, 2024)
- e1f38f5 Remove JAHS from default choice for MFPBench (Bronzila, Jun 20, 2024)
- cba4eee Add extra picky dependencies for hpobench setup (Bronzila, Jul 3, 2024)
- 0a7f79f Refactor HPOBench benchmark to use tabular ml benchmarks (Bronzila, Jul 3, 2024)
- 957410e Remove unnecessary cd command (Bronzila, Jul 3, 2024)
- 39afd7f Add PR template and benchmark instructions for PRs to CONTRIBUTING (Bronzila, Jul 3, 2024)
- 6f49cd0 Update benchmarking markdown (Bronzila, Jul 3, 2024)
- 2c23bf3 Run all benchmarks per default (Bronzila, Jul 3, 2024)
- 1bee014 Adjust benchmark scripts to final version (Bronzila, Jul 3, 2024)
- bfadd8b Add benchmark results for all dehb versions (Bronzila, Jul 3, 2024)
- 7dc6912 Merge branch 'benchmarking' of https://github.com/automl/DEHB into be… (Bronzila, Jul 3, 2024)
23 changes: 23 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,23 @@
## Pull Request Checklist

Thank you for your contribution! Before submitting this PR, please make sure you have completed the following steps:

### 1. Unit Tests / Normal PR Workflow

- [ ] Ensure all existing unit tests pass.
- [ ] Add new unit tests to cover the changes.
- [ ] Verify that your code follows the project's coding standards.
- [ ] Add documentation for your code if necessary.
- [ ] Check below whether your changes require you to run benchmarks.

#### When Do I Need To Run Benchmarks?

Depending on your changes, we ask you to run some benchmarks:

1. Style changes.

If your changes only consist of style modifications, such as renaming or adding docstrings, and do not interfere with DEHB's interface, functionality, or algorithm, it is sufficient for all test cases to pass.

2. Changes to DEHB's interface and functionality or the algorithm itself.

If your changes affect the interface, functionality, or algorithm of DEHB, please also run the synthetic benchmarks (MFH3 and MFH6 from MFPBench, and the CountingOnes benchmark). This will help determine whether your changes introduced bugs or significantly altered DEHB's performance. At the reviewer's discretion, you may also be asked to run your changes on real-world benchmarks. For instructions on how to install and run the benchmarks, please have a look at our [benchmarking instructions](../benchmarking/BENCHMARKING.md). Please use the same budget for your benchmark runs as specified in the instructions.
6 changes: 4 additions & 2 deletions .gitignore
@@ -14,15 +14,17 @@ __pycache__/
*/*/__pycache__/


# folders as artefacts
# folders as artifacts
.idea/
results/
plots/
workflow/
dask-worker-space/
dehb/examples/*/results
dehb/examples/*/*/results
.ipynb_checkpoints/
data/
logs/
*.err


# automl_template .gitignore
10 changes: 10 additions & 0 deletions CONTRIBUTING.md
@@ -66,6 +66,16 @@ When submitting a pull request, please ensure the following:
- Ensure your code follows the project's code style and guidelines.
- Be responsive to any feedback or questions during the review process.

Additionally, we ask you to run specific benchmarks, depending on the depth of your changes:

1. Style changes.

If your changes only consist of style modifications, such as renaming or adding docstrings, and do not interfere with DEHB's interface, functionality, or algorithm, it is sufficient for all test cases to pass.

2. Changes to DEHB's interface and functionality or the algorithm itself.

If your changes affect the interface, functionality, or algorithm of DEHB, please also run the synthetic benchmarks (MFH3 and MFH6 from MFPBench, and the CountingOnes benchmark). This will help determine whether your changes introduced bugs or significantly altered DEHB's performance. At the reviewer's discretion, you may also be asked to run your changes on real-world benchmarks. For instructions on how to install and run the benchmarks, please have a look at our [benchmarking instructions](./benchmarking/BENCHMARKING.md). Please use the same budget for your benchmark runs as specified in the instructions.

## Code Style and Guidelines

To maintain consistency and readability, we follow a set of code style and guidelines. Please make sure that your code adheres to these standards:
102 changes: 102 additions & 0 deletions benchmarking/BENCHMARKING.md
@@ -0,0 +1,102 @@
# Benchmarking DEHB

Benchmarking DEHB is crucial for ensuring consistent performance across different setups and configurations. We aim to benchmark DEHB on multiple HPOBench and MFPBench benchmarks with different run setups:

1. Using `dehb.run`,
2. Using the Ask & Tell interface, and
3. Restarting the optimization run after half the budget.

In the end, the results of the three execution setups should be identical. With this setup guide, we encourage the developers of DEHB to continually benchmark their changes in order to ensure that

- the inner workings of DEHB are not corrupted, verified by comparing the results of the different execution setups, and
- overall performance either remains the same, if no algorithmic changes have been made, or is still comparable or better, if algorithmic changes have been made.
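
To make these setups concrete, here is a minimal sketch of the first two, plus a note on the third. It is illustrative only: the constructor arguments (`f`, `cs`, `min_fidelity`, `max_fidelity`, `output_path`) and the keys returned by `ask()` are assumptions based on DEHB's documentation; the benchmarking scripts in this folder remain the authoritative reference.

```python
# Illustrative sketch of the execution setups; see the benchmarking
# scripts for the exact code used in the benchmarks.
from ConfigSpace import ConfigurationSpace, UniformFloatHyperparameter
from dehb import DEHB

cs = ConfigurationSpace(seed=0)
cs.add_hyperparameter(UniformFloatHyperparameter("x", lower=0.0, upper=1.0))

def objective(config, fidelity):
    # Toy objective: distance of x from 0.5. DEHB minimizes "fitness".
    return {"fitness": abs(config["x"] - 0.5), "cost": fidelity}

# Setup 1: let DEHB drive the optimization loop via run().
dehb = DEHB(f=objective, cs=cs, min_fidelity=1, max_fidelity=10,
            n_workers=1, output_path="./logs/sketch_run")
dehb.run(fevals=20)

# Setup 2: drive the loop yourself via the Ask & Tell interface.
dehb_at = DEHB(f=objective, cs=cs, min_fidelity=1, max_fidelity=10,
               n_workers=1, output_path="./logs/sketch_ask_tell")
for _ in range(20):
    job_info = dehb_at.ask()
    result = objective(job_info["config"], job_info["fidelity"])
    dehb_at.tell(job_info, result)

# Setup 3 (restart): run half the budget, then resume from the state DEHB
# persists in output_path and spend the remaining half.
```

Comparing the trajectories produced by these setups is exactly what the benchmarking scripts automate.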

Please follow the installation guides below to benchmark your changes.

## Installation Guide HPOBench

The following guide walks you through installing HPOBench and running the benchmarking script. We assume that you execute the commands from within your cloned DEHB repository.

### Create Virtual Environment

Before starting, please make sure you have a clean virtual environment with Python 3.8 ready. The following commands show how to set one up with conda.

```shell
conda create --name dehb_hpo python=3.8
conda activate dehb_hpo
```

### Installing HPOBench

```shell
git clone https://github.com/automl/HPOBench.git
cd HPOBench
git checkout 47bf141 # Pin the specific commit used for benchmarking
pip install .[ml_tabular_benchmarks]
cd ..
```

### Installing DEHB

Some additional dependencies are needed for plotting and table generation, so please install DEHB with the benchmarking extras:

```shell
pip install -e .[benchmarking,hpobench_benchmark]
```

### Running the Benchmarking Script

The benchmarking script is highly configurable and lets you choose the budget type (`fevals`, `brackets`, or `total_cost`), the execution setup (`run` (default), `ask_tell`, or `restart`), the benchmarks to run (`tab_nn`, `tab_rf`, `tab_svm`, `tab_lr`, `surrogate`, `nasbench201`), and the seeds used for each benchmark run (default: `[0]`).

```shell
python3.8 benchmarking/hpobench_benchmark.py --fevals 300 --benchmarks tab_nn tab_rf tab_svm tab_lr surrogate nasbench201 --seed 0 --n_seeds 5 --output_path logs/hpobench_benchmarking
```

## Installation Guide MFPBench

The following guide walks you through installing MFPBench and running the benchmarking script. We assume that you execute the commands from within your cloned DEHB repository.

## PD1 Benchmark and MFHartmann

### Create Virtual Environment

Before starting, please make sure you have a clean virtual environment with Python 3.8 ready. The following commands show how to set one up with conda.

```shell
conda create --name dehb_pd1 python=3.8
conda activate dehb_pd1
```

### Installing DEHB with MFPBench

Some additional dependencies are needed for plotting and table generation, so please install DEHB with the benchmarking extras:

```shell
pip install -e .[benchmarking,pd1_benchmark]
```

### Downloading Benchmark Data

To run the benchmark, we first need to download the benchmark data:

```shell
python -m mfpbench download --benchmark pd1
```

### Running the Benchmarking Script

We currently support and use the PD1 benchmarks `cifar100_wideresnet_2048`, `imagenet_resnet_512`, `lm1b_transformer_2048` and `translatewmt_xformer_64`. Moreover, the `mfh3` and `mfh6` benchmarks are available.

```shell
python3.8 benchmarking/mfpbench_benchmark.py --fevals 300 --benchmarks mfh3 mfh6 cifar100_wideresnet_2048 imagenet_resnet_512 lm1b_transformer_2048 translatewmt_xformer_64 --seed 0 --n_seeds 5 --output_path logs/pd1_benchmarks
```

## CountingOnes Benchmark

The CountingOnes benchmark is a synthetic benchmark that only depends on numpy, so it can be used directly without any special setup. For reference, a sketch of the objective is given below.
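
The sketch below follows the commonly used definition of the CountingOnes objective; the function and argument names are illustrative and do not mirror the repository's implementation.

```python
import numpy as np

def counting_ones(cat_values, cont_values, fidelity, rng):
    # Categorical dimensions contribute their 0/1 value exactly. Each
    # continuous dimension x in [0, 1] is estimated by averaging
    # `fidelity` Bernoulli(x) samples, so higher fidelities yield less
    # noisy estimates. The negated, normalized sum is minimized; the
    # optimum is -1, reached when every dimension equals (or tends to) 1.
    cat_sum = float(np.sum(cat_values))
    cont_sum = sum(rng.binomial(1, x, size=int(fidelity)).mean()
                   for x in cont_values)
    return -(cat_sum + cont_sum) / (len(cat_values) + len(cont_values))

rng = np.random.default_rng(0)
print(counting_ones([1, 0, 1], [0.9, 0.4], fidelity=100, rng=rng))
```

Because the fidelity only controls the noise of the continuous estimates, CountingOnes is a cheap sanity check for a multi-fidelity optimizer.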

### Running the Benchmarking Script

```shell
python benchmarking/countingones_benchmark.py --seed 0 --n_seeds 5 --fevals 300 --output_path logs/countingones --n_continuous 50 --n_categorical 50
```
50 changes: 50 additions & 0 deletions benchmarking/benchmarking.sh
@@ -0,0 +1,50 @@
#!/bin/bash
#SBATCH -p bosch_cpu-cascadelake
#SBATCH -o logs/%A[%a].%N.out # STDOUT (the logs/ folder has to exist); %A is replaced by the SLURM_ARRAY_JOB_ID value
#SBATCH -e logs/%A[%a].%N.err # STDERR (the logs/ folder has to exist); %A is replaced by the SLURM_ARRAY_JOB_ID value
#SBATCH -J DEHB_benchmarking # sets the job name.
#SBATCH -a 1-3 # array size
#SBATCH -t 0-00:30:00
#SBATCH --mem 16GB

BUDGET=300

# Print some information about the job to STDOUT
echo "Workingdir: $(pwd)";
echo "Started at $(date)";
echo "Benchmarking DEHB on multiple benchmarks";
echo "Running job $SLURM_JOB_NAME using $SLURM_JOB_CPUS_PER_NODE cpus per node with given JID $SLURM_JOB_ID on queue $SLURM_JOB_PARTITION";

source ~/.bashrc

if [ 1 -eq $SLURM_ARRAY_TASK_ID ]
then
conda activate dehb_pd1
pip install .

python benchmarking/mfpbench_benchmark.py --seed 0 --n_seeds 5 --fevals $BUDGET --benchmarks mfh3 mfh6 cifar100_wideresnet_2048 imagenet_resnet_512 lm1b_transformer_2048 --output_path logs/pd1
# Run translatewmt_xformer_64 separately due to memory problems
python benchmarking/mfpbench_benchmark.py --seed 0 --n_seeds 5 --fevals $BUDGET --benchmarks translatewmt_xformer_64 --output_path logs/pd1

python benchmarking/generate_summary.py
elif [ 2 -eq $SLURM_ARRAY_TASK_ID ]
then
conda activate dehb_hpo
pip install .

python benchmarking/hpobench_benchmark.py --seed 0 --n_seeds 5 --fevals $BUDGET --benchmarks tab_nn tab_rf tab_svm tab_lr surrogate nasbench201 --output_path logs/hpob

python benchmarking/generate_summary.py
elif [ 3 -eq $SLURM_ARRAY_TASK_ID ]
then
sleep 60 # Wait for array task 1 to finish installing DEHB into dehb_pd1
conda activate dehb_pd1 # CountingOnes only depends on numpy, so any of the environments works

python benchmarking/countingones_benchmark.py --seed 0 --n_seeds 5 --fevals $BUDGET --output_path logs/countingones --n_continuous 50 --n_categorical 50

python benchmarking/generate_summary.py
fi

# Print some Information about the end-time to STDOUT
echo "DONE";
echo "Finished at $(date)";