
A Systematic Study on Early Stopping Criteria in HPO and the Implication of Uncertainty [EAB]

This paper delves into the pivotal role that early stopping criteria play in hyperparameter optimization (HPO). We introduce criteria that incorporate uncertainty and substantiate their practical significance through empirical experimentation. By shedding light on the influence of criterion selection, this research offers valuable insights for enhancing the effectiveness of HPO.

All the code used for empirical experiments, significance tests, and visualization, as discussed in the paper, is available in this repository for validation and replication.

Installation

Benchmark Setup

  1. NAS-Bench-201 and HPOBench

    We use the high-level APIs defined in HPOBench to access NAS-Bench-201. Install it as follows (minimal query sketches for both benchmarks appear after this list):

    git clone https://github.com/automl/HPOBench.git
    cd HPOBench
    pip install .
  2. LCBench

    git clone https://github.com/automl/LCBench.git
    cd LCBench
    cat requirements.txt | xargs -L 1 pip install
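
Once installed, both benchmarks can be queried programmatically. The sketch below is a minimal illustration, not code from this repository; the benchmark class name and the "epoch" fidelity key are assumptions based on hpobench/benchmarks/nas/nasbench_201.py, so check that file for the exact names.

    # Minimal sketch: query NAS-Bench-201 through HPOBench's high-level API.
    # ImageNetNasBench201Benchmark and the "epoch" fidelity key are assumptions;
    # see hpobench/benchmarks/nas/nasbench_201.py for the exact names.
    from hpobench.benchmarks.nas.nasbench_201 import ImageNetNasBench201Benchmark

    benchmark = ImageNetNasBench201Benchmark()
    config = benchmark.get_configuration_space(seed=1).sample_configuration()

    # Evaluate the sampled architecture at a reduced budget (12 epochs).
    result = benchmark.objective_function(configuration=config, fidelity={"epoch": 12})
    print(result["function_value"], result["cost"])

LCBench ships its own lightweight API; the data-file path, dataset name, and tag below follow LCBench's README and are likewise assumptions.

    # Minimal sketch: read a learning curve from LCBench (run inside the
    # LCBench checkout so that api.py is importable).
    from api import Benchmark

    bench = Benchmark("data_2k.json")  # hypothetical path to a downloaded data file
    curve = bench.query(dataset_name="adult", tag="Train/val_accuracy", config_id=0)
    print(curve)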

Obtaining Results

The code for running the Hyperband algorithm on each benchmark to assess the different criteria is provided as separate scripts named LCBench_{dataset}.py and NB201_{dataset}.py. The criteria under comparison are enumerated in utils.py, and the --task argument tells a script which set of criteria to evaluate. If no arguments are specified, the script defaults to comparing the commonly used criteria, i.e., training loss/accuracy and validation loss/accuracy. Each test runs 1000 repetitions by default, and the results are stored in the Records/ directory.

Run the tests as follows:

# --task: which set of criteria to compare
# --rounds: the number of repetitions for each comparison
# --max_iters: the budget constraint R for Hyperband
# --eta: the filtering ratio for Hyperband
python NB201_ImageNet.py \
        --task acc_loss \
        --rounds 1000 \
        --max_iters 50 81 \
        --eta 3
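
For intuition on how --max_iters (the budget constraint R) and --eta shape the search, the sketch below computes the brackets of the standard Hyperband schedule (Li et al., 2017). It illustrates the algorithm only and is not the repository's implementation.

    import math

    def hyperband_brackets(R, eta=3):
        """Yield (bracket index, [(n_i, r_i), ...]) for standard Hyperband."""
        s_max = int(math.log(R) / math.log(eta) + 1e-9)  # number of brackets - 1
        for s in range(s_max, -1, -1):
            n = int(math.ceil((s_max + 1) / (s + 1) * eta ** s))  # initial configs
            r = R * eta ** (-s)                                   # initial budget per config
            rounds = []
            for i in range(s + 1):
                n_i = int(n * eta ** (-i))   # configurations kept in round i
                r_i = int(r * eta ** i)      # epochs allotted to each of them
                rounds.append((n_i, r_i))
            yield s, rounds

    for s, rounds in hyperband_brackets(R=81, eta=3):
        print(f"bracket s={s}: {rounds}")

With R = 81 and eta = 3, for example, the most aggressive bracket starts 81 configurations at 1 epoch each and keeps one third of them after every round.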

Significance Testing

To compare the outcomes of the criteria, we employ the Wilcoxon signed-rank test to determine whether the differences among them are significant. We also track the number of wins, ties, and losses of each criterion across the 1000 repetitions. The scripts for these tests are located in the Tests/ directory.
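
The sketch below illustrates both computations with SciPy. It is not the repository's code: the Records/ file names and the array layout (one final-accuracy value per repetition for each criterion) are assumptions.

    import numpy as np
    from scipy.stats import wilcoxon

    # Hypothetical result files: one final-accuracy value per repetition,
    # for two different stopping criteria (file names are illustrative).
    acc_a = np.load("Records/train_loss.npy")
    acc_b = np.load("Records/val_loss.npy")

    # Paired, non-parametric test for a systematic difference between criteria.
    stat, p_value = wilcoxon(acc_a, acc_b)
    print(f"Wilcoxon signed-rank: statistic={stat:.1f}, p={p_value:.4g}")

    # Wins/ties/losses of criterion A against criterion B across repetitions.
    wins = int(np.sum(acc_a > acc_b))
    ties = int(np.sum(acc_a == acc_b))
    losses = int(np.sum(acc_a < acc_b))
    print(f"wins={wins}, ties={ties}, losses={losses}")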

You can reproduce the comparison results by running:

cd Tests/

# Wilcoxon signed-rank test
# --task: which set of criteria to compare
# --dataset: benchmark dataset name
# --max_iters: the budget constraint R for Hyperband
# --eta: the filtering ratio for Hyperband
python wilcoxon.py \
        --task acc_loss \
        --dataset ImageNet \
        --max_iters 50 81 \
        --eta 3

# Count the wins/ties/losses of each criterion
python wtl.py \
        --task acc_loss \
        --dataset ImageNet \
        --max_iters 50 81 \
        --eta 3

Visualization

Scripts for reproducing the figures presented in the paper are provided in the zFigures/ directory. Run each script to generate the corresponding figure.

Copyright: The copyright of this repository belongs to the authors of the VLDB'2025 paper submission (#345). This package is provided solely for assessment by the VLDB'2025 program committee during the review process; any other use is prohibited.
