Merge pull request #95 from automl/feedback

Add feedback from multiple sources to DEHB
automl · Jul 3, 2024 · 2fc9510 · 2fc9510
2 parents 04cabed + e0433b2
commit 2fc9510
Showing 21 changed files with 918 additions and 427 deletions.
diff --git a/.github/workflows/citation_cff.yml b/.github/workflows/citation_cff.yml
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -10,6 +10,7 @@ Thank you for considering contributing to DEHB! We welcome contributions from th
 - [Code Contributions](#code-contributions)
 - [Submitting a Pull Request](#submitting-a-pull-request)
 - [Code Style and Guidelines](#code-style-and-guidelines)
+- [Documentation](#documentation)
 - [Community Guidelines](#community-guidelines)
 
 ## How to Contribute
@@ -78,6 +79,66 @@ To maintain consistency and readability, we follow a set of code style and guide
 - Write comprehensive and meaningful commit messages.
 - Write unit tests for new features and ensure existing tests pass.
 
+## Documentation
+Proper documentation is crucial for the maintainability and usability of the DEHB project. Here are the guidelines for documenting your code:
+
+### General Guidelines
+
+- **New Features:** All new features must include documentation.
+- **Docstrings:** All public functions must include docstrings that follow the [Google style guide](https://google.github.io/styleguide/pyguide.html).
+- **Comments:** Use comments to explain the logic behind complex code, special cases, or non-obvious implementations.
+- **Clarity:** Ensure that your comments and docstrings are clear, concise, and informative.
+
+### Docstring Requirements
+
+For each public function, the docstring should include:
+
+1. **Summary:** A brief description of the function's purpose.
+2. **Parameters:** A list of all parameters with descriptions, including types and any default values.
+3. **Returns:** A description of the return values, including types.
+4. **Raises:** A list of any exceptions that the function might raise.
+
+### Example Docstring
+
+```python
+def example_function(param1: int, param2: str = "default") -> bool:
+    """
+    This is an example function that demonstrates how to write a proper docstring.
+
+    Args:
+        param1 (int): The first parameter, an integer.
+        param2 (str, optional): The second parameter, a string. Defaults to "default".
+
+    Returns:
+        bool: The return value. True if successful, False otherwise.
+
+    Raises:
+        ValueError: If `param1` is negative.
+    """
+    if param1 < 0:
+        raise ValueError("param1 must be non-negative")
+    return True
+```
+
+### Rendering Documentation Locally
+
+To render the documentation locally for debugging and review:
+
+1. Install the required `dev` dependencies:
+
+    ```bash
+    pip install -e .[dev]
+    ```
+
+2. Use `mike` to deploy and serve the documentation locally:
+
+    ```bash
+    mike deploy --update-aliases 2.0.0 latest --ignore
+    mike serve
+    ```
+
+3. The docs should now be viewable on http://localhost:8000/. If not, check your command prompt for any errors (or a different local server address).
+
 ## Community Guidelines
 
 When participating in the DEHB community, please adhere to the following guidelines:

diff --git a/README.md b/README.md
@@ -38,7 +38,7 @@ optimizer.tell(job_info, result)
 
 ##### Using run()
 # Run optimization for 1 bracket. Output files will be saved to ./logs
-traj, runtime, history = optimizer.run(brackets=1, verbose=True)
+traj, runtime, history = optimizer.run(brackets=1)
 ```
 
 #### Running DEHB in a parallel setting
@@ -66,30 +66,6 @@ For more details and features, please have a look at our [documentation](https:/
 ### Contributing
 Any contribution is greatly appreciated! Please take the time to check out our [contributing guidelines](./CONTRIBUTING.md)
 
-### DEHB Hyperparameters
-
-*We recommend the default settings*.
-The default settings were chosen based on ablation studies over a collection of diverse problems 
-and were found to be *generally* useful across all cases tested. 
-However, the parameters are still available for tuning to a specific problem.
-
-The Hyperband components:
-* *min\_fidelity*: Needs to be specified for every DEHB instantiation and is used in determining 
-the fidelity spacing for the problem at hand.
-* *max\_fidelity*: Needs to be specified for every DEHB instantiation. Represents the full-fidelity 
-evaluation or the actual black-box setting.
-* *eta*: (default=3) Sets the aggressiveness of Hyperband's aggressive early stopping by retaining
-1/eta configurations every round
-
-The DE components:
-* *strategy*: (default=`rand1_bin`) Chooses the mutation and crossover strategies for DE. `rand1` 
-represents the *mutation* strategy while `bin` represents the *binomial crossover* strategy. \
-  Other mutation strategies include: {`rand2`, `rand2dir`, `best`, `best2`, `currenttobest1`, `randtobest1`}\
-  Other crossover strategies include: {`exp`}\
-  Mutation and crossover strategies can be combined with a `_` separator, for e.g.: `rand2dir_exp`.
-* *mutation_factor*: (default=0.5) A fraction within [0, 1] weighing the difference operation in DE
-* *crossover_prob*: (default=0.5) A probability within [0, 1] weighing the traits from a parent or the mutant
-
 ---
 
 ### To cite the paper or code

diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
@@ -0,0 +1 @@
+../CONTRIBUTING.md
diff --git a/docs/getting_started/ask_tell.md b/docs/getting_started/ask_tell.md
diff --git a/docs/getting_started/dehb_hps.md b/docs/getting_started/dehb_hps.md
@@ -0,0 +1,25 @@
+### DEHB Hyperparameters
+
+*We recommend the default settings*.
+The default settings were chosen based on ablation studies over a collection of diverse problems 
+and were found to be *generally* useful across all cases tested. 
+However, the parameters are still available for tuning to a specific problem.
+
+The Hyperband components:
+
+- *min\_fidelity*: Needs to be specified for every DEHB instantiation and is used in determining 
+the fidelity spacing for the problem at hand.
+- *max\_fidelity*: Needs to be specified for every DEHB instantiation. Represents the full-fidelity 
+evaluation or the actual black-box setting.
+- *eta*: (default=3) Controls how aggressively Hyperband stops configurations early: only
+1/eta of the configurations are retained every round
+
+The DE components:
+
+- *strategy*: (default=`rand1_bin`) Chooses the mutation and crossover strategies for DE. `rand1` 
+represents the *mutation* strategy while `bin` represents the *binomial crossover* strategy. \
+  Other mutation strategies include: {`rand2`, `rand2dir`, `best`, `best2`, `currenttobest1`, `randtobest1`}\
+  Other crossover strategies include: {`exp`}\
+  Mutation and crossover strategies can be combined with a `_` separator, e.g. `rand2dir_exp`.
+- *mutation_factor*: (default=0.5) A fraction within [0, 1] weighing the difference operation in DE
+- *crossover_prob*: (default=0.5) A probability within [0, 1] weighing the traits from a parent or the mutant
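
A minimal sketch, not part of this diff, of how the hyperparameters above are typically used. It assumes the usual Hyperband geometric fidelity spacing and the textbook `rand1` mutation with binomial crossover. The helper names `fidelity_levels` and `rand1_bin_trial` are hypothetical and are not DEHB's API, and DEHB's own implementation may differ in details such as boundary handling.

```python
import random


def fidelity_levels(min_fidelity, max_fidelity, eta=3):
    """Geometric fidelity ladder between min_fidelity and max_fidelity (assumed spacing)."""
    if not 0 < min_fidelity < max_fidelity:
        raise ValueError("expected 0 < min_fidelity < max_fidelity")
    if eta <= 1:
        raise ValueError("eta must be greater than 1")
    # Count how many times the fidelity can be multiplied by eta
    # without exceeding max_fidelity.
    steps, fidelity = 0, min_fidelity
    while fidelity * eta <= max_fidelity:
        fidelity *= eta
        steps += 1
    # Work downwards from max_fidelity so the top level is always exact.
    return [max_fidelity / eta**k for k in range(steps, -1, -1)]


def rand1_bin_trial(parent, a, b, c, mutation_factor=0.5, crossover_prob=0.5, rng=random):
    """Build one DE trial vector using rand1 mutation and binomial crossover."""
    # rand1 mutation: base vector plus the scaled difference of two others.
    mutant = [ai + mutation_factor * (bi - ci) for ai, bi, ci in zip(a, b, c)]
    # Binomial crossover: each dimension comes from the mutant with probability
    # crossover_prob; one forced index guarantees at least one mutant gene.
    forced = rng.randrange(len(parent))
    return [
        m if (i == forced or rng.random() < crossover_prob) else p
        for i, (p, m) in enumerate(zip(parent, mutant))
    ]


levels = fidelity_levels(3, 27, eta=3)
trial = rand1_bin_trial(
    parent=[0.0, 0.0, 0.0],
    a=[0.2, 0.4, 0.6],
    b=[1.0, 1.0, 1.0],
    c=[0.0, 0.0, 0.0],
    crossover_prob=1.0,
)
print(levels)  # [3.0, 9.0, 27.0]
print(trial)
```

With `min_fidelity=3`, `max_fidelity=27` and `eta=3`, the ladder has three levels. A larger `eta` yields fewer levels and discards more configurations per round. With `crossover_prob=1.0` the trial vector equals the mutant, and with `crossover_prob=0.0` exactly one dimension is taken from the mutant.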
diff --git a/docs/getting_started/logging.md b/docs/getting_started/logging.md
@@ -0,0 +1,15 @@
+### Logging
+DEHB uses `loguru` for logging and will log both to an output file `dehb.log` inside of the specified `output_path` and to `stdout`. In order to customize the log level, you can pass a `log_level` to the `kwargs` of DEHB. These log levels directly represent the different log levels in loguru. For more information on the different log levels, check out [their website](https://loguru.readthedocs.io/en/stable/api/logger.html#levels).
+An example for the initialization of DEHB using a log level of "WARNING" is presented in the following:
+```python
+dehb = DEHB(
+    f=objective_function,
+    cs=config_space,
+    dimensions=2,
+    min_fidelity=3,
+    max_fidelity=27,
+    eta=3,
+    output_path="./log_example",
+    log_level="WARNING",
+)
+```
diff --git a/docs/getting_started/running_dehb.md b/docs/getting_started/running_dehb.md
@@ -0,0 +1,65 @@
+## Running DEHB using Ask & Tell or built-in run function
+### Introduction
+DEHB allows users to either utilize the Ask & Tell interface for manual task distribution or leverage the built-in functionality (`run`) to set up a Dask cluster autonomously. DEHB aims to minimize the objective function (`f=`) specified by the user, so this function plays a central role in the optimization. In the following, we give an overview of the arguments the objective function must accept and what the structure of the results should look like.
+
+### The Objective Function
+The objective function needs to have the parameters `config` and `fidelity` and evaluate the given configuration on the given fidelity. In a neural network optimization context, the fidelity could e.g. be the number of epochs to run the hyperparameter configuration for.
+
+Let us now have a look at what the objective function should return. DEHB expects to receive a results `dict` from the objective function, which has to contain the keys `fitness` and `cost`. `fitness` represents the objective you are trying to optimize, e.g. validation loss. `cost` represents the computational cost of computing the result, e.g. the wallclock time for training and validating a neural network to achieve the validation loss specified in `fitness`. It is also possible to add the field `info` to the `result` in order to store additional, user-specific information.
+
+!!! note "User-specific information `info`"
+
+    Please note that we only support types that are serializable by `pandas`. If
+    non-serializable types are used, DEHB will not be able to save the history.
+    If you want to be on the safe side, please use built-in Python types.
+
+Now that we have cleared up what the inputs and outputs of the objective function should be, we will also provide you with a small example of what the objective function could look like. For a complete example, please have a look at one of our [examples](../examples/01.1_Optimizing_RandomForest_using_DEHB.ipynb).
+
+```python
+def your_objective_function(config, fidelity):
+    val_loss, val_accuracy, time_taken = train_config_for_epochs(config, fidelity)
+
+    # Note that we use the validation loss as the feedback signal for DEHB, since we aim to minimize it
+    return {
+        "fitness": val_loss,    # mandatory
+        "cost": time_taken,     # mandatory
+        "info": {               # optional
+            "validation_accuracy": val_accuracy
+        }
+    }
+```
+
+### Run Function
+To utilize the `run` function, simply set up DEHB as you prefer and then call `dehb.run` with your specified compute budget:
+
+```python
+optimizer = DEHB(
+    f=your_objective_function,
+    cs=config_space, 
+    dimensions=dimensions, 
+    min_fidelity=min_fidelity, 
+    max_fidelity=max_fidelity)
+
+optimizer.run(fevals=20) # Run for 20 function evaluations
+```
+
+### Ask & Tell
+The Ask & Tell functionality can be utilized as follows:
+
+```python
+optimizer = DEHB(
+    f=your_objective_function, # Here we do not need to necessarily specify the objective function, but it can still be useful to call 'run' later.
+    cs=config_space, 
+    dimensions=dimensions, 
+    min_fidelity=min_fidelity, 
+    max_fidelity=max_fidelity)
+
+# Ask for next configuration to run
+job_info = optimizer.ask()
+
+# Run the configuration for the given fidelity. Here you can freely distribute the computation to any worker you'd like.
+result = your_objective_function(config=job_info["config"], fidelity=job_info["fidelity"])
+
+# Once you have received the result, feed it back to the optimizer
+optimizer.tell(job_info, result)
+```
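
A minimal sketch, not part of this diff, of checking the result contract described above before handing a result to `optimizer.tell`. The helper `validate_result` and the `toy_objective` function are hypothetical and exist only for illustration. DEHB itself is not imported, so the snippet runs standalone.

```python
import time


def validate_result(result):
    """Check that an objective-function result has the documented shape."""
    if not isinstance(result, dict):
        raise TypeError("the objective function must return a dict")
    # `fitness` and `cost` are mandatory; `info` is optional.
    missing = [key for key in ("fitness", "cost") if key not in result]
    if missing:
        raise KeyError(f"result is missing mandatory keys: {missing}")
    return result


def toy_objective(config, fidelity):
    """Hypothetical objective: a quadratic 'loss' that shrinks with more fidelity."""
    start = time.perf_counter()
    loss = (config["x"] - 0.3) ** 2 + 1.0 / fidelity
    return {
        "fitness": loss,                        # mandatory
        "cost": time.perf_counter() - start,    # mandatory
        "info": {"fidelity_used": fidelity},    # optional, built-in types only
    }


result = validate_result(toy_objective({"x": 0.5}, fidelity=9))
print(result["fitness"])
```

In an Ask & Tell loop, a check like this would sit between evaluating the configuration and calling `optimizer.tell(job_info, result)`, so that a malformed result fails on the worker rather than inside the optimizer.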
diff --git a/docs/getting_started/single_worker.md b/docs/getting_started/single_worker.md
@@ -43,8 +43,9 @@ optimizer = DEHB(
 )
 
 # Run optimization for 1 bracket. Output files will be saved to ./logs
-traj, runtime, history = optimizer.run(brackets=1, verbose=True)
+traj, runtime, history = optimizer.run(brackets=1)
 config_id, config, fitness, runtime, fidelity, _ = history[0]
+print("config id", config_id)
 print("config", config)
 print("fitness", fitness)
 print("runtime", runtime)

diff --git a/docs/index.md b/docs/index.md
@@ -29,7 +29,7 @@ pip install dehb
 DEHB allows users to either utilize the Ask & Tell interface for manual task distribution or leverage the built-in functionality (`run`) to set up a Dask cluster autonomously. Please refer to our [Getting Started](getting_started/single_worker.md) examples.
 
 ## Contributing
-Please have a look at our [contributing guidelines](https://github.com/automl/DEHB/blob/master/CONTRIBUTING.md).
+Please have a look at our [contributing guidelines](./CONTRIBUTING.md).
 
 ## To cite the paper or code
 If you use DEHB in one of your research projects, please cite our paper(s):