Reg cocktails #358 (Draft)
wants to merge 50 commits into base: development
Changes from all commits (50 commits):
e138b69
First push for Mix/cut regularization
franchuterivera Feb 19, 2021
3b4feae
Add cyclic property to lr scheduler and use_swa to trainer
ravinkohli Feb 19, 2021
b845d4d
ADD tests for SWA, SE
ravinkohli Feb 22, 2021
8a2364d
ADD tests for LookAhead
ravinkohli Feb 22, 2021
e2ee40a
Added default for lookahead config
ravinkohli Feb 22, 2021
6e0d2bb
Fix errors in Adversarial training
ravinkohli Feb 23, 2021
9290b07
Fix pickling error for swa model
ravinkohli Feb 23, 2021
c35c795
Fix issues after rebase from refactor_development
ravinkohli Feb 23, 2021
2512709
Added n_restarts as hyperparameter for CosineAnnealing
ravinkohli Mar 1, 2021
41677fd
fix bug with early stopping and swa
ravinkohli Mar 1, 2021
e236df0
cont...
ravinkohli Mar 1, 2021
f29a7e0
Addressed comments from arlind, change in T_mul and T_0 calculations
ravinkohli Mar 7, 2021
f141d06
Updating search space (#156)
ArlindKadra Apr 6, 2021
9339895
Adding constant clause in the testing module for hyperparameter range…
ArlindKadra Apr 12, 2021
126f7d4
Updating implementation for tabular regression
ArlindKadra Apr 12, 2021
f6f05ba
Fixing buggy implementation of the network head with constant updates
ArlindKadra Apr 15, 2021
456e261
Updating implementation
ArlindKadra Apr 20, 2021
c7af699
Implementation fix for constant updates to skip connections. Multibra…
ArlindKadra Apr 20, 2021
2102b08
Fixing the implementation for weight decay in the case of fixed updat…
ArlindKadra Apr 20, 2021
18da2bd
update setup.py
ravinkohli Apr 21, 2021
18bdabf
Updating implementation of the reg cocktails so that it is compatible…
ArlindKadra Apr 22, 2021
0c2c604
Create fit evaluator, no resampling strategy and fix bug for test sta…
ravinkohli Apr 30, 2021
6d4790f
Additional metrics during train (#194)
ravinkohli May 3, 2021
5168ba5
Fixing issues with imbalanced datasets (#197)
ArlindKadra May 7, 2021
23d808b
Reproducibility in cocktail (#204)
ravinkohli May 11, 2021
6283c56
fix bug in adversarial trainer (#207)
ravinkohli May 11, 2021
bc0540b
Add dropout shape as a hyperparameter (#213)
ravinkohli May 14, 2021
5d6062f
Change weighted loss to categorical and fix for test adversarial trai…
ravinkohli May 14, 2021
622c185
added no head (#218)
ravinkohli May 17, 2021
c4b7729
Fix bugs in cutout training (#233)
ravinkohli May 21, 2021
0c8d2ff
Cocktail hotfixes (#245)
ArlindKadra Jun 3, 2021
c1a73f8
[refactor] Address Shuhei's comments
nabenabe0928 Sep 13, 2021
769e041
[doc] Add referencing to each regularization techniques
nabenabe0928 Sep 15, 2021
0da4f72
[fix] Address Ravin's comments and fix range issues in row cut
nabenabe0928 Sep 21, 2021
c4a4565
[doc] Add the reference to the fit_dictionary
nabenabe0928 Sep 21, 2021
6543316
Bug fixes (#249)
ArlindKadra Oct 21, 2021
392f07a
[FIX] Passing checks (#298)
ravinkohli Dec 7, 2021
02e97a1
[FIX] Tests after rebase of `reg_cocktails` (#359)
ravinkohli Dec 10, 2021
03ddb64
rebase and fix flake
ravinkohli Dec 21, 2021
59b5830
fix merge conflicts after rebase
ravinkohli Jan 28, 2022
c3b8844
[FIX] Enable preprocessing in reg_cocktails (#369)
ravinkohli Feb 9, 2022
c1fffa1
fixes after rebase
ravinkohli Feb 28, 2022
366bede
[FIX] SWA and SE with non cyclic schedulers (#395)
ravinkohli Mar 9, 2022
637a68b
fixes after rebase
ravinkohli Mar 9, 2022
e69ff3b
fix tests after rebase
ravinkohli Jul 26, 2022
c138173
fix mypy and flake
ravinkohli Jul 26, 2022
afddca5
fix silly removal of lightgbm
ravinkohli Jul 26, 2022
34c704d
[add] documentation update in base trainer (#468)
theodorju Aug 12, 2022
d29d11b
[FIX] apply cutout for each row. (#481)
ravinkohli Sep 23, 2022
873df9a
[FIX] ROC AUC for multi class classification (#482)
ravinkohli Oct 17, 2022
380 changes: 307 additions & 73 deletions autoPyTorch/api/base_task.py

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions autoPyTorch/api/tabular_classification.py
@@ -254,7 +254,7 @@ def search(
         memory_limit: int = 4096,
         smac_scenario_args: Optional[Dict[str, Any]] = None,
         get_smac_object_callback: Optional[Callable] = None,
-        all_supported_metrics: bool = True,
+        all_supported_metrics: bool = False,
         precision: int = 32,
         disable_file_output: Optional[List[Union[str, DisableFileOutputParameters]]] = None,
         load_models: bool = True,
@@ -354,7 +354,7 @@ def search(
             TargetAlgorithm to be optimised. If None, `eval_function`
             available in autoPyTorch/evaluation/train_evaluator is used.
             Must be child class of AbstractEvaluator.
-            all_supported_metrics (bool: default=True):
+            all_supported_metrics (bool: default=False):
                 If True, all metrics supporting current task will be calculated
                 for each pipeline and results will be available via cv_results
             precision (int: default=32):
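The net effect of this diff is that `all_supported_metrics` now defaults to `False`, so by default only the optimisation metric is computed for each pipeline instead of every metric supported by the task. A minimal sketch of that behavioural difference, assuming stand-in metric functions (`accuracy`, `error_rate`, and `score_pipeline` are illustrative names, not autoPyTorch internals):

```python
from typing import Callable, Dict, List

def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true: List[int], y_pred: List[int]) -> float:
    return 1.0 - accuracy(y_true, y_pred)

# All metrics the (hypothetical) task supports.
SUPPORTED_METRICS: Dict[str, Callable[[List[int], List[int]], float]] = {
    "accuracy": accuracy,
    "error_rate": error_rate,
}

def score_pipeline(y_true: List[int], y_pred: List[int],
                   opt_metric: str = "accuracy",
                   all_supported_metrics: bool = False) -> Dict[str, float]:
    # New default (False): evaluate only the optimisation metric.
    # Opting in with True restores the old score-everything behaviour.
    if all_supported_metrics:
        chosen = SUPPORTED_METRICS
    else:
        chosen = {opt_metric: SUPPORTED_METRICS[opt_metric]}
    return {name: fn(y_true, y_pred) for name, fn in chosen.items()}

y_true, y_pred = [1, 0, 1, 1], [1, 0, 0, 1]
print(score_pipeline(y_true, y_pred))                              # only the optimisation metric
print(score_pipeline(y_true, y_pred, all_supported_metrics=True))  # every supported metric
```

Computing only the optimisation metric keeps per-pipeline evaluation cheap; the extra metrics are opt-in via `search(..., all_supported_metrics=True)`.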
5 changes: 2 additions & 3 deletions autoPyTorch/api/tabular_regression.py
@@ -79,7 +79,6 @@ class TabularRegressionTask(BaseTask):
             Search space updates that can be used to modify the search
             space of particular components or choice modules of the pipeline
     """
-
     def __init__(
         self,
         seed: int = 1,
@@ -254,7 +253,7 @@ def search(
         memory_limit: int = 4096,
         smac_scenario_args: Optional[Dict[str, Any]] = None,
         get_smac_object_callback: Optional[Callable] = None,
-        all_supported_metrics: bool = True,
+        all_supported_metrics: bool = False,
        precision: int = 32,
         disable_file_output: Optional[List[Union[str, DisableFileOutputParameters]]] = None,
         load_models: bool = True,
@@ -354,7 +353,7 @@ def search(
             TargetAlgorithm to be optimised. If None, `eval_function`
             available in autoPyTorch/evaluation/train_evaluator is used.
             Must be child class of AbstractEvaluator.
-            all_supported_metrics (bool: default=True):
+            all_supported_metrics (bool: default=False):
                 If True, all metrics supporting current task will be calculated
                 for each pipeline and results will be available via cv_results
             precision (int: default=32):
10 changes: 8 additions & 2 deletions autoPyTorch/api/time_series_forecasting.py
@@ -289,7 +289,7 @@ def search(
         memory_limit: Optional[int] = 4096,
         smac_scenario_args: Optional[Dict[str, Any]] = None,
         get_smac_object_callback: Optional[Callable] = None,
-        all_supported_metrics: bool = True,
+        all_supported_metrics: bool = False,
         precision: int = 32,
         disable_file_output: List = [],
         load_models: bool = True,
@@ -396,7 +396,7 @@ def search(
             instances, num_params, runhistory, seed and ta. This is
             an advanced feature. Use only if you are familiar with
             [SMAC](https://automl.github.io/SMAC3/master/index.html).
-            all_supported_metrics (bool), (default=True): if True, all
+            all_supported_metrics (bool), (default=False): if True, all
                 metrics supporting current task will be calculated
                 for each pipeline and results will be available via cv_results
             precision (int), (default=32): Numeric precision used when loading
@@ -526,6 +526,9 @@ def predict(
             predicted value, it needs to be with shape (B, H, N),
             B is the number of series, H is forecasting horizon (n_prediction_steps), N is the number of targets
         """
+        if self.dataset is None:
+            raise AttributeError(f"Expected dataset to be initialised when predicting in {self.__class__.__name__}")
+
         if X_test is None or not isinstance(X_test[0], TimeSeriesSequence):
             assert past_targets is not None
             # Validate and construct TimeSeriesSequence
@@ -566,6 +569,9 @@ def update_sliding_window_size(self, n_prediction_steps: int) -> None:
             forecast horizon. Sometimes we could also make our base sliding window size based on the
             forecast horizon
         """
+        if self.dataset is None:
+            raise AttributeError(f"Expected dataset to be initialised when updating sliding window"
+                                 f" in {self.__class__.__name__}")
         base_window_size = int(np.ceil(self.dataset.base_window_size))
         # we don't want base window size to large, which might cause a too long computation time, in which case
         # we will use n_prediction_step instead (which is normally smaller than base_window_size)
58 changes: 50 additions & 8 deletions autoPyTorch/data/base_feature_validator.py
@@ -1,5 +1,5 @@
 import logging
-from typing import List, Optional, Union
+from typing import List, Optional, Set, Tuple, Union

 import numpy as np

@@ -24,24 +24,21 @@ class BaseFeatureValidator(BaseEstimator):
             List of the column types found by this estimator during fit.
         data_type (str):
             Class name of the data type provided during fit.
-        column_transformer (Optional[BaseEstimator])
+        encoder (Optional[BaseEstimator])
             Host a encoder object if the data requires transformation (for example,
-            if provided a categorical column in a pandas DataFrame)
-        transformed_columns (List[str])
-            List of columns that were encoded.
+            if provided a categorical column in a pandas DataFrame).
     """
     def __init__(
         self,
         logger: Optional[Union[PicklableClientLogger, logging.Logger]] = None,
-    ):
+    ) -> None:
         # Register types to detect unsupported data format changes
         self.feat_types: Optional[List[str]] = None
         self.data_type: Optional[type] = None
         self.dtypes: List[str] = []
         self.column_order: List[str] = []

         self.column_transformer: Optional[BaseEstimator] = None
-        self.transformed_columns: List[str] = []

         self.logger: Union[
             PicklableClientLogger, logging.Logger
@@ -52,6 +49,9 @@ def __init__(
         self.categories: List[List[int]] = []
         self.categorical_columns: List[int] = []
         self.numerical_columns: List[int] = []
+        self.encode_columns: List[str] = []
+
+        self.all_nan_columns: Optional[Set[Union[int, str]]] = None

         self._is_fitted = False

@@ -75,7 +75,7 @@ def fit(

         # If a list was provided, it will be converted to pandas
         if isinstance(X_train, list):
-            X_train, X_test = self.list_to_dataframe(X_train, X_test)
+            X_train, X_test = self.list_to_pandas(X_train, X_test)

         self._check_data(X_train)

@@ -109,6 +109,22 @@ def _fit(
         self:
             The fitted base estimator
         """

         raise NotImplementedError()

+    def _check_data(
+        self,
+        X: SupportedFeatTypes,
+    ) -> None:
+        """
+        Feature dimensionality and data type checks
+
+        Args:
+            X (SupportedFeatTypes):
+                A set of features that are going to be validated (type and dimensionality
+                checks) and a encoder fitted in the case the data needs encoding
+        """
+
+        raise NotImplementedError()
+
     def transform(
@@ -125,4 +141,30 @@ def transform(
         np.ndarray:
             The transformed array
         """

         raise NotImplementedError()
+
+    def list_to_pandas(
+        self,
+        X_train: SupportedFeatTypes,
+        X_test: Optional[SupportedFeatTypes] = None,
+    ) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
+        """
+        Converts a list to a pandas DataFrame. In this process, column types are inferred.
+
+        If test data is provided, we proactively match it to train data
+
+        Args:
+            X_train (SupportedFeatTypes):
+                A set of features that are going to be validated (type and dimensionality
+                checks) and a encoder fitted in the case the data needs encoding
+            X_test (Optional[SupportedFeatTypes]):
+                A hold out set of data used for checking
+        Returns:
+            pd.DataFrame:
+                transformed train data from list to pandas DataFrame
+            pd.DataFrame:
+                transformed test data from list to pandas DataFrame
+        """
+
+        raise NotImplementedError()
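The new `list_to_pandas` stub promises type inference plus "proactively match it to train data". A minimal standalone sketch of what a concrete subclass could do (the dtype-matching step via `astype` is an assumption; the real autoPyTorch implementation may differ):

```python
from typing import List, Optional, Tuple

import pandas as pd


def list_to_pandas(
    X_train: List,
    X_test: Optional[List] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    # Column types are inferred from the training data.
    train_df = pd.DataFrame(X_train).infer_objects()
    test_df = None
    if X_test is not None:
        # Proactively force the test frame onto the train schema,
        # so both frames agree column-by-column on dtype.
        test_df = pd.DataFrame(X_test).astype(train_df.dtypes.to_dict())
    return train_df, test_df


train, test = list_to_pandas([[1, "a"], [2, "b"]], [[3, "c"]])
print(train.dtypes.tolist())
```

Matching the test frame to the train schema up front is what makes later `_check_data` dtype comparisons meaningful.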
4 changes: 2 additions & 2 deletions autoPyTorch/data/base_target_validator.py
@@ -36,7 +36,7 @@ def __init__(self,
                     logging.Logger
                 ]
             ] = None,
-    ):
+    ) -> None:
         self.is_classification = is_classification

         self.data_type: Optional[type] = None
@@ -131,7 +131,7 @@ def _fit(

     def transform(
         self,
-        y: Union[SupportedTargetTypes],
+        y: SupportedTargetTypes,
     ) -> np.ndarray:
         """
         Args:
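The `Union[SupportedTargetTypes]` → `SupportedTargetTypes` change is purely cosmetic: `typing.Union` collapses a single-member union to the type itself, so the one-element wrapper added noise without changing the annotation. A quick demonstration:

```python
from typing import Union

# A one-element Union IS the wrapped type; duplicates are also collapsed.
assert Union[int] is int
assert Union[str, str] is str

print(Union[int])  # <class 'int'>
```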