
Bug fixes (#249)
* Update implementation

* Coding style fixes

* Implementation update

* Style fix

* Turn weighted loss into a constant again, implementation update

* Cocktail branch inconsistencies (#275)

* To nemo

* Revert change in T_curr as results conclusively prove it should be 0

* Revert cutmix change after data from run

* Final conclusion after results

* FIX bug in shake alpha beta

* Updated if is_training condition for shake drop

* Remove temp fix in row cutmix

* Cocktail fixes time debug (#286)

* preprocess inside data validator

* add time debug statements

* Add fixes for categorical data

* add fit_ensemble

* add arlind fix for swa and se

* fix bug in trainer choice fit

* fix ensemble bug

* Correct bug in cleanup

* Cleanup for removing time debug statements

* ablation for adversarial

* shuffle false in dataloader

* drop last false in dataloader

* fix bug for validation set, and cutout and cutmix

* shuffle = False

* Shake Shake updates (#287)

* To test locally

* fix bug in trainer choice fit

* fix ensemble bug

* Correct bug in cleanup

* To test locally

* Cleanup for removing time debug statements

* ablation for adversarial

* shuffle false in dataloader

* drop last false in dataloader

* fix bug for validation set, and cutout and cutmix

* To test locally

* shuffle = False

* To test locally

* updates to search space

* updates to search space

* update branch with search space

* undo search space update

* fix bug in shake shake flag

* limit to shake-even

* restrict to even even

* Add even even and others for shake-drop also

* fix bug in passing alpha beta method

* restrict to only even even

* fix silly bug

* remove imputer and ordinal encoder for categorical transformer in feature validator

* Address comments from shuhei

* fix issues with ensemble fitting post hoc

* Address comments on the PR

* Fix flake and mypy errors

* Address comments from PR #286

* fix bug in embedding

* Update autoPyTorch/api/tabular_classification.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/datasets/base_dataset.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/datasets/base_dataset.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/components/training/trainer/base_trainer.py

Co-authored-by: nabenabe0928 <[email protected]>

* Address comments from shuhei

* address comments from shuhei

* fix flake and mypy

* Update autoPyTorch/pipeline/components/training/trainer/RowCutMixTrainer.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/tabular_classification.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Co-authored-by: nabenabe0928 <[email protected]>

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <[email protected]>

* increase threads_per_worker

* fix bug in rowcutmix

* Enhancement for the tabular validator. (#291)

* Initial try at an enhancement for the tabular validator

* Adding a few type annotations

* Fixing bugs in implementation

* Adding wrongly deleted code part during rebase

* Fix bug in _get_args

* Fix bug in _get_args

* Addressing Shuhei's comments

* Address Shuhei's comments

* Refactoring code

* Refactoring code

* Typos fix and additional comments

* Replace nan in categoricals with simple imputer

* Remove unused function

* add comment

* Update autoPyTorch/data/tabular_feature_validator.py

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/data/tabular_feature_validator.py

Co-authored-by: nabenabe0928 <[email protected]>

* Adding unit test for only-NaN columns in the tabular feature categorical evaluator

* fix bug in remove all nan columns

* Bug fix for making tests run by arlind

* fix flake errors in feature validator

* made typing code uniform

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <[email protected]>

* address comments from shuhei

* address comments from shuhei (2)

Co-authored-by: Ravin Kohli <[email protected]>
Co-authored-by: Ravin Kohli <[email protected]>
Co-authored-by: nabenabe0928 <[email protected]>

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <[email protected]>

* resolve code issues with new versions

* Address comments from shuhei

* make run_traditional_ml function

* implement suggestion from shuhei and fix bug in rowcutmixtrainer

* fix return type docstring

* add better documentation and fix bug in shake_drop_get_bl

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <[email protected]>

* add test for comparator and other improvements based on PR comments

* fix bug in test

* [fix] Fix the condition in the raising error of all_nan_columns

* [refactor] Unite name conventions of numpy array and pandas dataframe

* [doc] Add the description about the tabular feature transformation

* [doc] Add the description of the tabular feature transformation

* address comments from arlind

* address comments from arlind

* change to as_tensor and address comments from arlind

* correct description for functions in data module

Co-authored-by: nabenabe0928 <[email protected]>
Co-authored-by: Arlind Kadra <[email protected]>
Co-authored-by: nabenabe0928 <[email protected]>

* Addressing Shuhei's comments

* flake8 problems fix

* Update autoPyTorch/api/base_task.py

Add indent.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/api/base_task.py

Add indent.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/data/tabular_feature_validator.py

Add indentation.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Add line indentation.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/data/tabular_feature_validator.py

Validate if there is a column transformer since for sparse matrices we will not have one.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/utils/implementations.py

Delete uncommented line.

Co-authored-by: Ravin Kohli <[email protected]>

* Allow the number of threads to be given by the user

* Removing unnecessary argument and refactoring the attribute.

* Addressing Ravin's comments

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Updating the function documentation according to the agreed style.

Co-authored-by: Ravin Kohli <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/utils.py

Providing information on the wrong method provided for shake-shake regularization.

Co-authored-by: nabenabe0928 <[email protected]>

* add todo for backend and accept changes from shuhei

* Addressing Shuhei's and Ravin's comments

* Addressing Shuhei's and Ravin's comments, bug fix

* Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py

Improving code readability.

Co-authored-by: nabenabe0928 <[email protected]>

* Update autoPyTorch/pipeline/components/setup/network_backbone/ResNetBackbone.py

Improving consistency.

Co-authored-by: nabenabe0928 <[email protected]>

* bug fix

Co-authored-by: Ravin Kohli <[email protected]>
Co-authored-by: nabenabe0928 <[email protected]>
Co-authored-by: nabenabe0928 <[email protected]>
Co-authored-by: Ravin Kohli <[email protected]>
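Several entries above touch shake-shake and shake-drop regularization (the alpha/beta method fix, restricting to even-even, the `is_training` condition, and `shake_drop_get_bl`). The following is a minimal sketch of the general idea under stated assumptions, not autoPyTorch's implementation. The function names and the `min_prob_no_shake` parameter are hypothetical; method names are assumed to follow a `<forward>-<backward>` convention in which `shake` draws a uniform random weight and `even` uses a fixed 0.5; the shake-drop keep probability is assumed to decay linearly with block depth, and outside training the Bernoulli gate is assumed to be replaced by its expectation.

```python
import random
from typing import Tuple


def shake_get_alpha_beta(is_training: bool, method: str) -> Tuple[float, float]:
    """Return (alpha, beta) for a two-branch Shake-Shake block (hypothetical helper).

    alpha weights the branches in the forward pass, beta in the backward pass.
    `method` is "<forward>-<backward>", each part "shake" (uniform in [0, 1))
    or "even" (fixed 0.5), e.g. "even-even", "shake-even", "shake-shake".
    """
    if not is_training:
        # At evaluation time both branches are averaged deterministically.
        return 0.5, 0.5
    parts = method.split("-")
    if len(parts) != 2 or any(p not in ("shake", "even") for p in parts):
        raise ValueError(f"Unknown shake-shake method: {method!r}")
    forward, backward = parts
    alpha = random.random() if forward == "shake" else 0.5
    beta = random.random() if backward == "shake" else 0.5
    return alpha, beta


def shake_drop_keep_prob(block_index: int, num_blocks: int, min_prob_no_shake: float) -> float:
    """Linearly decaying probability that block `block_index` (0-based) is left unperturbed."""
    return 1.0 - ((block_index + 1) / num_blocks) * (1.0 - min_prob_no_shake)


def shake_drop_get_bl(
    block_index: int, num_blocks: int, min_prob_no_shake: float, is_training: bool
) -> float:
    """Sample the Bernoulli gate b_l in training; otherwise return its expectation p_l."""
    p_l = shake_drop_keep_prob(block_index, num_blocks, min_prob_no_shake)
    if is_training:
        return 1.0 if random.random() < p_l else 0.0
    return p_l
```

With `num_blocks=4` and `min_prob_no_shake=0.5`, the keep probability falls from 0.875 for the first block to 0.5 for the last, so deeper blocks are perturbed more often.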
5 people committed Dec 8, 2021
1 parent d7d05c8 commit a16cbbb
Showing 37 changed files with 1,831 additions and 427 deletions.
337 changes: 283 additions & 54 deletions autoPyTorch/api/base_task.py


2 changes: 2 additions & 0 deletions autoPyTorch/api/tabular_classification.py
@@ -304,6 +304,8 @@ def search(
)


if self.dataset is None:
raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
return self._search(
dataset=self.dataset,
optimize_metric=optimize_metric,
3 changes: 2 additions & 1 deletion autoPyTorch/api/tabular_regression.py
@@ -66,7 +66,6 @@ class TabularRegressionTask(BaseTask):
search space updates that can be used to modify the search
space of particular components or choice modules of the pipeline
"""

def __init__(
self,
seed: int = 1,
@@ -303,6 +302,8 @@ def search(
)


if self.dataset is None:
raise ValueError("`dataset` in {} must be initialized, but got None".format(self.__class__.__name__))
return self._search(
dataset=self.dataset,
optimize_metric=optimize_metric,
45 changes: 36 additions & 9 deletions autoPyTorch/data/base_feature_validator.py
@@ -1,5 +1,5 @@
import logging
from typing import List, Optional, Union
from typing import List, Optional, Set, Tuple, Union

import numpy as np

@@ -35,24 +35,21 @@ class BaseFeatureValidator(BaseEstimator):
List of the column types found by this estimator during fit.
data_type (str):
Class name of the data type provided during fit.
column_transformer (Optional[BaseEstimator])
encoder (Optional[BaseEstimator])
Host a encoder object if the data requires transformation (for example,
if provided a categorical column in a pandas DataFrame)
transformed_columns (List[str])
List of columns that were encoded.
if provided a categorical column in a pandas DataFrame).
"""
def __init__(
self,
logger: Optional[Union[PicklableClientLogger, logging.Logger]] = None,
):
) -> None:
# Register types to detect unsupported data format changes
self.feat_type: Optional[List[str]] = None
self.data_type: Optional[type] = None
self.dtypes: List[str] = []
self.column_order: List[str] = []

self.column_transformer: Optional[BaseEstimator] = None
self.transformed_columns: List[str] = []

self.logger: Union[
PicklableClientLogger, logging.Logger
@@ -64,6 +61,8 @@ def __init__(
self.categorical_columns: List[int] = []
self.numerical_columns: List[int] = []

self.all_nan_columns: Optional[Set[Union[int, str]]] = None

self._is_fitted = False

def fit(
@@ -86,7 +85,7 @@ def fit(

# If a list was provided, it will be converted to pandas
if isinstance(X_train, list):
X_train, X_test = self.list_to_dataframe(X_train, X_test)
X_train, X_test = self.list_to_pandas(X_train, X_test)

self._check_data(X_train)

@@ -120,6 +119,7 @@ def _fit(
self:
The fitted base estimator
"""

raise NotImplementedError()

def _check_data(
@@ -129,11 +129,12 @@ def _check_data(
"""
Feature dimensionality and data type checks
Arguments:
Args:
X (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
"""

raise NotImplementedError()

def transform(
@@ -150,4 +151,30 @@ def transform(
np.ndarray:
The transformed array
"""

raise NotImplementedError()

def list_to_pandas(
self,
X_train: SUPPORTED_FEAT_TYPES,
X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""
Converts a list to a pandas DataFrame. In this process, column types are inferred.
If test data is provided, we proactively match it to train data
Args:
X_train (SUPPORTED_FEAT_TYPES):
A set of features that are going to be validated (type and dimensionality
checks) and a encoder fitted in the case the data needs encoding
X_test (Optional[SUPPORTED_FEAT_TYPES]):
A hold out set of data used for checking
Returns:
pd.DataFrame:
transformed train data from list to pandas DataFrame
pd.DataFrame:
transformed test data from list to pandas DataFrame
"""

raise NotImplementedError()
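In this file the new `list_to_pandas` method and the `all_nan_columns` attribute are only declared on the base class. The following is a minimal sketch of what a concrete version might do, assuming pandas dtype inference via `DataFrame.infer_objects()`. Two points are assumptions rather than the commit's code: "matching" the test data to the training data is reduced here to a column-label check, and `find_all_nan_columns` is a hypothetical helper for detecting columns that contain nothing but missing values.

```python
from typing import Optional, Set, Tuple, Union

import pandas as pd


def list_to_pandas(
    X_train: list,
    X_test: Optional[list] = None,
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
    """Convert list features to DataFrames, letting pandas infer column dtypes."""
    # infer_objects() converts object columns that hold, e.g., only ints into int64.
    train_df = pd.DataFrame(data=X_train).infer_objects()
    test_df: Optional[pd.DataFrame] = None
    if X_test is not None:
        test_df = pd.DataFrame(data=X_test).infer_objects()
        # Assumption: "matching" test to train is reduced to a column-label check.
        if test_df.columns.tolist() != train_df.columns.tolist():
            raise ValueError(
                f"X_test has columns {test_df.columns.tolist()}, "
                f"expected {train_df.columns.tolist()}"
            )
    return train_df, test_df


def find_all_nan_columns(X: pd.DataFrame) -> Set[Union[int, str]]:
    """Return the labels of columns in which every value is missing (NaN or None)."""
    return {col for col in X.columns if X[col].isna().all()}
```

For example, the rows `[1, "a", None]` and `[2, "b", None]` produce a frame whose third column holds only `None`, so `find_all_nan_columns` reports `{2}` for it.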
5 changes: 3 additions & 2 deletions autoPyTorch/data/base_target_validator.py
@@ -48,7 +48,7 @@ def __init__(self,
logging.Logger
]
] = None,
):
) -> None:
self.is_classification = is_classification

self.data_type: Optional[type] = None
@@ -98,6 +98,7 @@ def fit(
np.shape(y_test)
))
if isinstance(y_train, pd.DataFrame):
y_train = cast(pd.DataFrame, y_train)
y_test = cast(pd.DataFrame, y_test)
if y_train.columns.tolist() != y_test.columns.tolist():
raise ValueError(
@@ -143,7 +144,7 @@ def _fit(

def transform(
self,
y: Union[SUPPORTED_TARGET_TYPES],
y: SUPPORTED_TARGET_TYPES,
) -> np.ndarray:
"""
Args:
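The `base_target_validator.py` hunk adds `cast(pd.DataFrame, ...)` before comparing the train and test column lists. `typing.cast` only tells the type checker which type to assume; at runtime it returns its argument unchanged and performs no conversion or validation. The hypothetical `check_target_columns` below sketches that comparison and, as an assumption beyond the diff, keeps an explicit `isinstance` check so that a mismatched `y_test` is still caught at runtime.

```python
from typing import Sequence, Union, cast

import pandas as pd

TargetLike = Union[pd.DataFrame, Sequence[float]]


def check_target_columns(y_train: TargetLike, y_test: TargetLike) -> None:
    """Raise ValueError if DataFrame targets for train and test disagree on columns."""
    if isinstance(y_train, pd.DataFrame):
        # cast() only narrows the static type for mypy; it returns its argument
        # unchanged, so the isinstance check below is what enforces it at runtime.
        y_train_df = cast(pd.DataFrame, y_train)
        y_test_df = cast(pd.DataFrame, y_test)
        if not isinstance(y_test_df, pd.DataFrame):
            raise ValueError("y_train is a DataFrame, so y_test must be one too")
        if y_train_df.columns.tolist() != y_test_df.columns.tolist():
            raise ValueError(
                "Train and test targets must have the same columns, got "
                f"{y_train_df.columns.tolist()} and {y_test_df.columns.tolist()}"
            )
```

Non-DataFrame targets such as plain lists pass through untouched, mirroring how the column comparison in the hunk only applies inside the `isinstance(y_train, pd.DataFrame)` branch.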