0.2.8 (#29)

* start * Transformer upgrades continue * transformer none breaking updates * lots and lots of things * take the keys and see what happens! * Update TODO.md * black files * Update auto_ts.py * docs build part 1 * docs part ii
winedarksea · Dec 13, 2020 · ea82833 · ea82833
1 parent 87e75d0
commit ea82833
Show file tree

Hide file tree

Showing 51 changed files with 690 additions and 1,219 deletions.
diff --git a/README.md b/README.md
@@ -2,24 +2,28 @@
 
 <img src="/img/autots_logo.png" width="400" height="184" title="AutoTS Logo">
 
-**Model Selection for Multiple Time Series**
+**Forecasting Model Selection for Multiple Time Series**
 
-Simple package for comparing and predicting with open-source time series implementations.
+AutoML for forecasting with open-source time series implementations.
 
 For other time series needs, check out the list [here](https://github.com/MaxBenChrist/awesome_time_series_in_python).
 
 ## Features
-* Twenty available model classes, with tens of thousands of possible hyperparameter configurations
-* Finds optimal time series models by genetic programming
+* Finds optimal time series forecasting model and data transformations by genetic programming optimization
 * Handles univariate and multivariate/parallel time series
-* Point and probabilistic forecasts
-* Ability to handle messy data by learning optimal NaN imputation and outlier removal
-* Ability to add external known-in-advance regressor
+* Point and probabilistic upper/lower bound forecasts for all models
+* Twenty-two available model classes, with tens of thousands of possible hyperparameter configurations
+	* Includes naive, statistical, machine learning, and deep learning models
+	* Multiprocessing for univariate models for scalability on multivariate datasets
+	* Ability to add external regressors
+* Over thirty time series specific data transformations
+	* Ability to handle messy data by learning optimal NaN imputation and outlier removal
 * Allows automatic ensembling of best models
+	* 'horizontal' ensembling on multivariate series - learning the best model for each series
 * Multiple cross validation options
 * Subsetting and weighting to improve search on many multivariate series
 * Option to use one or a combination of metrics for model selection
-* Import and export of templates allowing greater user customization
+* Import and export of model templates for deployment and greater user customization
 
 ## Installation
 ```

diff --git a/TODO.md b/TODO.md
@@ -15,12 +15,11 @@
 * Forecasts are desired for the future immediately following the most recent data.
 
 # Latest
-* 2x speedup in transformation runtime by removing double transformation
-* joblib parallel to UnobservedComponents
-* ClipOutliers transformer, Discretize Transformer, CenterLastValue - added in prep for transform template change
-* bug fix on IntermittentOccurence
-* minor changes to ETS, now replaces single series failure with zero fill, damped now is damped_trend
-* 0.3.0 is expected to feature a breaking change to model templates in the transformation/pre-processing
+* Round transformer to replace coerce_integer, ClipOutliers expanded, Slice to replace context_slicer
+* pd.df Interpolate methods added to FillNA options, " " to "_" in names, rolling_mean_24
+* slight improvement to printed progress messages
+* transformer_list (also takes a dict of value:probability) allows adjusting which transformers are created in new generations.
+	* this does not apply to transformers loaded from imported templates
 
 # Known Errors: 
 DynamicFactor holidays 	Exceptions 'numpy.ndarray' object has no attribute 'values'
@@ -211,12 +210,10 @@ Tensorflow GPU backend may crash on occasion.
 
 
 #### New Transformations:
-	Sklearn iterative imputer 
 	lag and beta to DifferencedTransformer to make it more of an AR process
 	Weighted moving average
 	Symbolic aggregate approximation (SAX) and (PAA) (basically these are just binning)
 	Shared discretization (all series get same shared binning)
-	Last Value Centering
 	More sophisticated fillna methods
 	Constraint as a transformation parameter
 
@@ -226,3 +223,10 @@ Tensorflow GPU backend may crash on occasion.
 	* add to recombination_approved if so, in auto_model.py
 	* add to no_shared if so, in auto_model.py
 	* add to model table in extended_tutorial.md
+
+## New Transformer Checklist:
+	* Make sure that if it modifies the size (more/fewer columns or rows) it returns pd.DataFrame with proper index/columns
+	* depth of recombination is?
+	* add to "all" transformer dict
+	* add to no_shared if so, in auto_model.py
+	* oddities_list for those with forecast/original transform difference
diff --git a/autots/__init__.py b/autots/__init__.py
@@ -3,17 +3,19 @@
 
 https://github.com/winedarksea/AutoTS
 """
-from autots.datasets import load_hourly
-from autots.datasets import load_daily
-from autots.datasets import load_monthly
-from autots.datasets import load_yearly
-from autots.datasets import load_weekly
+from autots.datasets import (
+    load_hourly,
+    load_daily,
+    load_monthly,
+    load_yearly,
+    load_weekly,
+)
 
 from autots.evaluator.auto_ts import AutoTS
 from autots.tools.transform import GeneralTransformer
 from autots.tools.shaping import long_to_wide
 
-__version__ = '0.2.7'
+__version__ = '0.2.8'
 
 
 __all__ = [

diff --git a/autots/evaluator/auto_model.py b/autots/evaluator/auto_model.py
@@ -10,88 +10,32 @@
 
 def seasonal_int(include_one: bool = False):
     """Generate a random integer of typical seasonalities."""
-    if include_one:
-        lag = np.random.choice(
-            a=[
-                'random_int',
-                1,
-                2,
-                4,
-                7,
-                10,
-                12,
-                24,
-                28,
-                60,
-                96,
-                168,
-                364,
-                1440,
-                420,
-                52,
-                84,
-            ],
-            size=1,
-            p=[
-                0.10,
-                0.05,
-                0.05,
-                0.05,
-                0.15,
-                0.01,
-                0.1,
-                0.1,
-                0.1,
-                0.1,
-                0.04,
-                0.01,
-                0.1,
-                0.01,
-                0.01,
-                0.01,
-                0.01,
-            ],
-        ).item()
-    else:
-        lag = np.random.choice(
-            a=[
-                'random_int',
-                2,
-                4,
-                7,
-                10,
-                12,
-                24,
-                28,
-                60,
-                96,
-                168,
-                364,
-                1440,
-                420,
-                52,
-                84,
-            ],
-            size=1,
-            p=[
-                0.15,
-                0.05,
-                0.05,
-                0.15,
-                0.01,
-                0.1,
-                0.1,
-                0.1,
-                0.1,
-                0.04,
-                0.01,
-                0.1,
-                0.01,
-                0.01,
-                0.01,
-                0.01,
-            ],
-        ).item()
+    prob_dict = {
+        'random_int': 0.1,
+        1: 0.05,
+        2: 0.05,
+        4: 0.05,
+        7: 0.15,
+        10: 0.01,
+        12: 0.1,
+        24: 0.1,
+        28: 0.1,
+        60: 0.1,
+        96: 0.04,
+        168: 0.01,
+        364: 0.1,
+        1440: 0.01,
+        420: 0.01,
+        52: 0.01,
+        84: 0.01,
+    }
+    lag = np.random.choice(
+        a=list(prob_dict.keys()),
+        p=list(prob_dict.values()),
+        size=1,
+    ).item()
+    if not include_one and str(lag) == '1':
+        lag = 'random_int'
     if lag == 'random_int':
         lag = np.random.randint(2, 100, size=1).item()
     return int(lag)
@@ -1018,7 +962,13 @@ def unpack_ensemble_models(
     keep_ensemble: bool = True,
     recursive: bool = False,
 ):
-    """Take ensemble models from template and add as new rows."""
+    """Take ensemble models from template and add as new rows.
+
+    Args:
+        template (pd.DataFrame): AutoTS template containing template_cols
+        keep_ensemble (bool): if False, drop row containing original ensemble
+        recursive (bool): if True, unnest ensembles of ensembles...
+    """
     ensemble_template = pd.DataFrame()
     template['Ensemble'] = np.where(
         ((template['Model'] == 'Ensemble') & (template['Ensemble'] < 1)),
@@ -1245,6 +1195,8 @@ def TemplateWizard(
     verbose: int = 0,
     n_jobs: int = None,
     validation_round: int = 0,
+    current_generation: int = 0,
+    max_generations: int = 0,
     model_interrupt: bool = False,
     grouping_ids=None,
     template_cols: list = [
@@ -1277,6 +1229,8 @@ def TemplateWizard(
         holiday_country (str): passed through to holiday package, used by a few models as 0/1 regressor.
         startTimeStamps (pd.Series): index (series_ids), columns (Datetime of First start of series)
         validation_round (int): int passed to record current validation.
+        current_generation (int): info to pass to print statements
+        max_generations (int): info to pass to print statements
         model_interrupt (bool): if True, keyboard interrupts are caught and only break current model eval.
         template_cols (list): column names of columns used as model template
 
@@ -1299,24 +1253,34 @@ def TemplateWizard(
             current_template = pd.DataFrame(row).transpose()
             template_result.model_count += 1
             if verbose > 0:
-                if verbose > 1:
-                    print(
-                        "Model Number: {} with model {} in Validation {} with params {} and transformations {}".format(
+                if validation_round >= 1:
+                    base_print = (
+                        "Model Number: {} of {} with model {} for Validation {}".format(
                             str(template_result.model_count),
+                            template.shape[0],
                             model_str,
                             str(validation_round),
-                            json.dumps(parameter_dict),
-                            json.dumps(transformation_dict),
                         )
                     )
                 else:
-                    print(
-                        "Model Number: {} with model {} in Validation {} ".format(
+                    base_print = (
+                        "Model Number: {} with model {} in generation {} of {}".format(
                             str(template_result.model_count),
                             model_str,
-                            str(validation_round),
+                            str(current_generation),
+                            str(max_generations),
                         )
                     )
+                if verbose > 1:
+                    print(
+                        base_print
+                        + " with params {} and transformations {}".format(
+                            json.dumps(parameter_dict),
+                            json.dumps(transformation_dict),
+                        )
+                    )
+                else:
+                    print(base_print)
             df_forecast = PredictWitch(
                 current_template,
                 df_train=df_train,
@@ -1507,6 +1471,7 @@ def RandomTemplate(
         'VECM',
         'DynamicFactor',
     ],
+    transformer_list: dict = {},
 ):
     """
     Returns a template dataframe of randomly generated transformations, models, and hyperparameters.
@@ -1520,7 +1485,7 @@ def RandomTemplate(
     while len(template.index) < n:
         model_str = np.random.choice(model_list)
         param_dict = ModelMonster(model_str).get_new_params()
-        trans_dict = RandomTransform()
+        trans_dict = RandomTransform(transformer_list=transformer_list)
         row = pd.DataFrame(
             {
                 'Model': model_str,
@@ -1601,7 +1566,7 @@ def trans_dict_recomb(dict_array):
     return c
 
 
-def _trans_dicts(current_ops, best=None, n: int = 5):
+def _trans_dicts(current_ops, best=None, n: int = 5, transformer_list: dict = {}):
     fir = json.loads(current_ops.iloc[0, :]['TransformationParameters'])
     cur_len = current_ops.shape[0]
     if cur_len > 1:
@@ -1610,10 +1575,10 @@ def _trans_dicts(current_ops, best=None, n: int = 5):
         r_id = np.random.randint(1, top_r)
         sec = json.loads(current_ops.iloc[r_id, :]['TransformationParameters'])
     else:
-        sec = RandomTransform()
-    r = RandomTransform()
+        sec = RandomTransform(transformer_list=transformer_list)
+    r = RandomTransform(transformer_list=transformer_list)
     if best is None:
-        best = RandomTransform()
+        best = RandomTransform(transformer_list=transformer_list)
     arr = [fir, sec, best, r]
     trans_dicts = [json.dumps(trans_dict_recomb(arr)) for _ in range(n)]
     return trans_dicts
@@ -1633,6 +1598,7 @@ def NewGeneticTemplate(
         'TransformationParameters',
         'Ensemble',
     ],
+    transformer_list: dict = {},
 ):
     """
     Return new template given old template with model accuracies.
@@ -1686,7 +1652,9 @@ def NewGeneticTemplate(
         if model_type in no_params:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 3
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             model_param = current_ops.iloc[0, :]['ModelParameters']
             new_row = pd.DataFrame(
                 {
@@ -1700,7 +1668,9 @@ def NewGeneticTemplate(
         elif model_type in recombination_approved:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 4
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             # select the best model of this type
             fir = json.loads(current_ops.iloc[0, :]['ModelParameters'])
             cur_len = current_ops.shape[0]
@@ -1735,7 +1705,9 @@ def NewGeneticTemplate(
         else:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 3
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             model_dicts = list()
             for _ in range(n):
                 c = ModelMonster(model_type).get_new_params()