Commit
Merge pull request #71 from winedarksea/dev
0.3.2
winedarksea authored Jul 1, 2021
2 parents ec48749 + a10ade6 commit baed432
Showing 59 changed files with 1,293 additions and 333 deletions.
49 changes: 31 additions & 18 deletions .github/workflows/codeql-analysis.yml
Original file line number Diff line number Diff line change
@@ -1,38 +1,51 @@
name: "Code scanning - action"
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
push:
branches: [master, ]
branches: [master]
pull_request:
# The branches below must be a subset of the branches above
branches: [master]
branches: [master, dev]
schedule:
- cron: '0 6 * * 1'
- cron: '23 1 * * 4'

jobs:
CodeQL-Build:

analyze:
name: Analyze
runs-on: ubuntu-latest

strategy:
fail-fast: false
matrix:
language: [ 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
# Learn more:
# https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

steps:
- name: Checkout repository
uses: actions/checkout@v2
with:
# We must fetch at least the immediate parents so that if this is
# a pull request then we can checkout the head.
fetch-depth: 2

# If this run was triggered by a pull request event, then checkout
# the head of the pull request instead of the merge commit.
- run: git checkout HEAD^2
if: ${{ github.event_name == 'pull_request' }}

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v1
# Override language selection by uncommenting this and choosing your languages
# with:
# languages: go, javascript, csharp, python, cpp, java
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main

# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
14 changes: 12 additions & 2 deletions README.md
@@ -8,6 +8,14 @@ AutoML for forecasting with open-source time series implementations.

For other time series needs, check out the list [here](https://github.com/MaxBenChrist/awesome_time_series_in_python).

## Table of Contents
* [Features](https://github.com/winedarksea/AutoTS#features)
* [Installation](https://github.com/winedarksea/AutoTS#installation)
* [Basic Use](https://github.com/winedarksea/AutoTS#basic-use)
* [Tips for Speed and Large Data](https://github.com/winedarksea/AutoTS#tips-for-speed-and-large-data)
* Extended Tutorial [GitHub](https://github.com/winedarksea/AutoTS/blob/master/extended_tutorial.md) or [Docs](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html)
* [Production Example](https://github.com/winedarksea/AutoTS/blob/master/production_example.py)

## Features
* Finds optimal time series forecasting model and data transformations by genetic programming optimization
* Handles univariate and multivariate/parallel time series
@@ -31,7 +39,7 @@ For other time series needs, check out the list [here](https://github.com/MaxBen
```
pip install autots
```
This includes dependencies for basic models, but additional packages are required for some models and methods.
This includes dependencies for basic models, but [additional packages](https://github.com/winedarksea/AutoTS/blob/master/extended_tutorial.md#installation-and-dependency-versioning) are required for some models and methods.

## Basic Use

@@ -91,11 +99,13 @@ The lower-level API, in particular the large section of time series transformers

Check out [extended_tutorial.md](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html) for a more detailed guide to features!

Also take a look at the [production_example.py](https://github.com/winedarksea/AutoTS/blob/master/production_example.py)


## Tips for Speed and Large Data:
* Use appropriate model lists, especially the predefined lists:
* `superfast` (simple naive models) and `fast` (more complex but still faster models)
* `fast_parallel` (a combination of `fast` and `parallel`) or `parallel`, given mave many CPU cores are available
* `fast_parallel` (a combination of `fast` and `parallel`) or `parallel`, given many CPU cores are available
* `n_jobs` usually gets pretty close with `='auto'` but adjust as necessary for the environment
* see a dict of predefined lists (some defined for internal use) with `from autots.models.model_list import model_lists`
* Use the `subset` parameter when there are many similar series, `subset=100` will often generalize well for tens of thousands of similar series.
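The `subset` tip above amounts to evaluating candidate models on a random sample of columns and then generalizing the winners to the full set. A rough pandas-only sketch of that idea, using synthetic data (this is an illustration, not AutoTS internals):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 5,000 similar daily series in wide format.
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=90, freq="D")
wide = pd.DataFrame(
    rng.normal(size=(90, 5000)),
    index=dates,
    columns=[f"series_{i}" for i in range(5000)],
)

# subset=100 corresponds roughly to fitting on a random 100-column
# sample; the selected models are then applied to all columns.
sample_cols = rng.choice(wide.columns, size=100, replace=False)
subset_df = wide[sample_cols]
print(subset_df.shape)  # (90, 100)
```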
22 changes: 10 additions & 12 deletions TODO.md
@@ -15,18 +15,16 @@
* Forecasts are desired for the future immediately following the most recent data.

# Latest
* Additional models to GluonTS
* GeneralTransformer transformation_params - now handle None or empty dict
* cleaning up of the appropriately named 'ModelMonster'
* improving MotifSimulation
* better error message for all models
* enable histgradientboost regressor, left it out before thinking it wouldn't stay experimental this long
* import_template now has slightly better `method` input style
* allow `ensemble` parameter to be a list
* NumericTransformer
* add .fit_transform method
* generally more options and speed improvement
* added NumericTransformer to future_regressors, should now coerce if they have different dtypes
* Table of Contents to Extended Tutorial/Readme.md
* Production Example
* add weights="mean"/median/min/max
* UnivariateRegression
* fix check_pickle error for ETS
* fix error in Prophet with latest version
* VisibleDeprecation warning for hidden_layers random choice in sklearn fixed
* prefill_na option added to allow quick filling of NaNs if desired (with zeroes for say, sales forecasting)
* made horizontal generalization more stable
* fixed bug in VAR where failing on data with negatives

# Known Errors:
DynamicFactor holidays Exceptions 'numpy.ndarray' object has no attribute 'values'
2 changes: 1 addition & 1 deletion autots/__init__.py
@@ -16,7 +16,7 @@
from autots.tools.transform import GeneralTransformer, RandomTransform
from autots.tools.shaping import long_to_wide

__version__ = '0.3.1'
__version__ = '0.3.2'

TransformTS = GeneralTransformer

61 changes: 37 additions & 24 deletions autots/datasets/fred.py
@@ -14,23 +14,23 @@
_has_fred = True


def get_fred_data(fredkey: str, SeriesNameDict: dict = {'SeriesID': 'SeriesName'}):
"""
Imports Data from Federal Reserve
def get_fred_data(fredkey: str, SeriesNameDict: dict = None, long=True, **kwargs):
"""Imports Data from Federal Reserve.
For simplest results, make sure requested series are all of the same frequency.
args:
fredkey - an API key from FRED
SeriesNameDict, pairs of FRED Series IDs and Series Names
fredkey (str): an API key from FRED
SeriesNameDict (dict): pairs of FRED Series IDs and Series Names like: {'SeriesID': 'SeriesName'} or a list of FRED IDs.
Series id must match Fred IDs, but name can be anything
if default is use, several default samples are returned
if None, several default series are returned
long (bool): if True, return long style data, else return wide style data with dt index
"""
if not _has_fred:
raise ImportError("Package fredapi is required")

fred = Fred(api_key=fredkey)

if SeriesNameDict == {'SeriesID': 'SeriesName'}:
if SeriesNameDict is None:
SeriesNameDict = {
'T10Y2Y': '10 Year Treasury Constant Maturity Minus 2 Year Treasury Constant Maturity',
'DGS10': '10 Year Treasury Constant Maturity Rate',
@@ -44,29 +44,42 @@ def get_fred_data(fredkey: str, SeriesNameDict: dict = {'SeriesID': 'SeriesName'
'USEPUINDXD': 'Economic Policy Uncertainty Index for United States', # also very irregular
}

series_desired = list(SeriesNameDict.keys())
if isinstance(SeriesNameDict, dict):
series_desired = list(SeriesNameDict.keys())
else:
series_desired = list(SeriesNameDict)

fred_timeseries = pd.DataFrame(
columns=['date', 'value', 'series_id', 'series_name']
)
if long:
fred_timeseries = pd.DataFrame(
columns=['date', 'value', 'series_id', 'series_name']
)
else:
fred_timeseries = pd.DataFrame()

for series in series_desired:
data = fred.get_series(series)
try:
series_name = SeriesNameDict[series]
except Exception:
series_name = series
data_df = pd.DataFrame(
{
'date': data.index,
'value': data,
'series_id': series,
'series_name': series_name,
}
)
data_df.reset_index(drop=True, inplace=True)
fred_timeseries = pd.concat(
[fred_timeseries, data_df], axis=0, ignore_index=True
)

if long:
data_df = pd.DataFrame(
{
'date': data.index,
'value': data,
'series_id': series,
'series_name': series_name,
}
)
data_df.reset_index(drop=True, inplace=True)
fred_timeseries = pd.concat(
[fred_timeseries, data_df], axis=0, ignore_index=True
)
else:
data.name = series_name
fred_timeseries = fred_timeseries.merge(
data, how="outer", left_index=True, right_index=True
)

return fred_timeseries
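The updated `get_fred_data` assembles either a long frame (`date`/`value`/`series_id` rows) or a wide frame (series outer-merged on the datetime index). A minimal stand-in for that assembly logic with synthetic series, so no FRED key is needed (the values below are made up):

```python
import pandas as pd

idx = pd.date_range("2021-01-01", periods=4, freq="D")
series_data = {  # stand-ins for fred.get_series() results
    "T10Y2Y": pd.Series([1.20, 1.30, 1.25, 1.28], index=idx),
    "DGS10": pd.Series([1.50, 1.52, 1.49, 1.51], index=idx),
}

# Wide style (long=False): outer-merge each named series on the dt index.
wide_df = None
for sid, data in series_data.items():
    named = data.rename(sid).to_frame()
    wide_df = named if wide_df is None else wide_df.merge(
        named, how="outer", left_index=True, right_index=True
    )

# Long style (long=True): stack (date, value, series_id) rows.
long_df = pd.concat(
    [
        pd.DataFrame({"date": s.index, "value": s.values, "series_id": sid})
        for sid, s in series_data.items()
    ],
    ignore_index=True,
)

print(wide_df.shape)  # (4, 2)
print(long_df.shape)  # (8, 3)
```

The outer merge keeps all dates even when series have different frequencies, which is why the docstring recommends requesting series of the same frequency for the cleanest wide output.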
51 changes: 46 additions & 5 deletions autots/evaluator/auto_model.py
@@ -8,7 +8,12 @@
from autots.evaluator.metrics import PredictionEval
from autots.tools.transform import RandomTransform, GeneralTransformer, shared_trans
from autots.models.ensemble import EnsembleForecast, generalize_horizontal
from autots.models.model_list import no_params, recombination_approved, no_shared
from autots.models.model_list import (
no_params,
recombination_approved,
no_shared,
superfast,
)
from itertools import zip_longest
from autots.models.basics import (
MotifSimulation,
@@ -146,6 +151,20 @@ def ModelMonster(
**parameters,
)
return model
elif model == 'UnivariateRegression':
from autots.models.sklearn import UnivariateRegression

model = UnivariateRegression(
frequency=frequency,
prediction_interval=prediction_interval,
holiday_country=holiday_country,
random_seed=random_seed,
verbose=verbose,
n_jobs=n_jobs,
forecast_length=forecast_length,
**parameters,
)
return model

elif model == 'UnobservedComponents':
model = UnobservedComponents(
@@ -658,6 +677,7 @@ def PredictWitch(
if isinstance(template, pd.Series):
template = pd.DataFrame(template).transpose()
template = template.head(1)
full_model_created = False  # make at least one full model, for horizontal generalization only
for index_upper, row_upper in template.iterrows():
# if an ensemble
if row_upper['Model'] == 'Ensemble':
@@ -750,18 +770,25 @@ model_str = row_upper['Model']
model_str = row_upper['Model']
parameter_dict = json.loads(row_upper['ModelParameters'])
transformation_dict = json.loads(row_upper['TransformationParameters'])
# horizontal generalization needs at least one full model run on all series, in case any models failed
if model_str in superfast and not full_model_created:
make_full_flag = True
else:
make_full_flag = False
if (
horizontal_subset is not None
and model_str in no_shared
and all(
trs not in shared_trans
for trs in list(transformation_dict['transformations'].values())
)
and not make_full_flag
):
df_train_low = df_train.reindex(copy=True, columns=horizontal_subset)
# print(f"Reducing to subset for {model_str} with {df_train_low.columns}")
else:
df_train_low = df_train.copy()
full_model_created = True

df_forecast = ModelPrediction(
df_train_low,
@@ -816,6 +843,7 @@ def TemplateWizard(
'TransformationParameters',
'Ensemble',
],
traceback: bool = False,
):
"""
Take Template, returns Results.
@@ -844,6 +872,7 @@ max_generations (int): info to pass to print statements
max_generations (int): info to pass to print statements
model_interrupt (bool): if True, keyboard interrupts are caught and only break current model eval.
template_cols (list): column names of columns used as model template
traceback (bool): if True, print the full traceback rather than just the error representation
Returns:
TemplateEvalObject
@@ -1030,11 +1059,23 @@ raise KeyboardInterrupt
raise KeyboardInterrupt
except Exception as e:
if verbose >= 0:
print(
'Template Eval Error: {} in model {}: {}'.format(
(repr(e)), template_result.model_count, model_str
if traceback:
import traceback as tb

print(
'Template Eval Error: {} in model {}: {}'.format(
''.join(tb.format_exception(None, e, e.__traceback__)),
template_result.model_count,
model_str,
)
)
)
else:
print(
'Template Eval Error: {} in model {}: {}'.format(
(repr(e)), template_result.model_count, model_str
)
)

result = pd.DataFrame(
{
'ID': create_model_id(
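The new `traceback` flag in `TemplateWizard` switches the error printout from `repr(e)` to the full stack produced by `traceback.format_exception`. A minimal illustration of the difference, using a hypothetical failing model function:

```python
import traceback as tb

def failing_model():  # hypothetical model evaluation that raises
    raise ValueError("bad hyperparameters")

try:
    failing_model()
except Exception as e:
    short_msg = repr(e)  # what the verbose print shows by default
    full_msg = "".join(tb.format_exception(None, e, e.__traceback__))

print(short_msg)  # ValueError('bad hyperparameters')
# full_msg additionally names the file, line, and frame (failing_model),
# which is what makes debugging template evaluation errors easier.
```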