Merge pull request #58 from winedarksea/dev
0.3.1
winedarksea authored Mar 24, 2021
2 parents 360775e + 14e7140 commit ec48749
Showing 56 changed files with 1,060 additions and 649 deletions.
27 changes: 25 additions & 2 deletions README.md
@@ -21,7 +21,7 @@ For other time series needs, check out the list [here](https://github.com/MaxBen
* Allows automatic ensembling of best models
* 'horizontal' ensembling on multivariate series - learning the best model for each series
* Multiple cross validation options
* 'seasonal' validation allows forecasts to be optimized for the season of your forecast period
* 'seasonal' validation allows forecasts to be optimized for the seasonality of the data
* Subsetting and weighting to improve speed and relevance of search on large datasets
* 'constraint' parameter can be used to assure forecasts don't drift beyond historic boundaries
* Option to use one or a combination of metrics for model selection
@@ -72,12 +72,15 @@ model = model.fit(
id_col='series_id' if long else None,
)

prediction = model.predict()
# Print the details of the best model
print(model)

prediction = model.predict()
# point forecasts dataframe
forecasts_df = prediction.forecast
# upper and lower forecasts
forecasts_up, forecasts_low = prediction.upper_forecast, prediction.lower_forecast

# accuracy of all tried model results
model_results = model.results()
# and aggregated from cross validation
@@ -88,6 +91,26 @@ The lower-level API, in particular the large section of time series transformers

Check out [extended_tutorial.md](https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html) for a more detailed guide to features!


## Tips for Speed and Large Data:
* Use appropriate model lists, especially the predefined lists:
* `superfast` (simple naive models) and `fast` (more complex but still faster models)
* `fast_parallel` (a combination of `fast` and `parallel`) or `parallel`, given many CPU cores are available
* `n_jobs` usually gets pretty close with `n_jobs='auto'`, but adjust as necessary for the environment
* see a dict of predefined lists (some defined for internal use) with `from autots.models.model_list import model_lists`
* Use the `subset` parameter when there are many similar series, `subset=100` will often generalize well for tens of thousands of similar series.
* if using `subset`, passing `weights` for series will weight subset selection towards higher priority series.
* if limited by RAM, the search can easily be distributed by running multiple instances of AutoTS on different batches of data, having first imported a template (pretrained on all the data) as a shared starting point.
* Set `model_interrupt=True`, which skips over the current model when a `KeyboardInterrupt` (i.e. `ctrl+c`) is raised (although if the interrupt falls between generations, it will stop the entire training).
* Use the `result_file` parameter of `.fit()`, which will save progress after each generation - helpful if a long training is being done. Use `import_results` to recover.
* While Transformations are pretty fast, setting `transformer_max_depth` to a lower number (say, 2) will increase speed. Also utilize `transformer_list`.
* Ensembles are naturally slower to predict because they run many models; 'distance' ensembles are about 2x slower, and 'simple' ensembles 3x-5x slower.
* `ensemble='horizontal-max'` with `model_list='no_shared_fast'` can scale relatively well given many CPU cores, because each model is only run on the series it is needed for.
* Reducing `num_validations` and `models_to_validate` will decrease runtime but may lead to poorer model selections.
* For datasets with many records, aggregating to a coarser frequency (for example, from daily to monthly) can reduce training time if appropriate.
* this can be done by adjusting `frequency` and `aggfunc` but is probably best done before passing data into AutoTS.
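The pre-aggregation suggested above can be done with plain pandas before the data ever reaches AutoTS. A minimal sketch with made-up daily data (the `sales` column and date range are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative daily data; in practice this would be your own series.
idx = pd.date_range("2021-01-01", periods=365, freq="D")
daily = pd.DataFrame(
    {"sales": np.random.default_rng(0).random(365)}, index=idx
)

# Aggregate to month-start frequency so the model search sees 12 rows
# instead of 365; totals are preserved by summing.
monthly = daily.resample("MS").sum()
print(len(monthly))  # 12 rows for one year of daily data
```

A similar effect can be had inside AutoTS via `frequency` and `aggfunc`, but doing it up front keeps the search itself lighter.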


## How to Contribute:
* Give feedback on where you find the documentation confusing
* Use AutoTS and...
35 changes: 13 additions & 22 deletions TODO.md
@@ -15,23 +15,18 @@
* Forecasts are desired for the future immediately following the most recent data.

# Latest
* **breaking change** to model templates: transformers structure change
* grouping no longer used
* parameter generation for transformers allowing more possible combinations
* transformer_max_depth parameter
* Horizontal Ensembles are now much faster by only running models on the subset of series they apply to
* general starting template improved and updated to new transformer format
* change many np.random to random
* random.choices further necessitates python 3.6 or greater
* bug fix in Detrend transformer
* bug fix in SeasonalDifference transformer
* SPL bug fix when NaN in test set
* inverse_transform now fills NaN with zero for upper/lower forecasts
* expanded model_list aliases, with dedicated module
* bug fix (creating 0,0 order) and tuning of VARMAX
* Fix export_template bug
* restructuring of some lower-level function locations

* Additional models to GluonTS
* GeneralTransformer transformation_params - now handle None or empty dict
* cleaning up of the appropriately named 'ModelMonster'
* improving MotifSimulation
* better error message for all models
* enable histgradientboost regressor (left out earlier on the assumption it wouldn't stay experimental this long)
* import_template now has slightly better `method` input style
* allow `ensemble` parameter to be a list
* NumericTransformer
* add .fit_transform method
* generally more options and speed improvement
* added NumericTransformer to future_regressors, should now coerce if they have different dtypes

# Known Errors:
DynamicFactor holidays Exceptions 'numpy.ndarray' object has no attribute 'values'
@@ -64,12 +59,8 @@ Tensorflow GPU backend may crash on occasion.
* Remove 'horizontal' sanity check run, takes too long (only if metric weights are x)?
* Horizontal and BestN runtime variant, where speed is highly important in model selection
* total runtime for .fit() as attribute (not just manual sum but capture in ModelPrediction)
* allow Index to be other datetime not just DatetimeIndex
* cleanse similar models out first, before horizontal ensembling
* BestNEnsemble Add 5 or more model option
* allow best_model to be specified and entirely bypass the .fit() stage.
* drop duplicates as function of TemplateEvalObject
* improve test.py script for actual testing of many features
* Convert 'Holiday' regressors into Datepart + Holiday 2d
* export and import of results includes all model parameters (but not templates?)
* Option to use full traceback in errors in table
@@ -94,7 +85,7 @@ Tensorflow GPU backend may crash on occasion.
* Probabilistic:
https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_quantile.html
* GluonTS
* Add support for future_regressor
* Add support for future_regressor (potentially PCA down to 1 feature, then use?)
* Modify GluonStart if lots of NaN at start of that series
* GPU and CPU ctx
* implement 'borrow' Genetic Recombination for ComponentAnalysis
4 changes: 3 additions & 1 deletion autots/__init__.py
@@ -9,13 +9,14 @@
load_monthly,
load_yearly,
load_weekly,
load_weekdays,
)

from autots.evaluator.auto_ts import AutoTS
from autots.tools.transform import GeneralTransformer, RandomTransform
from autots.tools.shaping import long_to_wide

__version__ = '0.3.0'
__version__ = '0.3.1'

TransformTS = GeneralTransformer

@@ -25,6 +26,7 @@
'load_yearly',
'load_hourly',
'load_weekly',
'load_weekdays',
'AutoTS',
'TransformTS',
'GeneralTransformer',
10 changes: 9 additions & 1 deletion autots/datasets/__init__.py
@@ -6,5 +6,13 @@
from autots.datasets._base import load_yearly
from autots.datasets._base import load_hourly
from autots.datasets._base import load_weekly
from autots.datasets._base import load_weekdays

__all__ = ['load_daily', 'load_monthly', 'load_yearly', 'load_hourly', 'load_weekly']
__all__ = [
'load_daily',
'load_monthly',
'load_yearly',
'load_hourly',
'load_weekly',
'load_weekdays',
]
32 changes: 32 additions & 0 deletions autots/datasets/_base.py
@@ -172,3 +172,35 @@ def load_weekly(long: bool = True):
aggfunc='first',
)
return df_wide


def load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180):
"""Test edge cases by creating a Series with values as day of week.
Args:
long (bool):
if True, return a df with columns "value" and "datetime"
if False, return a Series with dt index
categorical (bool): if True, return str/object, else return int
periods (int): number of periods, ie length of data to generate
"""
idx = pd.date_range(end=pd.Timestamp.today(), periods=periods, freq="D")
df_wide = pd.Series(idx.weekday, index=idx, name="value")
df_wide.index.name = "datetime"
if categorical:
df_wide = df_wide.replace(
{
0: "Mon",
1: "Tues",
2: "Wed",
3: "Thor's",
4: "Fri",
5: "Sat",
6: "Sun",
7: "Mon",
}
)
if long:
return df_wide.reset_index()
else:
return df_wide
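As a quick sanity check of the new loader's logic, the same series can be rebuilt standalone with only pandas, mirroring the mapping in the diff above rather than calling AutoTS itself:

```python
import pandas as pd

# Rebuild the series load_weekdays produces: a daily index whose
# values are the day of week (Monday=0 ... Sunday=6).
idx = pd.date_range(end=pd.Timestamp.today(), periods=180, freq="D")
weekdays = pd.Series(idx.weekday, index=idx, name="value")

# Same label mapping as in the diff above.
labels = {0: "Mon", 1: "Tues", 2: "Wed", 3: "Thor's", 4: "Fri", 5: "Sat", 6: "Sun"}
categorical = weekdays.map(labels)
print(categorical.nunique())  # 7 - all weekdays appear across 180 days
```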
