0.2.8 (#29)
* start

* Transformer upgrades continue

* transformer none breaking updates

* lots and lots of things

* take the keys and see what happens!

* Update TODO.md

* black files

* Update auto_ts.py

* docs build part 1

* docs part ii
winedarksea authored Dec 13, 2020
1 parent 87e75d0 commit ea82833
Showing 51 changed files with 690 additions and 1,219 deletions.
20 changes: 12 additions & 8 deletions README.md
@@ -2,24 +2,28 @@

<img src="/img/autots_logo.png" width="400" height="184" title="AutoTS Logo">

**Model Selection for Multiple Time Series**
**Forecasting Model Selection for Multiple Time Series**

Simple package for comparing and predicting with open-source time series implementations.
AutoML for forecasting with open-source time series implementations.

For other time series needs, check out the list [here](https://github.com/MaxBenChrist/awesome_time_series_in_python).

## Features
* Twenty available model classes, with tens of thousands of possible hyperparameter configurations
* Finds optimal time series models by genetic programming
* Finds optimal time series forecasting model and data transformations by genetic programming optimization
* Handles univariate and multivariate/parallel time series
* Point and probabilistic forecasts
* Ability to handle messy data by learning optimal NaN imputation and outlier removal
* Ability to add external known-in-advance regressor
* Point and probabilistic upper/lower bound forecasts for all models
* Twenty-two available model classes, with tens of thousands of possible hyperparameter configurations
* Includes naive, statistical, machine learning, and deep learning models
* Multiprocessing for univariate models for scalability on multivariate datasets
* Ability to add external regressors
* Over thirty time series specific data transformations
* Ability to handle messy data by learning optimal NaN imputation and outlier removal
* Allows automatic ensembling of best models
* 'horizontal' ensembling on multivariate series - learning the best model for each series
* Multiple cross validation options
* Subsetting and weighting to improve search on many multivariate series
* Option to use one or a combination of metrics for model selection
* Import and export of templates allowing greater user customization
* Import and export of model templates for deployment and greater user customization

## Installation
```
20 changes: 12 additions & 8 deletions TODO.md
@@ -15,12 +15,11 @@
* Forecasts are desired for the future immediately following the most recent data.

# Latest
* 2x speedup in transformation runtime by removing double transformation
* joblib parallel to UnobservedComponents
* ClipOutliers transformer, Discretize Transformer, CenterLastValue - added in prep for transform template change
* bug fix on IntermittentOccurence
* minor changes to ETS, now replaces single series failure with zero fill, damped now is damped_trend
* 0.3.0 is expected to feature a breaking change to model templates in the transformation/pre-processing
* Round transformer to replace coerce_integer, ClipOutliers expanded, Slice to replace context_slicer
* pd.df Interpolate methods added to FillNA options, " " to "_" in names, rolling_mean_24
* slight improvement to printed progress messages
* transformer_list (also takes a dict of value:probability) allows adjusting which transformers are created in new generations.
* this does not apply to transformers loaded from imported templates
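
The `transformer_list` note above describes a dict of value:probability. One way such a dict could drive sampling is sketched below. `sample_transformers` is a hypothetical helper written only for illustration, not AutoTS's internal implementation, and the weights are made up; the transformer names are the ones mentioned in this changelog.

```python
import numpy as np


def sample_transformers(transformer_list, n=3, seed=None):
    """Draw n transformer names (hypothetical helper, not part of AutoTS).

    transformer_list may be a plain list (uniform sampling) or a
    dict of name: weight (weighted sampling).
    """
    rng = np.random.RandomState(seed)
    if isinstance(transformer_list, dict):
        names = list(transformer_list.keys())
        weights = np.array(list(transformer_list.values()), dtype=float)
        # Normalize so the weights do not have to sum to exactly 1
        probs = weights / weights.sum()
    else:
        names = list(transformer_list)
        probs = None  # uniform
    return rng.choice(names, size=n, p=probs).tolist()


# Illustrative weights only: 'Slice' is given zero weight, so it is never drawn
picked = sample_transformers(
    {'ClipOutliers': 0.7, 'Round': 0.3, 'Slice': 0.0}, n=5, seed=0
)
print(picked)
```

Normalizing inside the helper is a design choice made for this sketch; whether AutoTS itself normalizes or requires the values to sum to one is not stated in this commit.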

# Known Errors:
DynamicFactor holidays Exceptions 'numpy.ndarray' object has no attribute 'values'
@@ -211,12 +210,10 @@ Tensorflow GPU backend may crash on occasion.


#### New Transformations:
Sklearn iterative imputer
lag and beta to DifferencedTransformer to make it more of an AR process
Weighted moving average
Symbolic aggregate approximation (SAX) and (PAA) (basically these are just binning)
Shared discretization (all series get same shared binning)
Last Value Centering
More sophisticated fillna methods
Constraint as a transformation parameter

@@ -226,3 +223,10 @@ Tensorflow GPU backend may crash on occasion.
* add to recombination_approved if so, in auto_model.py
* add to no_shared if so, in auto_model.py
* add to model table in extended_tutorial.md

## New Transformer Checklist:
* Make sure that if it modifies the size (more/fewer columns or rows) it returns pd.DataFrame with proper index/columns
* depth of recombination is?
* add to "all" transformer dict
* add to no_shared if so, in auto_model.py
* oddities_list for those with forecast/original transform difference
14 changes: 8 additions & 6 deletions autots/__init__.py
@@ -3,17 +3,19 @@
 https://github.com/winedarksea/AutoTS
 """
-from autots.datasets import load_hourly
-from autots.datasets import load_daily
-from autots.datasets import load_monthly
-from autots.datasets import load_yearly
-from autots.datasets import load_weekly
+from autots.datasets import (
+    load_hourly,
+    load_daily,
+    load_monthly,
+    load_yearly,
+    load_weekly,
+)

 from autots.evaluator.auto_ts import AutoTS
 from autots.tools.transform import GeneralTransformer
 from autots.tools.shaping import long_to_wide

-__version__ = '0.2.7'
+__version__ = '0.2.8'


 __all__ = [
170 changes: 71 additions & 99 deletions autots/evaluator/auto_model.py
@@ -10,88 +10,32 @@

 def seasonal_int(include_one: bool = False):
     """Generate a random integer of typical seasonalities."""
-    if include_one:
-        lag = np.random.choice(
-            a=[
-                'random_int',
-                1,
-                2,
-                4,
-                7,
-                10,
-                12,
-                24,
-                28,
-                60,
-                96,
-                168,
-                364,
-                1440,
-                420,
-                52,
-                84,
-            ],
-            size=1,
-            p=[
-                0.10,
-                0.05,
-                0.05,
-                0.05,
-                0.15,
-                0.01,
-                0.1,
-                0.1,
-                0.1,
-                0.1,
-                0.04,
-                0.01,
-                0.1,
-                0.01,
-                0.01,
-                0.01,
-                0.01,
-            ],
-        ).item()
-    else:
-        lag = np.random.choice(
-            a=[
-                'random_int',
-                2,
-                4,
-                7,
-                10,
-                12,
-                24,
-                28,
-                60,
-                96,
-                168,
-                364,
-                1440,
-                420,
-                52,
-                84,
-            ],
-            size=1,
-            p=[
-                0.15,
-                0.05,
-                0.05,
-                0.15,
-                0.01,
-                0.1,
-                0.1,
-                0.1,
-                0.1,
-                0.04,
-                0.01,
-                0.1,
-                0.01,
-                0.01,
-                0.01,
-                0.01,
-            ],
-        ).item()
+    prob_dict = {
+        'random_int': 0.1,
+        1: 0.05,
+        2: 0.05,
+        4: 0.05,
+        7: 0.15,
+        10: 0.01,
+        12: 0.1,
+        24: 0.1,
+        28: 0.1,
+        60: 0.1,
+        96: 0.04,
+        168: 0.01,
+        364: 0.1,
+        1440: 0.01,
+        420: 0.01,
+        52: 0.01,
+        84: 0.01,
+    }
+    lag = np.random.choice(
+        a=list(prob_dict.keys()),
+        p=list(prob_dict.values()),
+        size=1,
+    ).item()
+    if not include_one and str(lag) == '1':
+        lag = 'random_int'
     if lag == 'random_int':
         lag = np.random.randint(2, 100, size=1).item()
     return int(lag)
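
The hunk above replaces two parallel lists with a single dict of weights. A minimal sketch of that pattern follows, assuming a shortened, made-up weight table; `PROB_DICT` and `draw_lag` are illustrative names, not AutoTS functions. It also shows why the commit compares `str(lag) == '1'` and ends with `int(lag)`: when string and integer keys are mixed, NumPy coerces every candidate to a string before sampling.

```python
import numpy as np

# Hypothetical, shortened weight table (values are made up and sum to 1)
PROB_DICT = {'random_int': 0.2, 1: 0.1, 7: 0.4, 12: 0.3}


def draw_lag(include_one=False, seed=None):
    """Weighted draw of a seasonal lag (illustrative sketch only)."""
    rng = np.random.RandomState(seed)
    # Mixed str/int keys become a string array, so the drawn value is a str
    lag = rng.choice(
        list(PROB_DICT.keys()), p=list(PROB_DICT.values()), size=1
    ).item()
    # Redirect a drawn '1' to the random branch instead of keeping a second table
    if not include_one and str(lag) == '1':
        lag = 'random_int'
    if lag == 'random_int':
        lag = rng.randint(2, 100, size=1).item()
    return int(lag)


print(draw_lag(include_one=False, seed=42))
```

Folding the excluded key back into `'random_int'` gives the random branch the combined weight when `include_one` is False, which matches the 0.15 used for `'random_int'` in the removed `else` branch (0.10 + 0.05).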
@@ -1018,7 +962,13 @@ def unpack_ensemble_models(
     keep_ensemble: bool = True,
     recursive: bool = False,
 ):
-    """Take ensemble models from template and add as new rows."""
+    """Take ensemble models from template and add as new rows.
+    Args:
+        template (pd.DataFrame): AutoTS template containing template_cols
+        keep_ensemble (bool): if False, drop row containing original ensemble
+        recursive (bool): if True, unnest ensembles of ensembles...
+    """
     ensemble_template = pd.DataFrame()
     template['Ensemble'] = np.where(
         ((template['Model'] == 'Ensemble') & (template['Ensemble'] < 1)),
@@ -1245,6 +1195,8 @@ def TemplateWizard(
     verbose: int = 0,
     n_jobs: int = None,
     validation_round: int = 0,
+    current_generation: int = 0,
+    max_generations: int = 0,
     model_interrupt: bool = False,
     grouping_ids=None,
     template_cols: list = [
@@ -1277,6 +1229,8 @@ def TemplateWizard(
         holiday_country (str): passed through to holiday package, used by a few models as 0/1 regressor.
         startTimeStamps (pd.Series): index (series_ids), columns (Datetime of First start of series)
         validation_round (int): int passed to record current validation.
+        current_generation (int): info to pass to print statements
+        max_generations (int): info to pass to print statements
         model_interrupt (bool): if True, keyboard interrupts are caught and only break current model eval.
         template_cols (list): column names of columns used as model template
@@ -1299,24 +1253,34 @@ def TemplateWizard(
             current_template = pd.DataFrame(row).transpose()
             template_result.model_count += 1
             if verbose > 0:
-                if verbose > 1:
-                    print(
-                        "Model Number: {} with model {} in Validation {} with params {} and transformations {}".format(
+                if validation_round >= 1:
+                    base_print = (
+                        "Model Number: {} of {} with model {} for Validation {}".format(
                             str(template_result.model_count),
+                            template.shape[0],
                             model_str,
                             str(validation_round),
-                            json.dumps(parameter_dict),
-                            json.dumps(transformation_dict),
                         )
                     )
                 else:
-                    print(
-                        "Model Number: {} with model {} in Validation {} ".format(
+                    base_print = (
+                        "Model Number: {} with model {} in generation {} of {}".format(
                             str(template_result.model_count),
                             model_str,
-                            str(validation_round),
+                            str(current_generation),
+                            str(max_generations),
                         )
                     )
+                if verbose > 1:
+                    print(
+                        base_print
+                        + " with params {} and transformations {}".format(
+                            json.dumps(parameter_dict),
+                            json.dumps(transformation_dict),
+                        )
+                    )
+                else:
+                    print(base_print)
             df_forecast = PredictWitch(
                 current_template,
                 df_train=df_train,
@@ -1507,6 +1471,7 @@ def RandomTemplate(
         'VECM',
         'DynamicFactor',
     ],
+    transformer_list: dict = {},
 ):
     """
     Returns a template dataframe of randomly generated transformations, models, and hyperparameters.
@@ -1520,7 +1485,7 @@ def RandomTemplate(
     while len(template.index) < n:
         model_str = np.random.choice(model_list)
         param_dict = ModelMonster(model_str).get_new_params()
-        trans_dict = RandomTransform()
+        trans_dict = RandomTransform(transformer_list=transformer_list)
         row = pd.DataFrame(
             {
                 'Model': model_str,
@@ -1601,7 +1566,7 @@ def trans_dict_recomb(dict_array):
     return c


-def _trans_dicts(current_ops, best=None, n: int = 5):
+def _trans_dicts(current_ops, best=None, n: int = 5, transformer_list: dict = {}):
     fir = json.loads(current_ops.iloc[0, :]['TransformationParameters'])
     cur_len = current_ops.shape[0]
     if cur_len > 1:
@@ -1610,10 +1575,10 @@ def _trans_dicts(current_ops, best=None, n: int = 5):
         r_id = np.random.randint(1, top_r)
         sec = json.loads(current_ops.iloc[r_id, :]['TransformationParameters'])
     else:
-        sec = RandomTransform()
-    r = RandomTransform()
+        sec = RandomTransform(transformer_list=transformer_list)
+    r = RandomTransform(transformer_list=transformer_list)
     if best is None:
-        best = RandomTransform()
+        best = RandomTransform(transformer_list=transformer_list)
     arr = [fir, sec, best, r]
     trans_dicts = [json.dumps(trans_dict_recomb(arr)) for _ in range(n)]
     return trans_dicts
@@ -1633,6 +1598,7 @@ def NewGeneticTemplate(
         'TransformationParameters',
         'Ensemble',
     ],
+    transformer_list: dict = {},
 ):
     """
     Return new template given old template with model accuracies.
@@ -1686,7 +1652,9 @@ def NewGeneticTemplate(
         if model_type in no_params:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 3
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             model_param = current_ops.iloc[0, :]['ModelParameters']
             new_row = pd.DataFrame(
                 {
@@ -1700,7 +1668,9 @@ def NewGeneticTemplate(
         elif model_type in recombination_approved:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 4
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             # select the best model of this type
             fir = json.loads(current_ops.iloc[0, :]['ModelParameters'])
             cur_len = current_ops.shape[0]
@@ -1735,7 +1705,9 @@ def NewGeneticTemplate(
         else:
             current_ops = sorted_results[sorted_results['Model'] == model_type]
             n = 3
-            trans_dicts = _trans_dicts(current_ops, best=best, n=n)
+            trans_dicts = _trans_dicts(
+                current_ops, best=best, n=n, transformer_list=transformer_list
+            )
             model_dicts = list()
             for _ in range(n):
                 c = ModelMonster(model_type).get_new_params()