No seasonal terms are included with seasonality options set to 'auto' with monthly data #77

Open
2 tasks
dromare opened this issue Jun 28, 2022 · 0 comments

dromare commented Jun 28, 2022

Greykite documentation states that the seasonality "auto" option is meant to let the template decide, based on input data frequency and the amount of training data, whether to model that seasonality with default Fourier order:
https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0300_seasonality.html?highlight=seasonality

However, with monthly data, this option always resolves to False for both QUARTERLY_SEASONALITY and YEARLY_SEASONALITY, even when the amount of training data (num_training_days) is greater than the minimum required (default_min_days). The reason is explained below.

These are the Silverkite default settings for minimum training data requirements, as defined in \greykite\algo\forecast\silverkite\constants\silverkite_seasonality.py

SilverkiteSeasonality(name='ct1', period=1.0, order=15, seas_names='yearly', default_min_days=548)
SilverkiteSeasonality(name='toq', period=1.0, order=5, seas_names='quarterly', default_min_days=180)

num_training_days is calculated in \greykite\common\time_properties_forecast.py, whereas the actual test is in \greykite\algo\forecast\silverkite\forecast_simple_silverkite.py (here, num_days is the num_training_days calculated above):

num_days >= seas.value.default_min_days
                    and seas.name in freq_auto_seas_names

The result of the test is always False for monthly data because freq_auto_seas_names is empty, so the condition seas.name in freq_auto_seas_names is never met. The reason can be seen in \greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py, where, e.g., for weekly data freq_auto_seas_names is the following set:

auto_fourier_seas={SeasonalityEnum.MONTHLY_SEASONALITY.name,
                           SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                           SeasonalityEnum.YEARLY_SEASONALITY.name})

whereas for monthly, quarterly and yearly data freq_auto_seas_names = {}, e.g. for monthly data:

auto_fourier_seas={
            # QUARTERLY_SEASONALITY and YEARLY_SEASONALITY are excluded from defaults
            # It's better to use `C(month)` as a categorical feature indicating the month
        })

Therefore, "based on input data frequency" at the top of this issue really means: the data frequency must be one of MINUTE, HOUR, DAY, WEEK; the frequencies MONTH, QUARTER, YEAR and MULTIYEAR are excluded.
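
The effect of the empty collection can be reproduced in a few lines of plain Python. This is a minimal sketch of the test quoted above, not Greykite's implementation: the Seasonality dataclass, AUTO_SEAS_BY_FREQ and resolve_auto are hypothetical names. Only the default_min_days values and the per-frequency name lists are taken from the snippets in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seasonality:
    """Hypothetical stand-in for one seasonality entry."""
    name: str
    default_min_days: int


# default_min_days values as quoted from silverkite_seasonality.py above.
SEASONALITIES = [
    Seasonality("QUARTERLY_SEASONALITY", 180),
    Seasonality("YEARLY_SEASONALITY", 548),
]

# Per-frequency "auto" names as quoted from silverkite_time_frequency.py above.
AUTO_SEAS_BY_FREQ = {
    "WEEK": {"MONTHLY_SEASONALITY", "QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"},
    "MONTH": set(),  # empty, so the membership test below can never pass
}


def resolve_auto(freq: str, num_days: float) -> dict:
    """Mirror the quoted test: enough data AND name listed for this frequency."""
    freq_auto_seas_names = AUTO_SEAS_BY_FREQ[freq]
    return {
        seas.name: num_days >= seas.default_min_days
        and seas.name in freq_auto_seas_names
        for seas in SEASONALITIES
    }


# Same amount of training data, different frequency.
weekly = resolve_auto("WEEK", num_days=789.0)
monthly = resolve_auto("MONTH", num_days=789.0)
print(weekly)
print(monthly)
```

With 789 training days both thresholds (180 and 548) are met, so the weekly case switches both seasonalities on, while the monthly case switches both off no matter how large num_days becomes.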

The "better" option in \greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py when using monthly data is thus to add an extra C(month) column as a categorical feature indicating the month.
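
For readers unfamiliar with the notation, C(month) is formula syntax for treating the month number as a categorical variable, i.e. one indicator column per month instead of Fourier terms. The following standard-library sketch shows what such columns look like for monthly timestamps. The helper name month_indicators and the start date are assumptions made for illustration, and unlike a typical formula encoding (which usually drops one reference level when an intercept is present) the sketch keeps all 12 columns for readability.

```python
from datetime import date


def month_indicators(dates):
    """Return one row of 12 zero/one indicators per date (column i = month i + 1)."""
    return [[1 if d.month == m else 0 for m in range(1, 13)] for d in dates]


# Two years of monthly observations; the start date is arbitrary.
dates = [date(2020 + (i // 12), (i % 12) + 1, 1) for i in range(24)]
rows = month_indicators(dates)

# Each observation activates exactly one month column.
print(rows[0])   # January of the first year
print(rows[13])  # February of the second year
```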

Question: Why is this a "better" option than the following definition?

auto_fourier_seas={SeasonalityEnum.QUARTERLY_SEASONALITY.name,
                           SeasonalityEnum.YEARLY_SEASONALITY.name})

I see the following alternatives when dealing with monthly data:

  1. Add an extra C(month) column as a categorical feature indicating the month. The disadvantage is that the extra column should only be added when both the QUARTERLY_SEASONALITY and YEARLY_SEASONALITY options are set to "auto", not to "True" or "False": Greykite adds the quarterly and/or yearly seasonality terms automatically when the respective option is set to "True" (according to valid_seas defined in \greykite\common\enums.py) and omits the term in question when the option is "False".
  2. Add the QUARTERLY_SEASONALITY and YEARLY_SEASONALITY terms (currently excluded from defaults) to the empty auto_fourier_seas collection; however, the Greykite developers seem to prefer option 1.
  3. Forget about the user setting the seasonality options ("auto", "True", "False") manually (this applies to all input data frequencies, not just monthly):
     • Let the user configure the Fourier order and the minimum number of cycles for each seasonality.
     • Set the corresponding seasonality option to either "True" or "False" automatically, according to principles learned from the current logic, i.e., input data frequency, valid_seas and num_training_points >= default_min_points.
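
Alternative 3 could look roughly like the sketch below. Everything in it is hypothetical (SeasonalityConfig, POINTS_PER_CYCLE, resolve_seasonality, the example Fourier orders, and the idea of deriving default_min_points from a user-chosen number of cycles); none of it is existing Greykite API. It only illustrates the proposed rule: a seasonality is switched on when it is valid for the data frequency and num_training_points >= default_min_points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeasonalityConfig:
    """Hypothetical user-facing settings for one seasonality."""
    fourier_order: int
    min_cycles: float  # minimum number of full cycles required in training data


# Hypothetical lookup: observations per seasonal cycle for monthly data.
POINTS_PER_CYCLE = {
    "MONTH": {"QUARTERLY_SEASONALITY": 3, "YEARLY_SEASONALITY": 12},
}


def resolve_seasonality(freq, valid_seas, configs, num_training_points):
    """Return {seasonality name: Fourier order to use, or 0 if switched off}."""
    resolved = {}
    for name, cfg in configs.items():
        points_per_cycle = POINTS_PER_CYCLE[freq].get(name)
        if name not in valid_seas or points_per_cycle is None:
            resolved[name] = 0  # not meaningful at this frequency
            continue
        default_min_points = cfg.min_cycles * points_per_cycle
        enough_data = num_training_points >= default_min_points
        resolved[name] = cfg.fourier_order if enough_data else 0
    return resolved


# Example orders and cycle counts are arbitrary illustration values.
configs = {
    "QUARTERLY_SEASONALITY": SeasonalityConfig(fourier_order=1, min_cycles=2),
    "YEARLY_SEASONALITY": SeasonalityConfig(fourier_order=5, min_cycles=1.5),
}
valid_seas = {"QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"}

result_26 = resolve_seasonality("MONTH", valid_seas, configs, num_training_points=26)
result_12 = resolve_seasonality("MONTH", valid_seas, configs, num_training_points=12)
print(result_26)
print(result_12)
```

With these illustrative settings, yearly seasonality requires 1.5 × 12 = 18 monthly points, so it is switched on for 26 training points and off for 12, while quarterly seasonality (2 × 3 = 6 points) is on in both cases.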

One may argue that num_training_points varies between training sets when using CV splits; however, the following example shows that both num_training_points and num_training_days are invariant between splits, even with cv_expanding_window=True:

[CV 1/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
[CV 2/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...
[CV 3/3] ... valid_seas={'YEARLY_SEASONALITY', 'QUARTERLY_SEASONALITY'})>, 'num_training_points': 26, 'num_training_days': 789.0, 'days_per_observation': 28.0, ...

This means that the test num_training_points >= default_min_points needs to be applied only once, directly from train_end_date, before entering the CV loop. (The current "Fitting 3 folds for each of 1 candidates, totalling 3 fits" section apparently tests the seasonality terms at each split, but the test values are invariant, as mentioned above.)
