With monthly data, the seasonality "auto" option always resolves to False for both `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY`, even when the amount of training data (`num_training_days`) is greater than the minimum required (`default_min_days`). Why? Read below.
These are the Silverkite default settings for minimum training data requirements, as defined in `\greykite\algo\forecast\silverkite\constants\silverkite_seasonality.py`
`num_training_days` is calculated in `\greykite\common\time_properties_forecast.py`, whereas the actual test is in `\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py` (here, `num_days` is the `num_training_days` calculated above):
num_days >= seas.value.default_min_days
and seas.name in freq_auto_seas_names
The result of the test is always False for monthly data because `freq_auto_seas_names` is an empty dictionary, so the condition `seas.name in freq_auto_seas_names` is never met. The reason can be seen in `\greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py`, where, e.g., for weekly data `freq_auto_seas_names` is the following dictionary:
whereas for monthly, quarterly and yearly data `freq_auto_seas_names = {}`, e.g. for monthly data:
auto_fourier_seas={
# QUARTERLY_SEASONALITY and YEARLY_SEASONALITY are excluded from defaults
# It's better to use `C(month)` as a categorical feature indicating the month
})
Therefore, "based on input data frequency" in the documentation statement quoted in this issue really means: if the data frequency is one of MINUTE, HOUR, DAY, WEEK, but not MONTH, QUARTER, YEAR, or MULTIYEAR.
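The mechanism described above can be reproduced in isolation. The following is a minimal sketch, not Greykite code: the `Seasonality` class, the threshold values, and the contents of the weekly set are assumptions made up for illustration. Only the shape of the test (`num_days >= default_min_days and name in freq_auto_seas_names`) is taken from the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seasonality:
    """Hypothetical stand-in for one seasonality entry."""
    name: str
    default_min_days: int  # illustrative threshold, NOT Greykite's actual default


# Illustrative values only.
QUARTERLY = Seasonality("QUARTERLY_SEASONALITY", 180)
YEARLY = Seasonality("YEARLY_SEASONALITY", 548)


def resolve_auto(seas, num_days, freq_auto_seas_names):
    """Same shape as the test quoted in the issue."""
    return num_days >= seas.default_min_days and seas.name in freq_auto_seas_names


# Assumed non-empty collection for a frequency such as weekly.
weekly_auto_seas_names = {"QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"}
# Empty collection, as reported in the issue for monthly data.
monthly_auto_seas_names = set()

num_days = 3650  # ten years of training data, far above both thresholds

weekly_result = resolve_auto(YEARLY, num_days, weekly_auto_seas_names)
monthly_result = resolve_auto(YEARLY, num_days, monthly_auto_seas_names)
```

With the empty collection, the membership check fails before the amount of training data can matter, so `monthly_result` is False no matter how large `num_days` is.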
The "better" option in `\greykite\algo\forecast\silverkite\constants\silverkite_time_frequency.py` when using monthly data is thus to add an extra `C(month)` column as a categorical feature indicating the month.
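To make the comparison concrete, here is a rough sketch of the columns each approach contributes for a monthly timestamp. It is not Greykite or patsy code. The helper names are hypothetical, the categorical encoding assumes one reference month is dropped, and the Fourier terms assume the position within the year is simply `(month - 1) / 12`.

```python
import math
from datetime import date


def month_indicator_columns(d, reference_month=1):
    """One 0/1 column per month except the reference month (11 columns)."""
    return [int(d.month == m) for m in range(1, 13) if m != reference_month]


def yearly_fourier_columns(d, order):
    """sin/cos pairs up to the given order (2 * order columns).

    Assumes the fraction of the year is (month - 1) / 12, a simplification.
    """
    t = (d.month - 1) / 12.0
    cols = []
    for k in range(1, order + 1):
        cols.append(math.sin(2 * math.pi * k * t))
        cols.append(math.cos(2 * math.pi * k * t))
    return cols


march = date(2020, 3, 1)
categorical = month_indicator_columns(march)
fourier = yearly_fourier_columns(march, order=2)
```

Under these assumptions the categorical feature gives each month its own free coefficient, while a Fourier expansion of order `k` constrains the monthly effects to `2 * k` smooth terms.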
Question: Why is this a "better" option than the following definition?
I see the following alternatives when dealing with monthly data:
1. Add an extra `C(month)` column as a categorical feature indicating the month; this has the disadvantage that the extra column should only be added when both `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY` options are set to "auto" and not to "True" or "False" (Greykite adds quarterly and/or yearly seasonality terms automatically when the respective option is set to "True", according to the `valid_seas` dictionary defined in `\greykite\common\enums.py`, and does not add the term when the option is "False").
2. Add `QUARTERLY_SEASONALITY` and `YEARLY_SEASONALITY` terms (currently excluded from defaults) to the empty `auto_fourier_seas` dictionary; but Greykite developers seem to prefer option 1.
3. Forget about the user setting the seasonality options ("auto", "True", "False") manually (this applies to all input data frequencies, not just monthly):
   - Let the user configure the Fourier order and the minimum number of cycles for each seasonality.
   - Set the corresponding seasonality option to either "True" or "False" automatically, according to principles learned from the current logic, i.e., input data frequency, `valid_seas` and `num_training_points >= default_min_points`.
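The third alternative could look roughly like the sketch below. Everything in it is hypothetical: `SeasonalityConfig`, `resolve_seasonality`, the `points_per_cycle` field, and the example numbers are not part of Greykite. It only illustrates deriving `default_min_points` from a user-supplied minimum number of cycles and then applying the `valid_seas` and `num_training_points >= default_min_points` checks.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class SeasonalityConfig:
    """Hypothetical user-facing settings for one seasonality."""
    name: str
    fourier_order: int
    min_cycles: float        # minimum number of full cycles required
    points_per_cycle: float  # observations per cycle at the data frequency


def resolve_seasonality(cfg, num_training_points, valid_seas):
    """Return True/False instead of asking the user for "auto"."""
    default_min_points = ceil(cfg.min_cycles * cfg.points_per_cycle)
    return cfg.name in valid_seas and num_training_points >= default_min_points


# Monthly data: 3 points per quarter, 12 points per year (assumed setup).
monthly_valid_seas = {"QUARTERLY_SEASONALITY", "YEARLY_SEASONALITY"}
quarterly = SeasonalityConfig("QUARTERLY_SEASONALITY", 2, 2, 3)
yearly = SeasonalityConfig("YEARLY_SEASONALITY", 4, 1.5, 12)

# Five years of monthly data enables both; one year enables only quarterly.
long_history = {c.name: resolve_seasonality(c, 60, monthly_valid_seas)
                for c in (quarterly, yearly)}
short_history = {c.name: resolve_seasonality(c, 12, monthly_valid_seas)
                 for c in (quarterly, yearly)}
```

In this design the data frequency enters only through `valid_seas` and `points_per_cycle`, so no separate per-frequency list of automatic seasonalities is needed.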
One may argue that `num_training_points` varies between training sets when using CV splits; however, the following example shows that both `num_training_points` and `num_training_days` are invariant between splits, even with `cv_expanding_window=True`:
This means that the test `num_training_points >= default_min_points` needs to be applied only once, directly from `train_end_date`, before entering the CV loop (the current "Fitting 3 folds for each of 1 candidates, totalling 3 fits" section apparently tests the seasonality terms at each split, but the test values are invariant, as mentioned above).
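A minimal sketch of evaluating the threshold once, ahead of the CV loop, is given below. It assumes monthly timestamps, a hypothetical `default_min_points` of 24, and placeholder fold identifiers; none of the names come from Greykite.

```python
from datetime import date


def count_training_points(timestamps, train_end_date):
    """Number of observations up to and including train_end_date."""
    return sum(1 for ts in timestamps if ts <= train_end_date)


# Monthly timestamps from 2015-01 through 2020-12 (72 points).
timestamps = [date(2015 + i // 12, i % 12 + 1, 1) for i in range(72)]
train_end_date = date(2019, 12, 1)
default_min_points = 24  # hypothetical threshold

# Evaluate the test once, before any CV split is fitted.
num_training_points = count_training_points(timestamps, train_end_date)
use_yearly_seasonality = num_training_points >= default_min_points

# Every fold reuses the precomputed decision instead of re-testing it.
fold_settings = {fold: use_yearly_seasonality for fold in ("fold_0", "fold_1", "fold_2")}
```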
Greykite documentation states that the seasonality "auto" option is meant to let the template decide, based on input data frequency and the amount of training data, whether to model that seasonality with default Fourier order:
https://linkedin.github.io/greykite/docs/0.1.0/html/pages/model_components/0300_seasonality.html?highlight=seasonality