Merge pull request #84 from databricks-industry-solutions/add-numerical-and-categorical-covariates

added dynamic future numerical and categorical
ryuta-yoshimatsu authored Jan 24, 2025
2 parents 1c0fb9b + f34ff6c commit ab0e195
Showing 12 changed files with 178 additions and 107 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -89,7 +89,7 @@ run_forecast(
#### Parameters description:

- ```train_data``` is a delta table name that stores the input dataset.
- ```scoring_data``` is a delta table name that stores the [dynamic future regressors](https://nixtlaverse.nixtla.io/neuralforecast/examples/exogenous_variables.html#3-training-with-exogenous-variables). If not provided, or if the same name as ```train_data``` is provided, the models will ignore the dynamic future regressors.
- ```scoring_data``` is a delta table name that stores the [dynamic future regressors](https://nixtlaverse.nixtla.io/statsforecast/docs/how-to-guides/exogenous.html). If not provided, or if the same name as ```train_data``` is provided, the models will ignore the dynamic future regressors.
- ```scoring_output``` is a delta table where you write your forecasting output. This table will be created if it does not exist.
- ```evaluation_output``` is a delta table where you write the evaluation results from all backtesting trials across all time series and all models. This table will be created if it does not exist.
- ```group_id``` is a column storing the unique id that identifies each time series in your dataset.
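
For orientation, here is a minimal sketch of a `run_forecast` call wired up with the parameters described above. The table names, column names, and model list are illustrative only, and additional arguments may be needed in practice.

```python
from mmf_sa import run_forecast

# Minimal illustrative call; table and column names are placeholders.
run_forecast(
    train_data="catalog.schema.daily_train",              # input dataset
    scoring_data="catalog.schema.daily_scoring",           # dynamic future regressors
    scoring_output="catalog.schema.daily_forecast",        # created if it does not exist
    evaluation_output="catalog.schema.daily_evaluation",   # created if it does not exist
    group_id="store_id",
    date_col="date",
    target="sales",
    freq="D",
    prediction_length=10,
    backtest_months=1,
    stride=10,
    active_models=["StatsForecastBaselineWindowAverage"],
)
```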
4 changes: 2 additions & 2 deletions examples/global_external_regressors_daily.py
@@ -90,7 +90,7 @@
# COMMAND ----------

# MAGIC %md
# MAGIC Note that `rossmann_daily_train` contains our target variable `Sales` but `rossmann_daily_test` does not. This is because `rossmann_daily_test` will be used as our `scoring_data`, which stores the `dynamic_future` variables for the future dates. When you adapt this notebook to your use case, make sure your datasets follow the same format. See neuralforecast's [documentation](https://nixtlaverse.nixtla.io/neuralforecast/examples/exogenous_variables.html) for more detail on exogenous regressors.
# MAGIC Note that `rossmann_daily_train` contains our target variable `Sales` but `rossmann_daily_test` does not. This is because `rossmann_daily_test` will be used as our `scoring_data`, which stores the `dynamic_future_categorical` variables for the future dates. When you adapt this notebook to your use case, make sure your datasets follow the same format. See neuralforecast's [documentation](https://nixtlaverse.nixtla.io/neuralforecast/examples/exogenous_variables.html) for more detail on exogenous regressors.
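
As a rough sketch of how such a pair of tables could be produced (assuming `df` is a Spark DataFrame holding the raw Rossmann data with a `Store` id column; the cutoff date is arbitrary):

```python
import pyspark.sql.functions as F

cutoff = "2015-06-30"  # illustrative split date

# Training table keeps the target `Sales`.
train_df = df.filter(F.col("Date") <= cutoff)

# Scoring table covers the future horizon and carries only the
# known-in-advance covariates -- no `Sales` column.
test_df = (
    df.filter(F.col("Date") > cutoff)
      .select("Store", "Date", "DayOfWeek", "Open", "Promo", "SchoolHoliday")
)

train_df.write.mode("overwrite").saveAsTable("rossmann_daily_train")
test_df.write.mode("overwrite").saveAsTable("rossmann_daily_test")
```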

# COMMAND ----------

@@ -118,7 +118,7 @@

# MAGIC %md ### Run MMF
# MAGIC
# MAGIC Now, we run the evaluation and forecasting using the `run_forecast` function, providing the training and scoring table names. If `scoring_data` is not provided, or if the same name as `train_data` is provided, the models will ignore the `dynamic_future` regressors. Note that this time we are providing a covariate field (i.e. `dynamic_future`) to the `run_forecast` function called in [examples/run_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_external_regressors_daily.py). There are also other covariate fields, namely `static_features` and `dynamic_historical`, which you can provide. Read more about these covariates in [neuralforecast's documentation](https://nixtlaverse.nixtla.io/neuralforecast/examples/exogenous_variables.html).
# MAGIC Now, we run the evaluation and forecasting using the `run_forecast` function, providing the training and scoring table names. If `scoring_data` is not provided, or if the same name as `train_data` is provided, the models will ignore the `dynamic_future_numerical` and `dynamic_future_categorical` regressors. Note that this time we are providing a covariate field (i.e. `dynamic_future_numerical` or `dynamic_future_categorical`) to the `run_forecast` function called in [examples/run_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_external_regressors_daily.py). There are also other covariate fields, namely `static_features`, `dynamic_historical_numerical`, and `dynamic_historical_categorical`, which you can provide. Read more about these covariates in [neuralforecast's documentation](https://nixtlaverse.nixtla.io/neuralforecast/examples/exogenous_variables.html).
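
A hedged sketch of what passing several covariate fields together could look like for a global model. `StoreType`, `CompetitionDistance`, and `Customers` are hypothetical column choices, `Store` is an assumed id column, and arguments such as `active_models`, `backtest_months`, and `stride` are omitted for brevity.

```python
from mmf_sa import run_forecast

run_forecast(
    train_data="rossmann_daily_train",
    scoring_data="rossmann_daily_test",
    scoring_output="rossmann_daily_forecast",          # illustrative output tables
    evaluation_output="rossmann_daily_evaluation",
    group_id="Store",
    date_col="Date",
    target="Sales",
    freq="D",
    prediction_length=10,
    static_features=["StoreType"],                     # hypothetical static attribute
    dynamic_future_numerical=["CompetitionDistance"],  # hypothetical numerical covariate
    dynamic_future_categorical=["Open", "Promo", "SchoolHoliday"],
    dynamic_historical_numerical=["Customers"],        # known only for past dates
)
```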

# COMMAND ----------

6 changes: 3 additions & 3 deletions examples/local_univariate_external_regressors_daily.py
@@ -90,7 +90,7 @@
# COMMAND ----------

# MAGIC %md
# MAGIC Note that `rossmann_daily_train` contains our target variable `Sales` but `rossmann_daily_test` does not. This is because `rossmann_daily_test` will be used as our `scoring_data`, which stores the `dynamic_future` variables for the future dates. When you adapt this notebook to your use case, make sure your datasets follow the same format. See statsforecast's [documentation](https://nixtlaverse.nixtla.io/statsforecast/docs/how-to-guides/exogenous.html) for more detail on exogenous regressors.
# MAGIC Note that `rossmann_daily_train` contains our target variable `Sales` but `rossmann_daily_test` does not. This is because `rossmann_daily_test` will be used as our `scoring_data`, which stores the `dynamic_future_categorical` variables for the future dates. When you adapt this notebook to your use case, make sure your datasets follow the same format. See statsforecast's [documentation](https://nixtlaverse.nixtla.io/statsforecast/docs/how-to-guides/exogenous.html) for more detail on exogenous regressors.

# COMMAND ----------

@@ -134,7 +134,7 @@

# MAGIC %md ### Run MMF
# MAGIC
# MAGIC Now, we run the evaluation and forecasting using the `run_forecast` function, providing the training and scoring table names. If `scoring_data` is not provided, or if the same name as `train_data` is provided, the models will ignore the `dynamic_future` regressors. Note that this time we are providing a covariate field (i.e. `dynamic_future`). There are also other covariate fields, namely `static_features` and `dynamic_historical`, but these are relevant only for the global models.
# MAGIC Now, we run the evaluation and forecasting using the `run_forecast` function, providing the training and scoring table names. If `scoring_data` is not provided, or if the same name as `train_data` is provided, the models will ignore the `dynamic_future_numerical` and `dynamic_future_categorical` regressors. Note that this time we are providing a covariate field (i.e. `dynamic_future_numerical` or `dynamic_future_categorical`). There are also other covariate fields, namely `static_features`, `dynamic_historical_numerical`, and `dynamic_historical_categorical`, but these are relevant only for the global models.

# COMMAND ----------

@@ -148,7 +148,7 @@
date_col="Date",
target="Sales",
freq="D",
dynamic_future=["DayOfWeek", "Open", "Promo", "SchoolHoliday"],
dynamic_future_categorical=["DayOfWeek", "Open", "Promo", "SchoolHoliday"],
prediction_length=10,
backtest_months=1,
stride=10,
2 changes: 1 addition & 1 deletion examples/run_external_regressors_daily.py
@@ -36,7 +36,7 @@
date_col="Date",
target="Sales",
freq="D",
dynamic_future=["DayOfWeek", "Open", "Promo", "SchoolHoliday"],
dynamic_future_categorical=["DayOfWeek", "Open", "Promo", "SchoolHoliday"],
prediction_length=10,
backtest_months=1,
stride=10,
24 changes: 16 additions & 8 deletions mmf_sa/__init__.py
@@ -28,8 +28,10 @@ def run_forecast(
model_output: str = None,
use_case_name: str = None,
static_features: List[str] = None,
dynamic_future: List[str] = None,
dynamic_historical: List[str] = None,
dynamic_future_numerical: List[str] = None,
dynamic_future_categorical: List[str] = None,
dynamic_historical_numerical: List[str] = None,
dynamic_historical_categorical: List[str] = None,
active_models: List[str] = None,
accelerator: str = "cpu",
backtest_retrain: bool = None,
@@ -63,8 +65,10 @@
model_output (str): A string specifying the output path for the model.
use_case_name (str): A string specifying the use case name.
static_features (List[str]): A list of strings specifying the static features.
dynamic_future (List[str]): A list of strings specifying the dynamic future features.
dynamic_historical (List[str]): A list of strings specifying the dynamic historical features.
dynamic_future_numerical (List[str]): A list of strings specifying the dynamic future features that are numerical.
dynamic_future_categorical (List[str]): A list of strings specifying the dynamic future features that are categorical.
dynamic_historical_numerical (List[str]): A list of strings specifying the dynamic historical features that are numerical.
dynamic_historical_categorical (List[str]): A list of strings specifying the dynamic historical features that are categorical.
active_models (List[str]): A list of strings specifying the active models.
accelerator (str): A string specifying the accelerator to use: cpu or gpu. Default is cpu.
backtest_retrain (bool): A boolean specifying whether to retrain the model during backtesting. Currently not supported.
@@ -137,10 +141,14 @@
_conf["data_quality_check"] = data_quality_check
if static_features is not None:
_conf["static_features"] = static_features
if dynamic_future is not None:
_conf["dynamic_future"] = dynamic_future
if dynamic_historical is not None:
_conf["dynamic_historical"] = dynamic_historical
if dynamic_future_numerical is not None:
_conf["dynamic_future_numerical"] = dynamic_future_numerical
if dynamic_future_categorical is not None:
_conf["dynamic_future_categorical"] = dynamic_future_categorical
if dynamic_historical_numerical is not None:
_conf["dynamic_historical_numerical"] = dynamic_historical_numerical
if dynamic_historical_categorical is not None:
_conf["dynamic_historical_categorical"] = dynamic_historical_categorical
if run_id is not None:
_conf["run_id"] = run_id

34 changes: 23 additions & 11 deletions mmf_sa/data_quality_checks.py
@@ -43,8 +43,10 @@ def _external_regressors_check(self):
"""
if (
self.conf.get("static_features", None)
or self.conf.get("dynamic_future", None)
or self.conf.get("dynamic_historical", None)
or self.conf.get("dynamic_future_numerical", None)
or self.conf.get("dynamic_future_categorical", None)
or self.conf.get("dynamic_historical_numerical", None)
or self.conf.get("dynamic_historical_categorical", None)
):
if self.conf.get("resample"):
raise Exception(
@@ -77,19 +79,29 @@ def _multiple_checks(

# 1. Checking for nulls in external regressors
static_features = conf.get("static_features", None)
dynamic_future = conf.get("dynamic_future", None)
dynamic_historical = conf.get("dynamic_historical", None)
dynamic_future_numerical = conf.get("dynamic_future_numerical", None)
dynamic_future_categorical = conf.get("dynamic_future_categorical", None)
dynamic_historical_numerical = conf.get("dynamic_historical_numerical", None)
dynamic_historical_categorical = conf.get("dynamic_historical_categorical", None)
if static_features:
if _df[static_features].isnull().values.any():
# Removing: null in static categoricals
# Removing: null in static categorical
return pd.DataFrame()
if dynamic_future:
if _df[dynamic_future].isnull().values.any():
# Removing: null in dynamic future
if dynamic_future_numerical:
if _df[dynamic_future_numerical].isnull().values.any():
# Removing: null in dynamic future numerical
return pd.DataFrame()
if dynamic_historical:
if _df[dynamic_historical].isnull().values.any():
# Removing: null in dynamic historical
if dynamic_future_categorical:
if _df[dynamic_future_categorical].isnull().values.any():
# Removing: null in dynamic future categorical
return pd.DataFrame()
if dynamic_historical_numerical:
if _df[dynamic_historical_numerical].isnull().values.any():
# Removing: null in dynamic historical numerical
return pd.DataFrame()
if dynamic_historical_categorical:
if _df[dynamic_historical_categorical].isnull().values.any():
# Removing: null in dynamic historical categorical
return pd.DataFrame()

# 2. Checking for training period length
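To make the null checks above concrete, here is a small self-contained sketch (the DataFrame and column names are made up; only the check logic mirrors the code in this diff):

```python
import pandas as pd

# A group with a null in one of its dynamic future numerical columns
# is dropped: the check returns an empty DataFrame for that group.
group_df = pd.DataFrame({
    "Date": pd.date_range("2025-01-01", periods=3, freq="D"),
    "Sales": [10.0, 12.0, 11.0],
    "Promo": [1.0, None, 0.0],  # null in a dynamic_future_numerical column
})

dynamic_future_numerical = ["Promo"]
if group_df[dynamic_future_numerical].isnull().values.any():
    group_df = pd.DataFrame()   # the series is excluded

print(group_df.empty)  # True
```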
8 changes: 6 additions & 2 deletions mmf_sa/forecasting_conf.yaml
@@ -11,12 +11,16 @@ accelerator: cpu
static_features:
#- State

dynamic_future:
dynamic_future_numerical:

dynamic_future_categorical:
#- Open
#- Promo
#- DayOfWeek

dynamic_historical:
dynamic_historical_numerical:

dynamic_historical_categorical:

active_models:
- StatsForecastBaselineWindowAverage
6 changes: 4 additions & 2 deletions mmf_sa/models/models_conf.yaml
@@ -10,8 +10,10 @@ promoted_props:
- backtest_months
- stride
- static_features
- dynamic_future
- dynamic_historical
- dynamic_future_numerical
- dynamic_future_categorical
- dynamic_historical_numerical
- dynamic_historical_categorical

models:

(Diffs for the remaining 4 changed files are not shown here.)
