Repository for the Probabilistic Timeseries Forecasting Challenge. This challenge focuses on quantile forecasting for timeseries data in Germany and Karlsruhe. Results can be found here and are visualized here.
Forecasts are inherently uncertain, and it is important to quantify this uncertainty. The goal is therefore to predict a distribution of future values rather than just a single point estimate; quantiles are a relatively straightforward way to quantify such uncertainty.
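Quantile forecasts are commonly scored with the pinball (quantile) loss. The snippet below is a minimal illustration of the idea, not the repository's actual metric implementation:

```python
def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss for a single observation at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1) * diff)

# At a high quantile (q = 0.95), under-predicting is penalized much more
# heavily than over-predicting by the same amount:
print(pinball_loss(10.0, 8.0, 0.95))   # under-prediction: 0.95 * 2 = 1.9
print(pinball_loss(10.0, 12.0, 0.95))  # over-prediction:  0.05 * 2 = 0.1
```

Averaging this loss over observations and quantile levels gives a single score for a set of quantile forecasts.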
The challenge is based on two datasets: bikes and energy. A third dataset, no2, is also available, but was not selected for forecast submissions.
DVC-tracked parameters, metrics, and plots can be found on DVC Studio.
First, clone the repository:
git clone https://github.com/MoritzM00/proba-forecasting
cd proba-forecasting
Then follow the instructions here to set up a dev environment.
Finally, reproduce the results by running:
dvc repro
To delete the cache files (*.sqlite) created by the pipeline, run:
make submit
This deletes the files that cache the Open-Meteo API calls and the downloaded data itself, thereby refreshing the data and forcing a pipeline reproduction, e.g. to make a submission in a new forecasting week.
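The exact target is defined in the repository's Makefile. A hypothetical sketch of what such a target might look like (file patterns and flags here are assumptions, not the actual rules):

```makefile
# Hypothetical sketch -- the real target lives in the repository's Makefile.
submit:
	find . -name "*.sqlite" -delete   # drop cached Open-Meteo responses and data
	dvc repro --force submit          # force a re-run so fresh data is fetched
```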
The data pipeline is fully automated using DVC's data and experiment versioning, as well as caching and remote storage capabilities.
The pipeline can be visualized using dvc dag, or on the web via the project's DagsHub page:
flowchart TD
node1["eval@bikes"]
node2["eval@energy"]
node3["prepare@bikes"]
node4["prepare@energy"]
node5["submit"]
node6["train@bikes"]
node7["train@energy"]
node3-->node1
node3-->node6
node4-->node2
node4-->node7
node6-->node1
node6-->node5
node7-->node2
node7-->node5
The pipeline consists of four stages:
- prepare: Downloads and preprocesses the data.
- train: Trains and saves the models.
- eval: Evaluates the models using timeseries cross-validation with expanding time windows.
- submit: Creates out-of-sample forecasts in the required format for this forecasting challenge.

Stages 1-3 are run for two datasets: bikes and energy.
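In dvc.yaml, running the same stage for several datasets is typically expressed with DVC's foreach/do syntax. The fragment below is a hedged sketch; the commands, parameters, and paths are illustrative, not the repository's actual definitions:

```yaml
stages:
  prepare:
    foreach: [bikes, energy]   # one stage instance per dataset, e.g. prepare@bikes
    do:
      cmd: python -m probafcst.pipeline.prepare --target ${item}
      outs:
        - data/${item}.parquet
```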
The image below shows an example run of the pipeline, showcasing DVC's automatic caching: only the stages that have changed since the last run are executed.
The following is the structure of the repository:
proba-forecasting
├── data # DVC managed data directory
├── models # DVC-managed; stores the trained models as .pkl files
├── notebooks # used for development
├── output # DVC-managed output directory; contains all artifacts created by the pipeline
├── scripts
├── src
│ └── probafcst
│ ├── metrics
│ ├── models # defines all models using sktime interface
│ ├── pipeline # defines the data pipeline scripts used by dvc.yaml
│ ├── utils # utils such as tabularization, checking etc.
│ ├── __init__.py
│ ├── backtest.py # implements TSCV with sktime
│ ├── data.py # data fetching
│ ├── plotting.py
│ └── weather.py
├── tests
├── LICENSE
├── Makefile
├── README.md
├── dvc.lock
├── dvc.yaml # Pipeline definition in yaml
├── params.yaml # defines parameters
├── pyproject.toml
└── uv.lock
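backtest.py implements the timeseries cross-validation with sktime. The expanding-window idea itself can be sketched in plain Python; the index logic below is illustrative, not the repository's implementation:

```python
def expanding_window_splits(n_samples, initial_window, horizon, step=1):
    """Return (train, test) index lists with a training window that grows each fold."""
    splits = []
    test_start = initial_window
    while test_start + horizon <= n_samples:
        train = list(range(test_start))                       # expands every fold
        test = list(range(test_start, test_start + horizon))  # fixed-size horizon
        splits.append((train, test))
        test_start += step
    return splits

# Example: 10 observations, 5 in the first training window,
# a 2-step forecast horizon, advancing by 2 each fold.
for train, test in expanding_window_splits(10, 5, 2, step=2):
    print(len(train), test)
```

Each fold trains on all data up to the split point and evaluates on the next `horizon` observations, mimicking how forecasts would be produced week by week.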
This guide shows how to reproduce the results of the challenge.
- Install uv
- Set up the environment:
make setup
source .venv/bin/activate
After setting up and activating the environment, run:
dvc repro
to reproduce the results.
The documentation is automatically deployed to GitHub Pages.
To view the documentation locally, run:
make docs_view
This project was generated with the Light-weight Python Template by Moritz Mistol.