Add change point detection module. (#41)
* Implement various Bayesian conjugate priors.

* Allow returning updated posterior after inference.

* Allow creating TimeSeries from numpy directly.

* Add testing for conjugate priors.

* Let conj priors update on a single ts value.

* Update docstring for SpectralResidual.

* Initial implementation of BOCPD.

* Update uninformative priors.

The new uninformative priors allow estimation of prior probabilities
without conditioning on any data.

* Have BOCPD return z-score units.

* Smarter prior initialization based on data.

* Add BOCPD to API docs.

* Use future look-ahead to aggregate probabilities.

We consider a maximal look-ahead equal to the lag, and we allow the
model to use data up to `lag` steps in the future to decide whether each
point is a change point. This can greatly improve the batch prediction.
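The effect of the look-ahead can be sketched on a toy problem. The block below is not Merlion's implementation: it is a minimal Bayesian online change point detector that assumes a Gaussian likelihood with known variance, a constant hazard rate, and a hypothetical aggregation rule (the maximum over look-aheads `k <= lag`). The function names `run_length_posteriors` and `change_point_scores` are made up for illustration. With a constant hazard, the probability of a run of length 0 always equals the hazard itself, so evidence for a change at index `i` only appears in the run-length posterior computed `k >= 1` steps later, which is why looking ahead helps.

```python
import numpy as np
from scipy.stats import norm


def run_length_posteriors(x, hazard=0.01, mu0=0.0, var0=10.0, var_x=1.0):
    """R[s, r] = P(current run has length r | x[:s]) for a Gaussian mean-shift model."""
    n = len(x)
    R = np.zeros((n + 1, n + 1))
    R[0, 0] = 1.0
    # Posterior over the segment mean, one entry per run-length hypothesis.
    mu, var = np.array([mu0]), np.array([var0])
    for s, xs in enumerate(x):
        # Predictive density of the new point under each run-length hypothesis.
        pred = norm.pdf(xs, loc=mu, scale=np.sqrt(var + var_x))
        joint = R[s, : s + 1] * pred
        R[s + 1, 1 : s + 2] = joint * (1 - hazard)  # the run continues
        R[s + 1, 0] = joint.sum() * hazard          # a new run starts
        R[s + 1] /= R[s + 1].sum()
        # Conjugate Normal update for each run, plus a fresh prior for r = 0.
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + xs / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R


def change_point_scores(R, lag):
    """scores[i] = max over k in 1..lag of P(a run started at index i | x[:i + k])."""
    n = R.shape[0] - 1
    scores = np.zeros(n)
    for i in range(1, n):  # index 0 trivially starts the first run, so it is not scored
        ks = np.arange(1, min(lag, n - i) + 1)
        scores[i] = R[i + ks, ks].max()
    return scores


# Deterministic toy series whose mean jumps from 0 to 5 at index 60.
x = np.concatenate([np.zeros(60), np.full(60, 5.0)]) + 0.3 * np.sin(np.arange(120))
R = run_length_posteriors(x)
scores = change_point_scores(R, lag=10)
print(int(np.argmax(scores)), round(float(scores.max()), 3))
```

On this series the score should peak at index 60, where the new segment begins. A larger `lag` gives each candidate more future evidence at the cost of a delayed decision.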

* Update default BOCPD lag to None.

* Automatic selection of Bayesian conjugate prior.

* Remove unnecessary copying line.

* Allow singular covariances in priors.

* Fix an offset error in BOCPD dynamic programming.

* Add explicit posterior for BayesianMVLinReg.

* Use explicit posterior in BOCPD where possible.

* Make sparse matrix allocation more efficient.

* Use fully uninformative priors.

Setting priors in a data-driven way led to over-estimating the
probability of some change points.

* Make sure matrices are non-singular.

* Slightly refine min log likelihood calculation.

* Add test coverage for BOCPD.

* Make sure matrix is PSD as well as non-singular.

* Allow BOCPD to predict on historical data.

* Allow time series alignment for empty time series.

* Make sure last_train_time is set in BOCPD.update

* Build a predictive model for BOCPD.

* Allow conjugate priors to make forecasts.

* Add forecasting ability to BOCPD.

* Train BOCPD on transformed time series.

* Make format of model/config docs more consistent.

* Add tests for BOCPD visualizations.

* Update version.

* Fix failing BOCPD tests.

* Backwards compatibility for scipy 1.5.

Scipy 1.6.0 introduced the multivariate_t random variable, which we use
in our implementation of conjugate priors. However, scipy 1.6.0+
requires Python 3.7+. To maintain backwards compatibility with Python
3.6 (and therefore scipy 1.5), we implement the log density of the
multivariate t distribution & use it as a fallback where necessary.

We also implement an optimized computation of the pseudo-inverse, and
explicitly allow for singular V_0 matrices in the computation of the
Bayesian Multivariate Linear Regression posterior.

Finally, we update the testing workflows to avoid segfaults due to BLAS
bugs in older package versions.
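A fallback of this kind can be sketched as follows. This is an illustration only, not the code added in this commit: the names `mvt_logpdf` and `log_density` are hypothetical, and the sketch assumes a positive-definite shape matrix (it uses a Cholesky factorization, so it does not cover the singular case mentioned above). For a `d`-dimensional point `x` with location `mu`, shape `Sigma`, and `nu` degrees of freedom, the log density is `lgamma((nu + d) / 2) - lgamma(nu / 2) - (d / 2) * log(nu * pi) - (1 / 2) * log|Sigma| - ((nu + d) / 2) * log(1 + m / nu)`, where `m` is the squared Mahalanobis distance of `x` from `mu`.

```python
import numpy as np
from scipy.special import gammaln


def mvt_logpdf(x, loc, shape, df):
    """Log density of a multivariate t distribution at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    loc = np.atleast_1d(np.asarray(loc, dtype=float))
    shape = np.atleast_2d(np.asarray(shape, dtype=float))
    d = loc.shape[0]
    # The Cholesky factor yields both log|shape| and the Mahalanobis distance.
    chol = np.linalg.cholesky(shape)
    z = np.linalg.solve(chol, x - loc)
    maha = float(z @ z)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return float(
        gammaln((df + d) / 2)
        - gammaln(df / 2)
        - 0.5 * d * np.log(df * np.pi)
        - 0.5 * logdet
        - 0.5 * (df + d) * np.log1p(maha / df)
    )


try:
    # Available in scipy >= 1.6.0.
    from scipy.stats import multivariate_t

    def log_density(x, loc, shape, df):
        return float(multivariate_t.logpdf(x, loc=loc, shape=shape, df=df))

except ImportError:
    # Older scipy (e.g. 1.5 on Python 3.6): use the hand-written version.
    log_density = mvt_logpdf
```

In one dimension the formula reduces to the ordinary Student's t density with scale `sqrt(shape)`, which gives a simple way to check the fallback against `scipy.stats.t`.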

* Allow integer # of time_stamps for BOCPD.

* Remove issubclass check.

* Add mentions of change point detection in the docs
aadyotb authored Nov 8, 2021
1 parent 18fe266 commit 5d6f891
Showing 44 changed files with 1,894 additions and 54 deletions.
14 changes: 13 additions & 1 deletion .github/workflows/tests.yml
@@ -34,7 +34,17 @@ jobs:
     - name: Test with pytest
       id: test
       run: |
-        coverage run --source=merlion/ -L -m pytest -v
+        # A BLAS bug causes high-dim multivar Bayesian LR test to segfault in 3.6. Run the test first to avoid.
+        if [[ $PYTHON_VERSION == 3.6 ]]; then
+          python -m pytest -v tests/change_point/test_conj_prior.py
+          coverage run --source=merlion/ -L -m pytest -v --ignore tests/change_point/test_conj_prior.py
+        # MoE test seems to hang in 3.7. Run the test first to avoid.
+        elif [[ $PYTHON_VERSION == 3.7 ]]; then
+          python -m pytest -v tests/forecast/test_MoE_forecast_ensemble.py
+          coverage run --source=merlion/ -L -m pytest -v --ignore tests/forecast/test_MoE_forecast_ensemble.py
+        else
+          coverage run --source=merlion/ -L -m pytest -v
+        fi
         # Obtain code coverage from coverage report
         coverage report
@@ -56,6 +66,8 @@ jobs:
           COLOR=red
         fi
         echo "##[set-output name=color;]${COLOR}"
+      env:
+        PYTHON_VERSION: ${{ matrix.python-version }}

     - name: Create coverage badge
       if: ${{ github.ref == 'refs/heads/main' && matrix.python-version == '3.8' }}
7 changes: 4 additions & 3 deletions README.md
@@ -32,9 +32,10 @@
 ## Introduction
 Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that
 includes loading and transforming data, building and training models, post-processing model outputs, and evaluating
-model performance. It supports various time series learning tasks, including forecasting and anomaly detection for both
-univariate and multivariate time series. This library aims to provide engineers and researchers a one-stop solution to
-rapidly develop models for their specific time series needs, and benchmark them across multiple time series datasets.
+model performance. It supports various time series learning tasks, including forecasting, anomaly detection,
+and change point detection for both univariate and multivariate time series. This library aims to provide engineers and
+researchers a one-stop solution to rapidly develop models for their specific time series needs, and benchmark them
+across multiple time series datasets.

 Merlion's key features are
 - Standardized and easily extensible data loading & benchmarking for a wide range of forecasting and anomaly
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -7,8 +7,8 @@
 Welcome to Merlion's documentation!
 ===================================
 Merlion is a Python library for time series intelligence. It features a unified interface for many commonly used
-:doc:`models <merlion.models>` and :doc:`datasets <ts_datasets>` for anomaly detection and forecasting
-on both univariate and multivariate time series, along with standard
+:doc:`models <merlion.models>` and :doc:`datasets <ts_datasets>` for forecasting, anomaly detection, and change
+point detection on both univariate and multivariate time series, along with standard
 :doc:`pre-processing <merlion.transform>` and :doc:`post-processing <merlion.post_process>` layers.
 It has several modules to improve ease-of-use,
 including :ref:`visualization <merlion.plot>`,
21 changes: 21 additions & 0 deletions docs/source/merlion.models.anomaly.change_point.rst
@@ -0,0 +1,21 @@
+merlion.models.anomaly.change\_point package
+============================================
+
+.. automodule:: merlion.models.anomaly.change_point
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+.. autosummary::
+   bocpd
+
+Submodules
+----------
+
+merlion.models.anomaly.change\_point.bocpd module
+-------------------------------------------------
+
+.. automodule:: merlion.models.anomaly.change_point.bocpd
+   :members:
+   :undoc-members:
+   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/merlion.models.anomaly.rst
@@ -28,6 +28,7 @@ Subpackages
    :maxdepth: 4

    merlion.models.anomaly.forecast_based
+   merlion.models.anomaly.change_point

 Submodules
 ----------
2 changes: 2 additions & 0 deletions docs/source/merlion.models.rst
@@ -62,6 +62,7 @@ Finally, we support ensembles of models in :py:mod:`merlion.models.ensemble`.
    factory
    defaults
    anomaly
+   anomaly.change_point
    anomaly.forecast_based
    forecast
    ensemble
@@ -75,6 +76,7 @@ Subpackages
    :maxdepth: 2

    merlion.models.anomaly
+   merlion.models.anomaly.change_point
    merlion.models.anomaly.forecast_based
    merlion.models.forecast
    merlion.models.ensemble
1 change: 1 addition & 0 deletions docs/source/merlion.rst
@@ -7,6 +7,7 @@ each associated with its own sub-package:
 for anomaly detection and forecasting. More specifically, we have

 - :py:mod:`merlion.models.anomaly`: Anomaly detection models
+- :py:mod:`merlion.models.anomaly.change_point`: Change point detection models
 - :py:mod:`merlion.models.forecast`: Forecasting models
 - :py:mod:`merlion.models.anomaly.forecast_based`: Forecasting models adapted for anomaly detection. Anomaly
   scores are based on the residual between the predicted and true value at each timestamp.
7 changes: 7 additions & 0 deletions docs/source/merlion.utils.rst
@@ -11,6 +11,13 @@ utilities for resampling time series.
 Submodules
 ----------

+merlion.utils.conj_priors module
+--------------------------------
+.. automodule:: merlion.utils.conj_priors
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
 merlion.utils.istat module
 --------------------------

3 changes: 2 additions & 1 deletion merlion/models/anomaly/__init__.py
@@ -6,7 +6,8 @@
 #
 """
 Contains all anomaly detection models. Forecaster-based anomaly detection models
-may be found in :py:mod:`merlion.models.anomaly.forecast_based`.
+may be found in :py:mod:`merlion.models.anomaly.forecast_based`. Change-point detection models may be
+found in :py:mod:`merlion.models.anomaly.change_point`.
 For anomaly detection, we define an abstract `DetectorBase` class which inherits from `ModelBase` and supports the
 following interface, in addition to ``model.save`` and ``DetectorClass.load`` defined for `ModelBase`:
21 changes: 20 additions & 1 deletion merlion/models/anomaly/base.py
@@ -98,6 +98,18 @@ class NoCalibrationDetectorConfig(DetectorConfig):
     def __init__(self, enable_calibrator=False, **kwargs):
         super().__init__(enable_calibrator=enable_calibrator, **kwargs)

+    @property
+    def calibrator(self):
+        """
+        :return: ``None``
+        """
+        return None
+
+    @calibrator.setter
+    def calibrator(self, calibrator):
+        # no-op
+        pass
+
     @property
     def enable_calibrator(self):
         """
@@ -132,7 +144,14 @@ def _default_post_rule_train_config(self):
         from merlion.evaluate.anomaly import TSADMetric

         t = self.config._default_threshold.alm_threshold
-        q = None if self.config.enable_calibrator or t == 0 else 2 * norm.cdf(t) - 1
+        # self.calibrator is only None if calibration has been manually disabled
+        # and the anomaly scores are expected to be calibrated by get_anomaly_score(). If
+        # self.config.enable_calibrator, the model will return a calibrated score.
+        if self.calibrator is None or self.config.enable_calibrator or t == 0:
+            q = None
+        # otherwise, choose the quantile corresponding to the given threshold
+        else:
+            q = 2 * norm.cdf(t) - 1
         return dict(metric=TSADMetric.F1, unsup_quantile=q)

     @property
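The expression `2 * norm.cdf(t) - 1` in the hunk above maps a z-score threshold `t` to the fraction of a standard normal's mass that lies within `[-t, t]`. A small worked example, using a hypothetical helper named `threshold_to_quantile` that is not part of Merlion:

```python
from scipy.stats import norm


def threshold_to_quantile(t):
    """Fraction of a standard normal distribution that lies within [-t, t]."""
    return 2 * norm.cdf(t) - 1


for t in (1.0, 2.0, 3.0):
    print(f"threshold={t:.1f} -> quantile={threshold_to_quantile(t):.4f}")
# threshold=1.0 -> quantile=0.6827
# threshold=2.0 -> quantile=0.9545
# threshold=3.0 -> quantile=0.9973
```

A threshold of 3 therefore treats roughly the top 0.27% of absolute scores as anomalous. In the diff, the quantile is set to `None` when `t == 0`, when calibration is enabled, or when the calibrator is `None`.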
10 changes: 10 additions & 0 deletions merlion/models/anomaly/change_point/__init__.py
@@ -0,0 +1,10 @@
+#
+# Copyright (c) 2021 salesforce.com, inc.
+# All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause
+#
+"""
+Contains all change point detection algorithms. These models implement the anomaly detector interface, but
+they are specialized for detecting change points in time series.
+"""