v0.2.0 - merge dev into main (#126)
* Bring main and dev in sync for new GH workflow

* Sync main and dev

* Enable linting in pipeline (#34)

configure and run flake8

Co-authored-by: Thomas Schill

* 32 enhance sphinx bibtex (#35)

* fix typo

* add refs.bib, update citations

Co-authored-by: Thomas Schill 

* Update README.rst

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Remove wheel dependency.

Including it didn't speed up the pipeline or squash the complaint from sklearn.

* kdqTree: Added compatibility with pandas dataframes (#40)

Transferred changes for issue #133 on kdqtree pandas from GL to GH

Co-authored-by: Shashank Jarmale

* Merge unit tests for example materials (#43)

* add simple copy of main yml towards #8

* add environment variable for script tester

* add script tester towards #8

* limit job to example scripts

* use abspath for executing from tests/

* split src / example tests, reorganize test directories

* revert to simpler file refs, edit workflow to cd into correct dir

* fix typo in job trigger

* add notebook tester towards #8

* add venv/kernel steps to allow for nbconvert tests

* fix typo towards !43

* clean up lines to close !43

* Add separate coverage test, workflow improvements (#44)

* move coverage to separate workflow, fail under 100

* revert to 1 combined job for test + cov, fail under 99 (#5)

* remove tmp branch from workflow

* separate lint workflow, add isort

* rework black step, remove isort

* fix black version

* run black, update README towards #44

* add badges to readme

* fix badge section

* Update .github/workflows/tests.yml

* Update README.rst

* Update README.rst

* Update README.rst

* Update README.rst

Co-authored-by: Thomas Schill

* Add unit tests for kdqTree (#47)

* Adds more unit tests for kdqTree

* add validation unit tests

* Add a unit test for KDQTreePartitioner

* add reset to set_reference

* use ref_data variable properly when drift occurs

* Update hdm docs (#50)

* updated docs

* closes issue 33

* apply black formatting

Co-authored-by: Thomas Schill

* Update README_dataCircleGSev3Sp3Train.txt (#52)

* Adds notebook versions of the examples to RTD. (#53)

* Started testing out python example notebooks with sphinx

* update conf.py to enable pandoc, move examples around

* Added example notebooks for data drift detectors

* Added example notebooks for modules

* remove extra notebooks, fix plotly plots

Co-authored-by: Shashank Jarmale

* add tox to dev install

* Fix kdq-tree batch example in documentation example notebook (#54)

Duplicating the examples for the documentation pulled an older version forward, which I didn't catch while reviewing the PR.

* Minor updates to constrain Python version used in installation (#55)

* add tox python version checks towards #2

* Update .gitignore

* fix syntax

* add version notes

* remove older versions from tox

* Update README to include pyenv steps, towards #2

* remove pyenv section

Co-authored-by: Thomas Schill

* Merge new data module and reorganize data files (#56)

* add (untested) .python that duplicates make_example_data.R

* add TODO items

* reorganize tools => datasets towards #38

* further reorganize datasets module, add DataGenerator idea

* split DataGenerator idea, and fix bugs in make_example_batch_data

* update any example_data.csv script to now use function

* consolidate dataset descriptions into one README

* debug make_example_data

* delete outdated data files towards #38

* remove TODO comment

* satisfy formatting requirements

* add unit tests and comment out untested code

* comment out missing code, add single-line description towards #38

* minor formatting changes to trigger checks

* debug unit tests, re-satisfy formatting requirements

* update references in docs notebooks, add generator docstring
 also fixes some whitespace in cdbd.py

Co-authored-by: Thomas Schill

* Merge new streaming, batch ABCs and refactor KdqTree detector (#62)

* separate into streaming and batch detector ABCs (#46)

* split kdqtree into streaming/batch versions, update tests

* finish batch version of kdqtree

* begin using multiple inheritance scheme for kdqtree detectors (#46)

* establish commonly inherited functionality in new KdqTreeDetector class

* establish commonly inherited functionality in new KdqTree detector class (#46)

* deconstruct update to enable code reuse in KdqTreeDetector (#46)

* debug all failing tests in test_kdqtree (#46)

* update __init__, update refs in examples (#46)

* update outdated data refs

* add any missing docstrings (#46)

* format with black

* add unit test for new ABC drift setters

* updated the data_drift_examples notebook

* docstring formatting tweaks

* fix typo

Co-authored-by: Thomas Schill

* fix formatting in docstring

Co-authored-by: Thomas Schill

* fix formatting in docstring

Co-authored-by: Thomas Schill

* fix typo

Co-authored-by: Thomas Schill

* fix description in docstring

Co-authored-by: Thomas Schill

* formatting

* remove double-documented attributes from docstring

* provide useful information in child docstrings

* move _drift_counter into KdqTreeStreaming

* delete coverage file

* toss ref data once processed

* format with black

Co-authored-by: Thomas Schill

* Switch to README.md for better rendering on github (#49)

* switch to README.md for better rendering on github
 - removes reference links from table
 - adds placeholder mermaid flow diagram
 - makes some tweaks to the README text

* update requirements in setup.cfg

* test mermaid rendering

* Add "Choosing a Detector" page to TOC

* tweak README text

* add RTD hyperlink

* Merge with current dev

* remove draft flow diagram

* Add CHANGELOG and pypi actions for release.  (#51)

* add CHANGELOG, yaml

* add Action to push to pypi upon published release

* change name of workflow

* test adding security linter

* test bandit linting

* comments

* alphabetize setup.cfg.test

* increment version number

* change lint badge name

* address comments for main-dev PR 63

* Add nbconvert script to make the example files into .py files (#65)

* add a function to fetch the circle data from whatever working dir

* add a script to generate .py files from .ipynb docs/source/examples

* add coverage test

* add unit tests

* update README

* move find_git_root to utils/_locate, tweak formatting
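
The circle-data fetcher and notebook-conversion script above both depend on locating the repository root regardless of the current working directory. A minimal sketch of that idea follows, assuming a `.git` entry marks the root; the function name matches the commit message, but the signature and behavior shown here are assumptions, not the actual `menelaus.utils._locate` code.

```python
from pathlib import Path


def find_git_root(start=None):
    """Hypothetical sketch: walk upward from `start` (default: the current
    working directory) until a directory containing `.git` is found."""
    path = Path(start if start is not None else Path.cwd()).resolve()
    # check the starting directory first, then each ancestor in turn
    for candidate in (path, *path.parents):
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no .git directory found above {path}")
```

Called from, say, `docs/source/examples`, this returns the repository root, so data paths can be built relative to it instead of to the working directory.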

* Add template for benchmarks directory (#69)

* setup benchmarks materials (#58)

* add details to benchmarking materials (#58)

* Implemented Margin Density Drift Detection (MD3) Method (#60)

* Created md3 class and started to build out basic detector methods

* More development on MD3 implementation

Added update and marginal inclusion signal calculation capabilities

* MD3 Implementation Progress

Added ability to issue drift warning based on change in margin density
relative to that of reference distribution.
Next step is to implement system to collect labeled samples to confirm
that drift is occurring (or rule out drift).
Another next step is to work on building out example(s) of MD3, with SVM
specifically for now.
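
For context, margin density in MD3 is the fraction of samples that fall inside a classifier's margin. The sketch below illustrates the warning rule described above using a fixed linear decision function `w·x + b` in place of a trained SVM. The helper names, the `<= 1` margin boundary, and the "sensitivity times reference standard deviation" threshold are illustrative assumptions, not the code in `md3.py`.

```python
def decision_value(x, weights, bias):
    """Linear decision function w.x + b (stand-in for a trained linear SVM)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def margin_density(X, weights, bias):
    """Fraction of samples whose decision value lies inside the margin |f(x)| <= 1."""
    inside = sum(1 for x in X if abs(decision_value(x, weights, bias)) <= 1)
    return inside / len(X)


def drift_suspected(md_current, md_ref, md_ref_std, sensitivity=2.0):
    """Warn when the current margin density moves more than `sensitivity`
    reference standard deviations away from the reference margin density."""
    return abs(md_current - md_ref) > sensitivity * md_ref_std
```

Only unlabeled feature values are needed to compute margin density, which is why a warning like this can be raised before any labels are collected to confirm drift.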

* Wrapped some calculation lines for clarity

* Finished preliminary MD3 implementation

Next step is to complete a full working example for MD3 using an SVM.
Step after that is to address TODOs in implementation (add compatibility
with other types of models, make all user-facing methods intuitive and
clear, etc.).

* Completed MD3 implementation with oracle labels and retraining

Example finished for the most part, probably some debugging to do

* Got MD3 example working

But currently is not actually detecting drift. Is tracking margin
density over the stream correctly, but no drift warnings/detections are
being issued. Need to play around with sensitivity threshold for
suspecting/confirming drift to see if that's the issue.

* MD3 Implementation working + example working

* MD3 implementation and example working

Still some debugging and cleanup to do. Also have to play around with
sensitivity parameter to see what a good default value would be.

* Continued updating MD3 implementation and example

A few design questions to answer regarding when MD3 warns/resets internal
parameters based on drift confirmation

* Moved retraining data green lines to be in the right place

start right after warning

* Removed some TODOs

* Reverted concept drift example back to original version

* Resolved some TODOs from PR draft

* Changed dataset for MD3 example to new rainfall dataset from India

* Added unit tests for MD3, added MD3 to README

* Will have to reorganize MD3 example before merging PR

* Reformatted md3.py with black

* Increased test coverage, reformatted md3.py

* Finalized md3 unit tests

* Finished some TODOs in MD3 implementation

* Added tests for fetch_rainfall_data and formatted make_example_data.py

* This commit contains:

the completed md3_example.py script, and the example has been added to
the concept_drift_examples.ipynb example notebook. md3_example.py will
be deleted in the next commit.

* Deleted md3_example.py (in concept drift example notebook)

* Regenerated example scripts from example notebooks

* Removed TODO from md3.py

* Update citation

Co-authored-by: Shashank Jarmale
Co-authored-by: Thomas Schill

* 68 remove changelog workflow (#71)

* drop the changelog yaml

* tweaks to docstrings

* update documentation for make_example_batch_data

* add docstring for fetch_circle_data

* update refs.bib

* Add citation and description to rainfall data (#73)

* tweaks to md3, kdq_tree docstrings

* remove "example" from notebook subheadings

* remove old datasets README

* closes issue #72

* and-delimited authors in citation; fix in-line cite

* fix some sphinx build errors

* add returns to docstrings

Co-authored-by: indialindsay

* fix a typo preventing doc build

* Refactor inputs to update and set_reference functions (#75)

* refactor ABCs to have common signature in update (#15)

* refactor change detectors to have common signature in update (#15)

* refactor data drift detectors to have common signature in update (#15)

* remove obs_id from PageHinkley

* refactor concept drift detectors to have common update signature (#15)

* refactor batch detectors to have common set_reference signature (#15)

* update examples, README, ADWIN with function signature changes (#15)

* modify adwin unit tests for new update sig

* add int cast to ADWIN input check

* fix formatting with black

* fix outdated update function in stepd

* debug MD3, modify docstring for find_git_root

* improve formatting and test coverage

* update example notebooks, regenerate scripts

* make sure README example is functional

* resolve minor issue in PCACD

Co-authored-by: Thomas Schill
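
The refactor in #75 gives every detector the same `update(X, y_true, y_pred)` signature, which the README example in this commit also uses. The following is a hypothetical sketch of that pattern; the class names `SketchStreamingDetector` and `ErrorCountDetector` are invented for illustration and are not Menelaus classes. Each detector accepts all three arguments and ignores the ones it does not need.

```python
from abc import ABC, abstractmethod


class SketchStreamingDetector(ABC):
    """Hypothetical base class: every detector shares one update signature."""

    def __init__(self):
        self.drift_state = None
        self.total_updates = 0

    @abstractmethod
    def update(self, X, y_true, y_pred):
        """Consume one observation; subclasses call this to count updates."""
        self.total_updates += 1


class ErrorCountDetector(SketchStreamingDetector):
    """Toy concept drift detector: flags drift after `limit` consecutive errors."""

    def __init__(self, limit=3):
        super().__init__()
        self.limit = limit
        self._streak = 0

    def update(self, X, y_true, y_pred):
        super().update(X, y_true, y_pred)
        # X is unused here; a data drift detector would ignore y_true/y_pred instead
        self._streak = self._streak + 1 if y_true != y_pred else 0
        self.drift_state = "drift" if self._streak >= self.limit else None
```

Because callers always pass the same keywords, detectors of different types can be swapped, or looped over in an ensemble, without special-casing their inputs.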

* Reorganize tests directory to mirror src/ structure (#79)

reorganize tests directory (#78)

* Make concept_drift.ADWIN a child of change_detection.ADWIN (#83)

* make concept_drift.ADWIN a child of change_detection.ADWIN

* increment version, update eddm docstring

* rename new ADWIN, add unit test

* update README ADWIN example

* remove input_type

* add redundant docstrings, unit tests for sphinx

* fix import statement

* move convert_notebooks.py to utils

* move lfr.round_val to init

* add change_detection.ADWIN example

* update nb description

* Revert "move convert_notebooks.py to utils"

This reverts commit fe05b78.

* add original accuracy calculation to ADWIN example

* regenerate .py examples

* update rainfall unit test

* tweaks to examples

* rename new class

* 84 remove redundant docstrings (#85)

* make garbage input for unit test more garbagey

* add :inherited-members: option to docs

* update README

* remove redundant unit tests for cdbd, hdddm

* remove drift_state from child signatures

* remove drift_state from adwin_outcome

* make the other class attribs into properties for consistency's sake

* more docstring cleanup

* added groupwise member order to conf.py

* Add accuracy calculations to the example notebook plots for concept drift detectors (#88)

* add running accuracy to concept drift examples

* Draft: Add Ensemble for Batch Detectors (#90)

* begin sketching BatchEnsembler (#77)

* sketch Ensembler, BatchEnsembler using toolz.pipe

* add more scratch work re: ensemblers and pipelines (#77)

* significantly simplify batch ensembling, add data robustness (#77)

* finish getting ensembler to execute set_ref/update for all batch dets (#77)

* add simple majority evaluator, make ensembler fully functional (#77)

* cleanup for PR !90, begin adding tests

* add more unit tests for batch ensemble

* add docs for new ensemble code

* formatting with black/bandit

* Replace notebook examples workflow (#92)

* update workflow yml

* update docstrings for #91

* Add validation to StreamingDetector (#95)

* add y-validation

* add X-validation

* kdq_tree tweaks

* switch concept drift detectors to StreamingDetector

* switch change detectors to StreamingDetector

* switch PCA-CD to StreamingDetector

* 96 batch validation (#97)

* add batch validation

* remove redundant validation

* switch to deepcopy for set_reference
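
The commit does not say why `set_reference` switched to `deepcopy`. One likely motivation is avoiding aliasing, where later changes to the caller's data would silently alter the stored reference. A hypothetical illustration follows; the `ReferenceHolder` class is invented and is not Menelaus code.

```python
import copy


class ReferenceHolder:
    """Hypothetical fragment of a batch detector that stores reference data."""

    def __init__(self, use_deepcopy=True):
        self.use_deepcopy = use_deepcopy
        self.reference = None

    def set_reference(self, X):
        # deepcopy decouples the stored reference from the caller's object;
        # without it, the detector keeps a reference to the same nested data
        self.reference = copy.deepcopy(X) if self.use_deepcopy else X
```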

* Add streaming ensemble (#101)

* add initial draft of StreamingEnsemble, move set_reference into BatchEnsemble

* make stream ensemble run for every data/concept detector except MD3 (#89)

* add minimum-approval and confirmed-approval evaluators (#89)

* temporary fix for univariate detectors (#89)

* add unit tests, begin debugging KdqTree in batch case (#89)

* debug kdqtree batch ensemble unit test

* fix and test confirmed approval evaluator

* evaluators are now just functions

* evaluators can now be str or function

* Revert "evaluators can now be str or function"

This reverts commit d4d45de.

Co-authored-by: Thomas Schill
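
Since evaluators became plain functions in this PR, a voting rule can be written as a function over the detectors' `drift_state` values. The sketch below shows a simple-majority rule and a minimum-approval rule under that assumption. The function names and signatures are hypothetical, and later commits in this merge replace evaluators with election classes such as `SimpleMajorityElection`.

```python
def simple_majority(drift_states):
    """Return "drift" when more than half of the detectors report drift."""
    votes = sum(1 for state in drift_states if state == "drift")
    return "drift" if votes > len(drift_states) / 2 else None


def minimum_approval(drift_states, approvals_needed=1):
    """Return "drift" when at least `approvals_needed` detectors report drift."""
    votes = sum(1 for state in drift_states if state == "drift")
    return "drift" if votes >= approvals_needed else None
```

For example, with states collected from a dict of detectors, `simple_majority([d.drift_state for d in detectors.values()])` gives the ensemble's verdict; only the exact string `"drift"` counts as a vote here.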

* 99 ensemble quickstart (#102)

* sketch new readme examples (#99)

* less wordy version of quickstart steps (#99)

* reduce lines

* further cleanup

* spacing?

* wording changes

* asserts instead of print (downside: example exits with error)

* Update README.md

Co-authored-by: Thomas Schill

Co-authored-by: Thomas Schill

* changed references for phtest and cusum (#108)

* Reshape input as part of validation. (#107)

* add validation to concept drift, change detectors, data drift detectors
* new unit tests
* update docstrings

* Change column specification to column selectors (#109)

* initial change to column selectors (#104)

* intermediate push before merging

* debug input issues

* security check - remove assert

* Rename ADWIN concept drift detector (#111)

* rename ADWINOutcome to ADWINAccuracy

* update coverage badge text

* docstring tweaks

* Add Maciel Election, refactor other elections (#113)

* initial switch to evaluator class (#100)

* debug new evaluators

* emergency commit for viewing

* add maciel election; change evaluator to election (#100)

* det list now passed by ensemble to election call (#100)

* add passing tests (#100)

* formatting, coverage fixes

* edit list comprehension for conciseness

Co-authored-by: Thomas Schill

* changes to satisfy PR comments

* remove question comments

* add citations

* add a bullet on ensembles to README

* rename tests

Co-authored-by: Thomas Schill

* rename tests

Co-authored-by: Thomas Schill

* rename maciel tests

Co-authored-by: Thomas Schill

* rename maciel tests

Co-authored-by: Thomas Schill

* update ensemble table portion

Co-authored-by: Thomas Schill

* update ensemble table portion

Co-authored-by: Thomas Schill

* rename maciel tests

Co-authored-by: Thomas Schill

* rename maciel tests

Co-authored-by: Thomas Schill

* rename min approval tests

Co-authored-by: Thomas Schill

* rename min approval tests

Co-authored-by: Thomas Schill

Co-authored-by: Thomas Schill

* Update README.md

* Finish ensemble documentation (#114)

* cannot test, but minor adjustments to ensemble docs, setup files

* add ensemble examples

* docstring fixes, add notebook to index.rst

* format with black

* better example in notebook

* fix missing booktitle in Maciel

* tweaked some narration in the example

* add comment for sphinx freeze

Co-authored-by: Thomas Schill

* shorten wording in docstring

Co-authored-by: Thomas Schill

* formatting for sphinx

Co-authored-by: Thomas Schill

* first pass at addressing remaining PR comments

* final formatting changes to docstrings

* directory change

* use better file-finding method in make_example_data

* Update src/menelaus/ensemble/ensemble.py

* Update src/menelaus/ensemble/ensemble.py

Co-authored-by: Thomas Schill

* 105 rename src (#116)

* move /menelaus/ up a directory, remove /src/

* rename drift_detector.py to detector.py

* remove erroneous pytest call from make_example_data

* add references page to index

* move deprecation note to DriftDetector docstring

* README tweak

* 118 dummy out validation (#119)

* pass None instead of X where appropriate

* dummy out unused args for change detector validation

* dummy out validation in drift detectors

* black formatting

* tweak narration in ensemble notebook

* fix typo, comments

* Add fixed NNDVI  (#124)

* Create NNSpacePartitioner.py

* Add NNSP to partitioner __init__.py

* Create nndvi.py to set up debugging in #36

* add nndvi example to debug #36

* potential fix for build and dissimilarity

* change drift threshold computation
- the current implementation uses a random permutation for both new sets, which means they are not mutually exclusive. As a result, the numerator in compute_nnps_distance can contain 0s.
- making the second set the reverse of the first fixes this.
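
To illustrate the indexing issue described above, the sketch below contrasts two independent random draws, which can share indices, with a single permutation split into its head and the head of its reverse, which cannot overlap as long as twice the set size does not exceed the number of indices. This is a hypothetical illustration using the standard library, not the NN-DVI threshold code.

```python
import random


def split_independent(n_total, n_sample, seed=None):
    """Two independent random draws: the resulting index sets may overlap."""
    rng = random.Random(seed)
    first = rng.sample(range(n_total), n_sample)
    second = rng.sample(range(n_total), n_sample)
    return first, second


def split_disjoint(n_total, n_sample, seed=None):
    """One permutation: take its head, then the head of its reverse (its tail)."""
    if 2 * n_sample > n_total:
        raise ValueError("need 2 * n_sample <= n_total for disjoint sets")
    rng = random.Random(seed)
    perm = list(range(n_total))
    rng.shuffle(perm)
    first = perm[:n_sample]
    second = perm[::-1][:n_sample]  # reverse of the same permutation
    return first, second
```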

* change NNDVI to BatchDetector

* add nndvi unit tests

* add unit tests for NNSP

* formatting with black

* add sklearn to cfg

* remove assert in nndvi for security

* formatting with black

* add a few comments

* modify scikit-learn import for pipeline

* add nndvi tests for better coverage

* remove warning statement in outdated if clause

* == to = fixes coverage bug

* formatting with black

* validation / other PR changes

* fold nndvi example into notebook

* replace length calculation, comments

* tweaks to example notebook

* add comments to NN-DVI example

Co-authored-by: Thomas Schill

* update CHANGELOG

* remove benchmarks folder

Co-authored-by: Shashank Jarmale
Co-authored-by: Anmol Srivastava
Co-authored-by: India Lindsay
tms-bananaquit and indialindsay committed Dec 8, 2022
1 parent 8bdac6d commit 63d14ee
Showing 101 changed files with 25,289 additions and 4,113 deletions.
11 changes: 6 additions & 5 deletions .github/workflows/examples.yml
@@ -22,13 +22,14 @@ jobs:
python-version: "3.10"
- name: Install dependencies
run: |
python -m venv ./venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -e .[test]
- name: Test examples
- name: Convert example notebooks to scripts
run: |
cd docs/source/examples
python convert_notebooks.py
cd ../../..
- name: Test example scripts
run: |
source venv/bin/activate
ipython kernel install --name "venv" --user
cd tests/examples
pytest
4 changes: 2 additions & 2 deletions .github/workflows/format.yml
@@ -41,10 +41,10 @@ jobs:
uses: psf/black@stable
with:
options: "--check --verbose"
src: "./src/menelaus"
src: "./menelaus"
version: "22.3.0"

- name: Security check with bandit
run: |
# exits with code 0 if there are no errors, otherwise complains
bandit -q -r ./src/
bandit -q -r ./menelaus/
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -1,7 +1,7 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: tests | coverage
name: tests | coverage 100%

on:
push:
@@ -29,5 +29,5 @@ jobs:
pip install -e .[dev]
- name: Unit and coverage tests
run: |
pytest tests/menelaus --cov=src/ --cov-report term
pytest tests/menelaus --cov=menelaus/ --cov-report term
coverage report -m --fail-under=100
30 changes: 0 additions & 30 deletions .github/workflows/update-changelog.yaml

This file was deleted.

9 changes: 8 additions & 1 deletion .gitignore
@@ -4,6 +4,7 @@
*.egg-info
build/
.pytest_cache
.python-version
.vscode
docs/build/*
dist
@@ -13,5 +14,11 @@ _build
*.coverage
*.DS_Store
.idea/
*.png

# Images

examples/*.png
menelaus/*.png
tests/*.png

*.tox*
23 changes: 22 additions & 1 deletion CHANGELOG.md
@@ -6,4 +6,25 @@ Notable changes to Menelaus will be documented here.

- Initial public release.
- Published to pypi.
- Published to readthedocs.io.
- Published to readthedocs.io.

## v0.1.2 - July 11, 2022

- Updated the documentation
- Added example jupyter notebooks to ReadTheDocs
- Switched to sphinx-bibtext for citations
- Formatting and language tweaks.
- Added StreamingDetector and BatchDetector abstract base classes.
- Re-factored kdq-tree to use new abstract base classes: the separate classes KdqTreeStreaming and KdqTreeBatch now exist.
- kdq-tree can now consume dataframes.
- Added new git workflows and improved old ones.

## v0.2.0 - December 7, 2022

- Updates to documentation.
- Updated the arguments for detector `update` and `set_reference` methods.
- Added validation to the detector `update` and `set_reference` methods.
- Added the `datasets` module, which contains or generates example datasets.
- Added implementation of Margin Density Drift Detection (MD3) semi-supervised detector.
- Added implementation of Nearest Neighbor-based Density Variation Identification (NN-DVI) data drift detector.
- Added Ensemble wrapper that allows combining two or more drift detectors.
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -6,4 +6,4 @@ include README.md
# graft tests
# graft examples
# graft docs
graft src
graft menelaus
98 changes: 66 additions & 32 deletions README.md
@@ -33,11 +33,7 @@ algorithms are typically used when it is more important to process large volumes
of information simultaneously, where the speed of results after receiving data
is of less concern.

In The Odyssey, Menelaus seeks a prophecy known by the shapeshifter
Proteus. Menelaus holds Proteus down as he takes the form of a lion, a
serpent, water, and so on. Eventually, Proteus relents, and Menelaus
gains the answers he sought. Accordingly, this library provides tools
for \"holding\" data as it shifts.
Menelaus is named for the Odyssean hero that defeated the shapeshifting Proteus.

# Detector List

@@ -47,15 +43,18 @@ Menelaus implements the following drift detectors.
|------------------|---------------------------------------------------------------|--------------|-----------|-------|
| Change detection | Cumulative Sum Test | CUSUM | x | |
| Change detection | Page-Hinkley | PH | x | |
| Concept drift | ADaptive WINdowing | ADWIN | x | |
| Change detection | ADaptive WINdowing | ADWIN | x | |
| Concept drift | Drift Detection Method | DDM | x | |
| Concept drift | Early Drift Detection Method | EDDM | x | |
| Concept drift | Linear Four Rates | LFR | x | |
| Concept drift | Statistical Test of Equal Proportions to Detect concept drift | STEPD | x | |
| Concept drift | Margin Density Drift Detection Method | MD3 | x | |
| Data drift | Confidence Distribution Batch Detection | CDBD | | x |
| Data drift | Hellinger Distance Drift Detection Method | HDDDM | | x |
| Data drift | kdq-Tree Detection Method | kdq-Tree | x | x |
| Data drift | PCA-Based Change Detection | PCA-CD | x | |
| Ensemble | Streaming Ensemble | - | x |
| Ensemble | Batch Ensemble | - | | x |


The three main types of detector are described below. More details, including
@@ -67,11 +66,15 @@ documentation on [ReadTheDocs](https://menelaus.readthedocs.io/en/latest/).
pre-defined range.
- Concept drift detectors monitor the performance characteristics of a
given model, trying to identify shifts in the joint distribution of
the data\'s feature values and their labels.
the data\'s feature values and their labels. Note that change detectors
can also be applied in this context.
- Data drift detectors monitor the distribution of the features; in
that sense, they are model-agnostic. Such changes in distribution
might be to single variables or to the joint distribution of all the
features.
- Ensembles are groups of detectors, where each watches the same data, and
drift is determined by combining their output. Menelaus implements a
framework for wrapping detectors this way.

The detectors may be applied in two settings, as described in the Background
section:
@@ -87,8 +90,6 @@ then maintains a count of the number of samples from the given dataset that fall
into each section of that partition. More details are given in the respective
module.

A flowchart breaking down these contexts can be found on the ReadTheDocs page under "Choosing a Detector."

# Installation

Create a virtual environment as desired, then:
@@ -99,42 +100,66 @@ pip install menelaus

# to allow editing, running tests, generating docs, etc.
# First, clone the git repo, then:
cd ./menelaus/
cd ./menelaus_clone_folder/
pip install -e .[dev]
```

Menelaus should work with Python 3.8 or higher.

# Getting Started

Each detector implements the API defined by `menelaus.drift_detector`:
they have an `update` method which allows new data to be passed, a
`drift_state` attribute which tells the user whether drift has been
detected, and a `reset` method (generally called automatically by
`update`) which clears the `drift_state` along with (usually) some other
attributes specific to the detector class.
Each detector implements the API defined by `menelaus.detector`:
notably, they have an `update` method which allows new data to be passed, and a `drift_state` attribute which tells the user whether drift has been
detected, along with (usually) other attributes specific to the detector class.

Generally, the workflow for using a detector, given some data, is as
follows:

```python
import pandas as pd
from menelaus.concept_drift import ADWIN
df = pd.read_csv('example.csv')
detector = ADWIN()
from menelaus.concept_drift import ADWINAccuracy
from menelaus.data_drift import KdqTreeStreaming
from menelaus.datasets import fetch_rainfall_data
from menelaus.ensemble import StreamingEnsemble, SimpleMajorityElection


# has feature columns, and a binary response 'rain'
df = fetch_rainfall_data()


# use a concept drift detector (response-only)
detector = ADWINAccuracy()
for i, row in df.iterrows():
detector.update(X=None, y_true=row['rain'], y_pred=0)
assert detector.drift_state != "drift", f"Drift detected in row {i}"


# use data drift detector (features-only)
detector = KdqTreeStreaming(window_size=5)
for i, row in df.iterrows():
detector.update(row['y_predicted'], row['y_true'])
if detector.drift_state is not None:
print("Drift has occurred!")
detector.update(X=df.loc[[i], df.columns != 'rain'], y_true=None, y_pred=None)
assert detector.drift_state != "drift", f"Drift detected in row {i}"


# use ensemble detector (detectors + voting function)
ensemble = StreamingEnsemble(
{
'a': ADWINAccuracy(),
'k': KdqTreeStreaming(window_size=5)
},
SimpleMajorityElection()
)

for i, row in df.iterrows():
ensemble.update(X=df.loc[[i], df.columns != 'rain'], y_true=row['rain'], y_pred=0)
assert ensemble.drift_state != "drift", f"Drift detected in row {i}"
```

For this example, because ADWIN is a concept drift detector, it requires
both a predicted value (`y_predicted`) and a true value (`y_true`), at
each update step. Note that this requirement is not true for the
detectors in other modules. More detailed examples, including code for
visualizating drift locations, may be found in the ``examples`` directory, as
stand-alone python scripts. The examples along with output can also be viewed on
the RTD website.
As a concept drift detector, ADWIN requires both a true value (`y_true`) and a
predicted value (`y_predicted`) at each update step. The data drift detector
KdqTreeStreaming only requires the feature values at each step (`X`). More
detailed examples, including code for visualizating drift locations, may be
found in the ``examples`` directory, as stand-alone python scripts. The examples
along with output can also be viewed on the RTD website.

# Contributing
Install the library using the `[dev]` option, as above.
@@ -154,6 +179,15 @@ Install the library using the `[dev]` option, as above.
sphinx-build . ../build
```

If the example notebooks for the docs need to be updated, the corresponding
python scripts in the `examples` directory should also be regenerated via:
```python
cd docs/source/examples
python convert_notebooks.py
```
Note that this will require the installation of `jupyter` and `nbconvert`,
which can be added to installation via `pip install -e ".[dev, test]"`.

- **Formatting**:

This project uses `black`, `bandit`, and `flake8` for code formatting and
@@ -162,8 +196,8 @@ Install the library using the `[dev]` option, as above.
following from the root directory:
```python
flake8 # linting
bandit -r ./src # security checks
black ./src/menelaus # formatting
bandit -r ./menelaus # security checks
black ./menelaus # formatting
```

# Copyright
4 changes: 0 additions & 4 deletions bin/Dockerfile

This file was deleted.

10 changes: 6 additions & 4 deletions docs/source/conf.py
@@ -14,15 +14,15 @@
import sys
from inspect import getsourcefile

sys.path.insert(0, os.path.abspath("../src/menelaus"))
sys.path.insert(0, os.path.abspath("../menelaus"))


# -- Project information -----------------------------------------------------

project = "menelaus"
copyright = "©2022 The MITRE Corporation. ALL RIGHTS RESERVED"
author = "Leigh Nicholl, Thomas Schill, India Lindsay, Anmol Srivastava, Kodie P McNamara, Austin Downing"
release = "0.1.0"
author = "Leigh Nicholl, Thomas Schill, India Lindsay, Anmol Srivastava, Kodie P McNamara, Shashank Jarmale"
# release = "0.1.2"


# -- General configuration ---------------------------------------------------
@@ -43,6 +43,8 @@

autodoc_default_options = {
"members": True,
"inherited-members": True,
"member-order": "groupwise",
"undoc-members": False,
"private-members": False,
"special-members": "__init__",
@@ -107,7 +109,7 @@ def run_apidoc(_):
os.chdir("..")
print("cwd", os.getcwd(), "\n")
print("contents", os.listdir())
src_dir = os.path.join("../src/menelaus")
src_dir = os.path.join("../menelaus")
template_dir = os.path.join("source", "templates")
main(["-M", "--templatedir", template_dir, "-f", "-o", "source", src_dir])
