Upgrade to PyThresh V1 #623

Open: wants to merge 2 commits into base: development
4 changes: 2 additions & 2 deletions README.rst
@@ -328,10 +328,10 @@ A more data-based approach can be taken when setting the contamination level. By
 .. code-block:: python

     from pyod.models.knn import KNN
-    from pyod.models.thresholds import FILTER
+    from pyod.models.thresholds import ZSCORE

     # Set the outlier detection and thresholding methods
-    clf = KNN(contamination=FILTER())
+    clf = KNN(contamination=ZSCORE())


 See supported thresholding methods in `thresholding <https://github.com/yzhao062/pyod/blob/master/docs/thresholding.rst>`_.
2 changes: 1 addition & 1 deletion docs/about.rst
@@ -61,7 +61,7 @@ Adam Goodge (PhD Researcher @ National University of Singapore):
 - Joined in 2022 (implemented LUNAR)
 - `LinkedIn (Adam Goodge) <https://www.linkedin.com/in/adam-goodge-33908691/>`_

-Daniel Kulik (Machine Learning Developer; MSc Student @ University of the Free State):
+Daniel Kulik (Machine Learning Developer; MSc Astrophysics @ University of the Free State):

 - Joined 2022 (implemented integration with PyThresh and more)
 - `LinkedIn (Daniel Kulik) <https://www.linkedin.com/in/daniel-kulik-148256223>`_
3 changes: 1 addition & 2 deletions docs/requirements.txt
@@ -1,14 +1,13 @@
combo
furo
geomstats
joblib
matplotlib
nose
numpy>=1.19
numba>=0.51
pyclustering
pytest
pythresh>=0.3.1
pythresh>=1.0.0
ruptures
scipy>=1.5.1
scikit-learn>=0.22.0
58 changes: 30 additions & 28 deletions docs/thresholding.rst
@@ -5,32 +5,34 @@
 ================================== ================ ================================================================ ====================================================================================================================
 Type Abbr Algorithm Documentation
 ================================== ================ ================================================================ ====================================================================================================================
-Kernel-Based AUCP Area Under Curve Percentage `AUCP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.AUCP>`_
-Statistical Moment-Based BOOT Bootstrapping `BOOT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.BOOT>`_
-Normality-Based CHAU Chauvenet's Criterion `CHAU <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CHAU>`_
-Linear Model CLF Trained Linear Classifier `CLF <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLF>`_
-cluster-Based CLUST Clustering Based `CLUST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLUST>`_
-Kernel-Based CPD Change Point Detection `CPD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CPD>`_
-Transformation-Based DECOMP Decomposition `DECOMP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DECOMP>`_
-Normality-Based DSN Distance Shift from Normal `DSN <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DSN>`_
-Curve-Based EB Elliptical Boundary `EB <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.EB>`_
-Kernel-Based FGD Fixed Gradient Descent `FGD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FGD>`_
-Filter-Based FILTER Filtering Based `FILTER <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FILTER>`_
-Curve-Based FWFM Full Width at Full Minimum `FWFM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FWFM>`_
-Statistical Test-Based GESD Generalized Extreme Studentized Deviate `GESD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.GESD>`_
-Filter-Based HIST Histogram Based `HIST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.HIST>`_
-Quantile-Based IQR Inter-Quartile Region `IQR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.IQR>`_
-Statistical Moment-Based KARCH Karcher mean (Riemannian Center of Mass) `KARCH <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.KARCH>`_
-Statistical Moment-Based MAD Median Absolute Deviation `MAD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MAD>`_
-Statistical Test-Based MCST Monte Carlo Shapiro Tests `MCST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MCST>`_
-Ensembles-Based META Meta-model Trained Classifier `META <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.META>`_
-Transformation-Based MOLL Friedrichs' Mollifier `MOLL <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MOLL>`_
-Statistical Test-Based MTT Modified Thompson Tau Test `MTT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MTT>`_
-Linear Model OCSVM One-Class Support Vector Machine `OCSVM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.OCSVM>`_
-Quantile-Based QMCD Quasi-Monte Carlo Discrepancy `QMCD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.QMCD>`_
-Linear Model REGR Regression Based `REGR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.REGR>`_
-Neural Networks VAE Variational Autoencoder `VAE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.VAE>`_
-Curve-Based WIND Topological Winding Number `WIND <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.WIND>`_
-Transformation-Based YJ Yeo-Johnson Transformation `YJ <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.YJ>`_
-Normality-Based ZSCORE Z-score `ZSCORE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.ZSCORE>`_
+Kernel-Based AUCP Area Under Curve Percentage `AUCP <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.AUCP>`_
+Statistical Moment-Based BOOT Bootstrapping `BOOT <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.BOOT>`_
+Normality-Based CHAU Chauvenet's Criterion `CHAU <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.CHAU>`_
+Linear Model CLF Trained Linear Classifier `CLF <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.CLF>`_
+Cluster-Based CLUST Clustering Based `CLUST <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.CLUST>`_
+Kernel-Based CPD Change Point Detection `CPD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.CPD>`_
+Transformation-Based DECOMP Decomposition `DECOMP <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.DECOMP>`_
+Normality-Based DSN Distance Shift from Normal `DSN <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.DSN>`_
+Curve-Based EB Elliptical Boundary `EB <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.EB>`_
+Kernel-Based FGD Fixed Gradient Descent `FGD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.FGD>`_
+Filter-Based FILTER Filtering Based `FILTER <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.FILTER>`_
+Curve-Based FWFM Full Width at Full Minimum `FWFM <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.FWFM>`_
+Statistical Test-Based GAMGMM Bayesian Contamination Estimation `GAMGMM <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.GAMGMM>`_
+Statistical Test-Based GESD Generalized Extreme Studentized Deviate `GESD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.GESD>`_
+Filter-Based HIST Histogram Based `HIST <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.HIST>`_
+Quantile-Based IQR Inter-Quartile Region `IQR <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.IQR>`_
+Statistical Moment-Based KARCH Karcher mean (Riemannian Center of Mass) `KARCH <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.KARCH>`_
+Statistical Moment-Based MAD Median Absolute Deviation `MAD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.MAD>`_
+Statistical Test-Based MCST Monte Carlo Shapiro Tests `MCST <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.MCST>`_
+Ensembles-Based META Meta-model Trained Classifier `META <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.META>`_
+Statistical Test-Based MIXMOD Normal & Non-Normal Mixture Models `MIXMOD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.MIXMOD>`_
+Transformation-Based MOLL Friedrichs' Mollifier `MOLL <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.MOLL>`_
+Statistical Test-Based MTT Modified Thompson Tau Test `MTT <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.MTT>`_
+Linear Model OCSVM One-Class Support Vector Machine `OCSVM <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.OCSVM>`_
+Quantile-Based QMCD Quasi-Monte Carlo Discrepancy `QMCD <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.QMCD>`_
+Linear Model REGR Regression Based `REGR <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.REGR>`_
+Neural Networks VAE Variational Autoencoder `VAE <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.VAE>`_
+Curve-Based WIND Topological Winding Number `WIND <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.WIND>`_
+Transformation-Based YJ Yeo-Johnson Transformation `YJ <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.YJ>`_
+Normality-Based ZSCORE Z-score `ZSCORE <https://pyod.readthedocs.io/en/latest/pyod.models.html#pyod.models.thresholds.ZSCORE>`_
 ================================== ================ ================================================================ ====================================================================================================================
7 changes: 4 additions & 3 deletions pyod/models/base.py
@@ -166,7 +166,7 @@ def predict(self, X, return_confidence=False):

         # if this is a PyThresh object
         else:
-            prediction = self.contamination.eval(pred_score)
+            prediction = self.contamination.predict(pred_score)

         if return_confidence:
             confidence = self.predict_confidence(X)
@@ -290,7 +290,7 @@ def predict_confidence(self, X):
             prediction = (test_scores > self.threshold_).astype('int').ravel()
         # if this is a PyThresh object
         else:
-            prediction = self.contamination.eval(test_scores)
+            prediction = self.contamination.predict(test_scores)
         np.place(confidence, prediction == 0, 1 - confidence[prediction == 0])

         return confidence
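
The `np.place` context line in this hunk is unchanged by the PR, but it is the consumer of the new `predict` call, so a small sketch of what it appears to do may help reviewers. The helper name `flip_inlier_confidence` is hypothetical, and the sketch assumes `confidence` starts out as confidence in the outlier label, which the diff does not state.

```python
def flip_inlier_confidence(confidence, prediction):
    """Pure-Python equivalent of the np.place line (hypothetical helper).

    Wherever the predicted label is 0 (inlier), replace c with 1 - c, so the
    returned value refers to the label that was actually predicted.
    """
    return [1 - c if p == 0 else c for c, p in zip(confidence, prediction)]


# Example values chosen for illustration only.
confidence = [0.9, 0.2, 0.75]
prediction = [1, 0, 0]
flipped = flip_inlier_confidence(confidence, prediction)
```

Only the entries predicted as inliers change; the first entry, predicted as an outlier, is left as it was.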
@@ -574,7 +574,8 @@ def _process_decision_scores(self):

         # if this is a PyThresh object
         else:
-            self.labels_ = self.contamination.eval(self.decision_scores_)
+            self.contamination.fit(self.decision_scores_)
+            self.labels_ = self.contamination.labels_
             self.threshold_ = self.contamination.thresh_
             if not self.threshold_:
                 self.threshold_ = np.sum(self.labels_) / len(self.labels_)
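
The three `base.py` hunks replace a single `eval` call with a fit-then-predict pattern. The sketch below illustrates that pattern with a hypothetical stand-in class; it is not PyThresh's implementation. Only the names `fit`, `predict`, `labels_`, and `thresh_` come from the diff. The class `StandInThresholder`, its mean-plus-standard-deviation rule, and the free function `process_decision_scores` are assumptions made for illustration.

```python
from statistics import mean, pstdev


class StandInThresholder:
    """Hypothetical stand-in exposing the attributes the diff relies on."""

    def __init__(self, factor=1.0):
        self.factor = factor
        self.thresh_ = None
        self.labels_ = None

    def fit(self, scores):
        # Learn a cutoff from the training scores. Mean + factor * std is an
        # assumed rule; a real thresholder would apply its own method.
        self.thresh_ = mean(scores) + self.factor * pstdev(scores)
        self.labels_ = [int(s > self.thresh_) for s in scores]
        return self

    def predict(self, scores):
        # Label new scores with the cutoff learned in fit().
        return [int(s > self.thresh_) for s in scores]


def process_decision_scores(contamination, decision_scores):
    """Mirror of the updated branch: fit once, then read labels_ and thresh_."""
    contamination.fit(decision_scores)
    labels = contamination.labels_
    threshold = contamination.thresh_
    if not threshold:
        # Fallback kept from the diff: use the fraction of flagged points.
        threshold = sum(labels) / len(labels)
    return labels, threshold


# Illustrative scores: five small values and one clear outlier.
train_scores = [0.1, 0.2, 0.15, 0.12, 0.18, 3.0]
thresholder = StandInThresholder()
labels, threshold = process_decision_scores(thresholder, train_scores)

# Later calls reuse the fitted cutoff instead of re-evaluating from scratch.
new_labels = thresholder.predict([0.05, 5.0])
```

Under the old `eval` call the thresholder was re-run on whatever scores it was handed; with `fit` and `predict` the cutoff is learned once on the training scores and then reused for test scores, which is presumably the motivation for the change.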