Feature LOF outlier detector #746

Merged
merged 285 commits into from
Jun 12, 2023
Changes from 250 commits
Commits
09412c4
Update default knn ensemble aggregator and normalizer values
mauicv Jan 3, 2023
65fbeac
Add tests for aggregator and normalizer default values
mauicv Jan 4, 2023
633636a
Merge branch 'master' into feature/knn-outlier-detector
mauicv Jan 4, 2023
4cd2eba
Remove Optional type from aggregator
mauicv Jan 4, 2023
98d2ba5
Change X -> x throughout
mauicv Jan 4, 2023
3d57b6d
Change anomaly -> outlier throughout
mauicv Jan 4, 2023
bc01ddf
Improve fpr description
mauicv Jan 4, 2023
47566b8
Update PValNormalizer docstring
mauicv Jan 4, 2023
bda15fd
Add custom error types
mauicv Jan 4, 2023
bacbfcc
Test pval and shift and scale normalizer output values
mauicv Jan 4, 2023
ea436b0
Test aggregator output values
mauicv Jan 4, 2023
c3af856
Remove unneeded NotImplemnentedErrors from ABC abstract methods
mauicv Jan 4, 2023
351e5a7
Fix typos
mauicv Jan 4, 2023
a190cc3
Fix method typo
mauicv Jan 4, 2023
0daa98f
Fix docstrings for KNNTorch
mauicv Jan 4, 2023
8d2fa3b
Set api signatures to accept np.ndarray and not List types
mauicv Jan 4, 2023
af8ab43
Fix mypy error
mauicv Jan 4, 2023
89c4815
Move to numpy logic from OutlierDetectorOutput dataclass to base class
mauicv Jan 5, 2023
c909f4c
Refactor init knn logic
mauicv Jan 5, 2023
bc7a771
Refator str to aggregator and normalizer methods to backend
mauicv Jan 5, 2023
dfd613d
Align kNN output with other outlier detectors
mauicv Jan 6, 2023
3fb167b
Refactor backend.pytorch into pytorch module
mauicv Jan 6, 2023
f28cb0f
Fix optional dependency tests
mauicv Jan 6, 2023
6ef8c62
Add backticks do docstrings
mauicv Jan 10, 2023
76b31d7
Update docstrings
mauicv Jan 11, 2023
6fb600c
reword numpy to torch tensor in transform object docstrings
mauicv Jan 11, 2023
cf6bba1
Update return type hints
mauicv Jan 11, 2023
f8ae93d
Add Mahalanobis detector
mauicv Jan 16, 2023
96f9be0
Add hasattr check in _accumulator method
mauicv Jan 16, 2023
fa4f944
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 16, 2023
e936b36
Add singlular dispatch pattern for _to_numpy method
mauicv Jan 16, 2023
d297082
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 16, 2023
1bf5fdc
Add Mahalanobis tests
mauicv Jan 17, 2023
a6ce6a2
Fix flake8 error
mauicv Jan 17, 2023
babc083
Add MahalanobisTorch to test_dep_management tests
mauicv Jan 17, 2023
ce28b8e
Replace alibi_detect.utils._types imports with typing_extension
mauicv Jan 17, 2023
4830db3
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 17, 2023
7fe67f1
Minor change
mauicv Jan 17, 2023
789a1d3
Replace singular_dispatch_method with singular_dispatch
mauicv Jan 17, 2023
55e95d6
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 17, 2023
fe7c185
Update changed _to_numpy logic
mauicv Jan 17, 2023
ada292e
Rename test files
mauicv Jan 17, 2023
31dbed1
Improove docstrings for mahalanobis detector
mauicv Jan 18, 2023
2d7bdeb
Merge branch 'master' into feature/knn-outlier-detector
mauicv Jan 18, 2023
f86ba80
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 18, 2023
7bc86dc
Merge branch 'master' into feature/knn-outlier-detector
mauicv Jan 20, 2023
c865d4c
Rename new mahalanobis -> _mahalanobis
mauicv Jan 20, 2023
66a5826
Rename Mahalanobis detector symbol
mauicv Jan 20, 2023
3e9877c
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 20, 2023
352e788
Add pca detector
mauicv Jan 20, 2023
8c5668f
Add make_moons integration tests
mauicv Jan 23, 2023
4161dff
Add optional dependency functionality
mauicv Jan 23, 2023
f357bbe
Add docstrings for pca outlier detector
mauicv Jan 23, 2023
40a0b27
Fix minor PR suggested changes
mauicv Jan 24, 2023
a831642
Make knn object private
mauicv Jan 24, 2023
aa334bd
Improve the kNN detector docstrings
mauicv Jan 24, 2023
d06673f
Rename aggregator to ensembler
mauicv Jan 24, 2023
0c674f0
Merge branch 'master' into feature/knn-outlier-detector
mauicv Jan 24, 2023
af30234
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 24, 2023
616ece2
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Jan 24, 2023
db524ab
Remove unnessesery code
mauicv Jan 24, 2023
3d69469
Add experimental module
mauicv Jan 31, 2023
5934ef9
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Jan 31, 2023
d7a9b82
Add Mahalanobis to experiemental namespace
mauicv Jan 31, 2023
6391c80
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Jan 31, 2023
35afa55
Add PCA to the experimental namespace
mauicv Jan 31, 2023
2fad9e3
Add sklearn gmm od backend
mauicv Jan 31, 2023
c679ab0
Add gmm pytorch backend
mauicv Feb 1, 2023
7826155
Add _gmm tests
mauicv Feb 1, 2023
901ad53
Add _to_numpy as a static method on base backend class
mauicv Feb 1, 2023
9439f38
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Feb 1, 2023
a1b0a0d
Refactor mahalanobis to use staticmethod _to_numpy
mauicv Feb 1, 2023
d0bb206
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Feb 1, 2023
09f8faa
Refactor PCA to use staticmethod _to_numpy
mauicv Feb 1, 2023
dc8d8bb
Merge branch 'feature/pca-od' into feature/gmm-od
mauicv Feb 1, 2023
27230e3
Add GMM to experimental namespace
mauicv Feb 1, 2023
d64c6b9
Add args param to fit
mauicv Feb 1, 2023
5c56dc2
Make args kwargs
mauicv Feb 1, 2023
170a55f
Fix minor issue
mauicv Feb 1, 2023
5a92bb0
Fix typing issues
mauicv Feb 2, 2023
90318e2
Import mahalanobis from module __init__
mauicv Feb 2, 2023
8e7762c
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Feb 2, 2023
06340c5
Merge branch 'feature/pca-od' into feature/gmm-od
mauicv Feb 2, 2023
7b939fa
Add device logic to gmm torch
mauicv Feb 2, 2023
0f415b6
Minor change to docstring
mauicv Feb 2, 2023
049f93f
Remove OutlierDetector
mauicv Feb 3, 2023
c1a2ea8
Add comments to test and remove duplicate test
mauicv Feb 3, 2023
885201f
Add further comments to _knn tests
mauicv Feb 3, 2023
8667af5
Correct method name spelling
mauicv Feb 3, 2023
fb7584a
Fix return in docstring
mauicv Feb 3, 2023
cddb470
Add no grad decorator to backend methods
mauicv Feb 3, 2023
af9730e
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Feb 6, 2023
9105843
Add minor fixes from merging knn branch
mauicv Feb 6, 2023
4196834
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Feb 6, 2023
5ade8e7
Minor fix
mauicv Feb 6, 2023
27f7f90
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Feb 6, 2023
cae15ac
Add fixes for merged pca branch
mauicv Feb 6, 2023
89aece2
Merge branch 'feature/pca-od' into feature/gmm-od
mauicv Feb 6, 2023
2899c90
Fix merge issues from pca
mauicv Feb 6, 2023
6af7d37
Fix minor linting issue
mauicv Feb 6, 2023
04da0a2
Use docstring in test instead of comments
mauicv Feb 13, 2023
a677e30
Address minor pr comments
mauicv Feb 13, 2023
f54586a
Add kwarg formatting for fit method parameters on gmm
mauicv Feb 14, 2023
5bede8a
Add method docstrings for GMM detector
mauicv Feb 14, 2023
57dfd9d
Add torchscript test and comments
mauicv Feb 14, 2023
b8c08eb
Add further documentation for tests
mauicv Feb 14, 2023
b227db4
Remove singular dispatch pattern
mauicv Feb 15, 2023
705bc95
Merge branch 'master' into feature/knn-outlier-detector
mauicv Feb 15, 2023
e48a5e1
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Feb 15, 2023
2ba3a46
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Feb 15, 2023
d6a8486
Merge branch 'feature/pca-od' into feature/gmm-od
mauicv Feb 15, 2023
94dd9c8
Add pytorch backend for LOF detector
mauicv Feb 21, 2023
7382a01
Add kernel option to lof detector pytorch backend
mauicv Feb 21, 2023
c50cb47
Add _lof outlier detector frontend
mauicv Feb 21, 2023
e53d4d1
Add device logic
mauicv Feb 21, 2023
c86d0ad
Fix typing errors
mauicv Feb 21, 2023
46bb747
Fit ensembler in infer_threshold step not fit step
mauicv Feb 22, 2023
a3e89e3
Add further documentation to knn detector
mauicv Feb 22, 2023
b08572d
Refactor exceptions into seperate file and add base class
mauicv Feb 28, 2023
70921b7
Rename exceptions to be consistent with alibi
mauicv Feb 28, 2023
87c8ab7
Remove __future__ annotations imports and unused _types imports
mauicv Mar 7, 2023
21a5fc0
Add link from base protocols to transform docstrings
mauicv Mar 7, 2023
75a7fb6
Rename transform_protocols and others to PascalCase
mauicv Mar 7, 2023
f4a13d3
Remove private methods from public __init__
mauicv Mar 7, 2023
186cde2
Fix missing comma bug
mauicv Mar 7, 2023
adb1f51
Add metion about torch device in KNNTorch and KNN docstrings
mauicv Mar 7, 2023
789b922
Make argument captialization consitent
mauicv Mar 7, 2023
bdaf4d4
Add missing raise statment
mauicv Mar 7, 2023
d539063
Remove constructors from mixins
mauicv Mar 8, 2023
7633d8b
Change code formatting to be more readable
mauicv Mar 8, 2023
fc713c3
Revert "Remove constructors from mixins"
mauicv Mar 8, 2023
1e3d4d8
Remove constructors from FitMixinTorch
mauicv Mar 8, 2023
ea20b21
Remove private method pattern in ensemble mixins
mauicv Mar 8, 2023
b1302dd
Fix spelling mistakes
mauicv Mar 8, 2023
09cf9f7
Add np.isclose for sum to one check
mauicv Mar 8, 2023
37025ed
Fix mutable default issue
mauicv Mar 8, 2023
b48b0a0
Expose _knn docstrings in the experimental namespace
mauicv Mar 9, 2023
8e423a0
Fix minor spelling mistake
mauicv Mar 9, 2023
e0deeaa
Add self return types
mauicv Mar 14, 2023
0a4046a
Make fit an abstract method on FitMixin
mauicv Mar 14, 2023
c4c7004
Add tests to check correct errors raised in KNNTorch backend
mauicv Mar 14, 2023
ffffbe8
Fix fxiture scope error in tests
mauicv Mar 14, 2023
02657ec
Catch and throw less confusing errors from backend components
mauicv Mar 15, 2023
5b754cd
Remove return self statments for consistency with old detectors
mauicv Mar 15, 2023
912afbd
Reword docstring for _catch_error dectorator
mauicv Mar 15, 2023
d880f92
Remove autodoc comment
mauicv Mar 15, 2023
8d98ee2
Add value error for invalid choice of fpr
mauicv Mar 16, 2023
95d0e3c
Set default PValNormalizer
mauicv Mar 16, 2023
12d22f7
Cast outlier booleans to ints
mauicv Mar 16, 2023
767aff9
Rewrite the 1st paragraph of the knn detector docstring
mauicv Mar 16, 2023
467aa6a
Minor change
mauicv Mar 16, 2023
d57d1bd
Add docstrings for the ensemble tests
mauicv Mar 16, 2023
a535e8d
Rename x_ref to x in infer_threshold
mauicv Mar 17, 2023
8a8e212
Merge branch 'feature/knn-outlier-detector' into feature/mahalanobis-od
mauicv Mar 17, 2023
176a290
Merge branch 'master' into feature/mahalanobis-od
mauicv Mar 17, 2023
58d4f21
Fix failing tests
mauicv Mar 17, 2023
1f5be2f
Minor changes to align mahalanobis and kNN work
mauicv Mar 20, 2023
b573bea
Revert renaming change
mauicv Mar 20, 2023
dc0f2a8
Further alignment changes
mauicv Mar 20, 2023
004a19c
Minor change to Mahanobis backend docstrings
mauicv Mar 20, 2023
a567b93
Rewrite docstrings for mahalanobis detector
mauicv Mar 20, 2023
f9b023f
Add docstrings for tests
mauicv Mar 20, 2023
0cbe044
Add docstrings for mahalanobis backend tests
mauicv Mar 20, 2023
eecaa4b
Minor change
mauicv Mar 20, 2023
858d8ec
Make X in detector private methods lowercase
mauicv Mar 20, 2023
8d10545
Change ensemble=None to ensemble=False in mahalanobis detector torch …
mauicv Apr 6, 2023
70dc0c6
Make set_fitted private
mauicv Apr 6, 2023
76778c9
Use super in MahalanobisDetector
mauicv Apr 6, 2023
6fad014
Reorder docstring sections to be compatible with numpy conventions
mauicv Apr 6, 2023
8245edb
Minor docstring changes
mauicv Apr 6, 2023
57abc3d
Remove cpu method call on tensor
mauicv Apr 6, 2023
f09979d
Remove cpu method call on tensor 2
mauicv Apr 6, 2023
c169089
Remove as_tensor method on input to score
mauicv Apr 6, 2023
e11bf5b
Change principle -> principal
mauicv Apr 6, 2023
8dd6085
Merge branch 'feature/mahalanobis-od' into feature/pca-od
mauicv Apr 13, 2023
35a27c3
Merge branch 'master' into feature/pca-od
mauicv Apr 13, 2023
3fe923c
Fix broken tests
mauicv Apr 19, 2023
4061365
Fix remaining tests
mauicv Apr 19, 2023
54aecfd
Fix linting error
mauicv Apr 19, 2023
e3de2a4
Minor fix in _pca
mauicv Apr 19, 2023
b415372
MInor changes
mauicv Apr 19, 2023
e73cbdc
Surface correct errors
mauicv Apr 19, 2023
94e95e4
Fix tests for error surfacing
mauicv Apr 19, 2023
f5c67cc
Minor error fix
mauicv Apr 19, 2023
67cb670
Update tests and add docstrings
mauicv Apr 19, 2023
c5631eb
Add save scrited detectors to temp_file in integration tests
mauicv Apr 19, 2023
e4a1798
Add docstrings fro backend pca tests
mauicv Apr 19, 2023
7378f69
Minor fix
mauicv Apr 20, 2023
944d5ef
Update docstrings on PCA detector
mauicv Apr 20, 2023
278b619
Revert minor change
mauicv Apr 20, 2023
83d8346
Update pca backend docstrings
mauicv Apr 20, 2023
189d3cc
Merge branch 'master' into feature/pca-od
mauicv Apr 20, 2023
fb6bb18
Add and raise errors for incorrect n_components values
mauicv Apr 26, 2023
e2d6224
Fix inccorect docstrings for pca
mauicv Apr 26, 2023
cd99ec2
Fix minor issue
mauicv Apr 26, 2023
2cb7796
Fix incorrect detector name in docstring
mauicv Apr 26, 2023
3ccca07
Remove redundant storage and computation in fit step
mauicv Apr 26, 2023
d2175dd
Rename incorrectly named save files in tests
mauicv Apr 26, 2023
abd270d
Fix issue in centering simialrity matrix
mauicv Apr 26, 2023
90bc9fc
Minor fix
mauicv Apr 26, 2023
e329b90
Merge branch 'master' into feature/pca-od
mauicv Apr 26, 2023
665c8ae
Update device types
mauicv Apr 27, 2023
7eefb93
Import Literals from typing_extensions instead of internal types
mauicv Apr 27, 2023
c4b3719
Remove cpu calls from backends
mauicv Apr 27, 2023
3ea402d
Rename compute_score to _score
mauicv Apr 27, 2023
3ceffbc
Change principle -> principal
mauicv Apr 27, 2023
ef5a094
Fix minor typing issue
mauicv Apr 27, 2023
a4c06f5
Merge branch 'feature/pca-od' into feature/gmm-od
mauicv May 2, 2023
5857c13
Merge branch 'master' into feature/gmm-od
mauicv May 2, 2023
30eff38
Fix tests
mauicv May 2, 2023
9994169
Update sklearn backend base class to match torch base class
mauicv May 2, 2023
49670aa
Update sklearn gmm implementation
mauicv May 2, 2023
b93dcb7
Fix mypy errors
mauicv May 2, 2023
4e5aa4b
Catch n_components < 1 error
mauicv May 2, 2023
3481e0c
Fix tests
mauicv May 3, 2023
e43dc34
Fix mypy issue in np.quantile
mauicv May 3, 2023
2cc13b1
Fix optional dependencies issue
mauicv May 3, 2023
81b7c77
Update docstrings on gmm backends
mauicv May 3, 2023
ea5e117
Update docstrings for sklearn base OD class
mauicv May 3, 2023
4fb2131
Update _gmm docstrings
mauicv May 3, 2023
482e738
Minor fix
mauicv May 3, 2023
6dde34c
Add docstrings for GMModel
mauicv May 3, 2023
14b7175
Make requested changes
mauicv May 11, 2023
f9326ed
Fix typo
mauicv May 11, 2023
95645c6
Make pr requested changes
mauicv May 11, 2023
85fa952
Add convergence checks in gmm pytorch backend fit method
mauicv May 11, 2023
0cb20d7
Rename epochs to max_epochs in pytorch fit method
mauicv May 11, 2023
a38288e
Fix minor typing error
mauicv May 11, 2023
e75058b
Fix py3.7 typing issue
mauicv May 15, 2023
246def0
Change fit arg order
mauicv May 15, 2023
1d98e80
Fix type
mauicv May 15, 2023
dd298f8
Minor change
mauicv May 15, 2023
95f1006
Change max_epochs docstring
mauicv May 15, 2023
f3e4d15
Update docstring
mauicv May 15, 2023
e13b9a7
Merge branch 'feature/gmm-od' into feature/lof-od
mauicv May 15, 2023
b75eb04
Merge branch 'master' into feature/lof-od
mauicv May 15, 2023
37ab4e8
Fix merge import errors
mauicv May 15, 2023
2af6641
Fix tests and typing
mauicv May 16, 2023
6409bb7
Update tests
mauicv May 17, 2023
023b4c3
Update optional dep tests for LOFTorch
mauicv May 22, 2023
a88630a
Fix issues in _lof
mauicv May 22, 2023
c13572d
Add comments to fit and score logic
mauicv May 24, 2023
6000309
Remove shape comments
mauicv May 24, 2023
3f091ad
Fix tests
mauicv May 24, 2023
d52c2f6
Update lof docstrings
mauicv May 24, 2023
b7b4a66
Update docstrings for lof backend
mauicv May 24, 2023
b8842bd
Minor change
mauicv May 24, 2023
5cec76d
Update score docstring
mauicv Jun 12, 2023
32de716
Update _compute_K docstring
mauicv Jun 12, 2023
7610baa
Merge branch 'master' into feature/lof-od
mauicv Jun 12, 2023
216 changes: 216 additions & 0 deletions alibi_detect/od/_lof.py
@@ -0,0 +1,216 @@
from typing import Callable, Union, Optional, Dict, Any, List, Tuple
from typing import TYPE_CHECKING
from typing_extensions import Literal

import numpy as np

from alibi_detect.base import outlier_prediction_dict
from alibi_detect.exceptions import _catch_error as catch_error
from alibi_detect.od.base import TransformProtocol, TransformProtocolType
from alibi_detect.base import BaseDetector, FitMixin, ThresholdMixin
from alibi_detect.od.pytorch import LOFTorch, Ensembler
from alibi_detect.od.base import get_aggregator, get_normalizer, NormalizerLiterals, AggregatorLiterals
from alibi_detect.utils.frameworks import BackendValidator
from alibi_detect.version import __version__


if TYPE_CHECKING:
import torch


backends = {
'pytorch': (LOFTorch, Ensembler)
}


class LOF(BaseDetector, FitMixin, ThresholdMixin):
def __init__(
self,
k: Union[int, np.ndarray, List[int], Tuple[int]],
kernel: Optional[Callable] = None,
normalizer: Optional[Union[TransformProtocolType, NormalizerLiterals]] = 'PValNormalizer',
aggregator: Union[TransformProtocol, AggregatorLiterals] = 'AverageAggregator',
backend: Literal['pytorch'] = 'pytorch',
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
) -> None:
"""
Local Outlier Factor (LOF) outlier detector.

The LOF detector is a non-parametric method for outlier detection. It computes the local density
deviation of a given data point with respect to its neighbors. It considers as outliers the
samples that have a substantially lower density than their neighbors.

The detector can be initialized with `k` as either a single value or an array of values. If `k` is a single value then
the score method computes the local outlier factor using the `k` nearest neighbors. If `k` is an array of values then
the score method computes the local outlier factor for each of the specified `k` values.
In the latter case, an `aggregator` must be specified to aggregate the scores.

Note that, in the multiple k case, a normalizer can be provided. If a normalizer is passed then it is fit in
Contributor

I think we should be a little clearer as to what the normalizer and aggregator refer to as it's not clear here or from the kwarg descriptions. I realise this applied to KNN too.

Collaborator Author

Agreed, I've opened an issue here. I'll include it in a final clean-up PR, I think.

the `infer_threshold` method and so this method must be called before the `predict` method. If this is not
done an exception is raised. If `k` is a single value then the predict method can be called without first
calling `infer_threshold` but only scores will be returned and not outlier predictions.

Parameters
----------
k
Number of nearest neighbors to compute distance to. `k` can be a single value or
an array of integers. If an array is passed, an aggregator is required to aggregate
the scores. If `k` is a single value we compute the local outlier factor for that `k`.
Otherwise if `k` is a list then we compute and aggregate the local outlier factor for each
value in `k`.
kernel
Kernel function to use for outlier detection. If ``None``, `torch.cdist` is used.
Otherwise if a kernel is specified then instead of using `torch.cdist` the kernel
defines the k nearest neighbor distance.
normalizer
Normalizer to use for outlier detection. If ``None``, no normalization is applied.
For a list of available normalizers, see :mod:`alibi_detect.od.pytorch.ensemble`.
aggregator
Aggregator to use for outlier detection. Can be set to ``None`` if `k` is a single
value. For a list of available aggregators, see :mod:`alibi_detect.od.pytorch.ensemble`.
backend
Backend used for outlier detection. Defaults to ``'pytorch'``. Options are ``'pytorch'``.
device
Device type used. The default tries to use the GPU and falls back on CPU if needed.
Can be specified by passing either ``'cuda'``, ``'gpu'``, ``'cpu'`` or an instance of
``torch.device``.

Raises
------
ValueError
If `k` is an array and `aggregator` is None.
NotImplementedError
If choice of `backend` is not implemented.
"""
super().__init__()

backend_str: str = backend.lower()
BackendValidator(
backend_options={'pytorch': ['pytorch']},
construct_name=self.__class__.__name__
).verify_backend(backend_str)

backend_cls, ensembler_cls = backends[backend_str]
ensembler = None

if aggregator is None and isinstance(k, (list, np.ndarray, tuple)):
raise ValueError('If `k` is a `np.ndarray`, `list` or `tuple`, '
'the `aggregator` argument cannot be ``None``.')

if isinstance(k, (list, np.ndarray, tuple)):
ensembler = ensembler_cls(
normalizer=get_normalizer(normalizer),
aggregator=get_aggregator(aggregator)
)

self.backend = backend_cls(k, kernel=kernel, ensembler=ensembler, device=device)

# set metadata
self.meta['detector_type'] = 'outlier'
self.meta['data_type'] = 'numeric'
Contributor

@ascillitoe May 30, 2023

Not isolated to this PR, but noting that we seem to be a little inconsistent across the new and old outlier detectors with respect to when data_type is hard-coded and when it is optionally set via a kwarg. For some, it is hard-coded to time-series (which makes sense), for some (e.g. the old Mahalanobis) it is set via kwarg, and for some it is hard-coded to numeric. Maybe worth opening an issue to review this more generally?

Already mentioned in #567 (comment), but highlighting here since we are setting data_type in new detectors too...

self.meta['online'] = False
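
The review thread above asks what the normalizer and aggregator actually do. A minimal NumPy sketch of the idea, assuming a p-value style normalizer and a plain average aggregator; the function names (`fit_pval_normalizer`, `average_aggregate`) and the example scores are hypothetical and this is not the library's implementation:

```python
import numpy as np


def fit_pval_normalizer(val_scores: np.ndarray):
    """Return a function mapping raw scores to ``1 - empirical p-value``, per column.

    Each column holds the scores produced by one value of `k`.
    """
    val_scores = np.asarray(val_scores, dtype=float)

    def normalize(scores: np.ndarray) -> np.ndarray:
        scores = np.asarray(scores, dtype=float)
        # Count, per column, how many validation scores exceed each test score.
        exceed = (scores[:, None, :] < val_scores[None, :, :]).sum(axis=1)
        p_vals = (1 + exceed) / (len(val_scores) + 1)
        return 1 - p_vals

    return normalize


def average_aggregate(scores: np.ndarray) -> np.ndarray:
    """Combine the per-`k` scores of each instance into a single score."""
    return scores.mean(axis=1)


# Hypothetical raw scores for two values of `k` (one column per `k`).
val_scores = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test_scores = np.array([[0.5, 5.0], [2.5, 25.0], [4.0, 40.0]])

normalize = fit_pval_normalizer(val_scores)
normalized = normalize(test_scores)      # columns are now on a comparable scale
combined = average_aggregate(normalized)  # one score per instance
```

The point of the normalization step is that scores for different `k` can live on very different scales (the second column above is ten times the first), so they are mapped to a common scale before being combined.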

def fit(self, x_ref: np.ndarray) -> None:
"""Fit the detector on reference data.

Parameters
----------
x_ref
Reference data used to fit the detector.
"""
self.backend.fit(self.backend._to_tensor(x_ref))

@catch_error('NotFittedError')
@catch_error('ThresholdNotInferredError')
def score(self, x: np.ndarray) -> np.ndarray:
"""Score `x` instances using the detector.

Computes the local outlier factor for each instance in `x`. If `k` is an array of values then the score for
Contributor

Maybe worth just noting here that the outlier factor is the density of each instance in x relative to those of its neighbours in x_ref.

each `k` is aggregated using the ensembler.

Parameters
----------
x
Data to score. The shape of `x` should be `(n_instances, n_features)`.
ascillitoe marked this conversation as resolved.

Returns
-------
Outlier scores. The shape of the scores is `(n_instances,)`. The higher the score, the more anomalous the \
Contributor

Nitpick: In a few places outlying is used e.g.

the l2-norm of the projected data. The higher the score, the more outlying the instance.

whereas in the score docstring (and for _pca, _gmm, _knn, mahalanobis) anomalous is used. Worth picking one or the other?

instance.

Raises
------
NotFittedError
If called before detector has been fit.
ThresholdNotInferredError
If k is a list and a threshold was not inferred.
"""
score = self.backend.score(self.backend._to_tensor(x))
score = self.backend._ensembler(score)
return self.backend._to_numpy(score)
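
For reference, the quantity that `score` returns can be written out as follows. The notation is an assumption introduced here, not taken from the PR: $d_k(o)$ is the distance from reference point $o$ to its $k$-th nearest reference neighbor, $N_k(x)$ is the set of $k$ nearest reference neighbors of $x$, and ties are ignored so that $|N_k(x)| = k$.

```latex
\begin{align}
\mathrm{reach}_k(x, o) &= \max\{\, d_k(o),\ d(x, o) \,\} \\
\mathrm{lrd}_k(x) &= \Big( \tfrac{1}{k} \sum_{o \in N_k(x)} \mathrm{reach}_k(x, o) \Big)^{-1} \\
\mathrm{LOF}_k(x) &= \frac{1}{k} \sum_{o \in N_k(x)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(x)}
  = \Big( \tfrac{1}{k} \sum_{o \in N_k(x)} \mathrm{reach}_k(x, o) \Big)
    \cdot \Big( \tfrac{1}{k} \sum_{o \in N_k(x)} \mathrm{lrd}_k(o) \Big)
\end{align}
```

The last form matches how the PyTorch backend in this PR computes the score: the average reachability of the instance multiplied by the mean inverse average reachability of its reference neighbors.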

@catch_error('NotFittedError')
def infer_threshold(self, x: np.ndarray, fpr: float) -> None:
"""Infer the threshold for the LOF detector.

The threshold is computed so that the outlier detector would incorrectly classify `fpr` proportion of the
reference data as outliers.

Parameters
----------
x
Reference data used to infer the threshold.
fpr
False positive rate used to infer the threshold. The false positive rate is the proportion of
instances in `x` that are incorrectly classified as outliers. The false positive rate should
be in the range ``(0, 1)``.

Raises
------
ValueError
Raised if `fpr` is not in ``(0, 1)``.
NotFittedError
If called before detector has been fit.
"""
self.backend.infer_threshold(self.backend._to_tensor(x), fpr)
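
A rough sketch of what inferring a threshold from `fpr` amounts to, assuming the threshold is taken as the `1 - fpr` quantile of the reference scores. The backend may use a different quantile interpolation; the standalone `infer_threshold` function and the scores below are illustrative only:

```python
import numpy as np


def infer_threshold(ref_scores: np.ndarray, fpr: float) -> float:
    """Pick a threshold that flags roughly an `fpr` proportion of the reference scores."""
    if not 0 < fpr < 1:
        raise ValueError('`fpr` must be in the range (0, 1).')
    return float(np.quantile(ref_scores, 1 - fpr))


# Hypothetical reference scores 1, 2, ..., 100.
ref_scores = np.arange(1, 101, dtype=float)
threshold = infer_threshold(ref_scores, fpr=0.1)

# Instances scoring above the threshold are predicted to be outliers.
flagged = ref_scores > threshold
```

With these scores the threshold falls between 90 and 91, so the ten highest-scoring reference instances are flagged, which is the requested 10% false positive rate on the reference data.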

@catch_error('NotFittedError')
@catch_error('ThresholdNotInferredError')
def predict(self, x: np.ndarray) -> Dict[str, Any]:
"""Predict whether the instances in `x` are outliers or not.

Scores the instances in `x` and if the threshold was inferred, returns the outlier labels and p-values as well.

Parameters
----------
x
Data to predict. The shape of `x` should be `(n_instances, n_features)`.

Returns
-------
Dictionary with keys 'data' and 'meta'. 'data' contains the outlier scores. If threshold inference was \
performed, 'data' also contains the threshold value, outlier labels and p-values. The shape of the scores is \
`(n_instances,)`. The higher the score, the more anomalous the instance. 'meta' contains information about \
the detector.

Raises
------
NotFittedError
If called before detector has been fit.
ThresholdNotInferredError
If k is a list and a threshold was not inferred.
"""
outputs = self.backend.predict(self.backend._to_tensor(x))
output = outlier_prediction_dict()
output['data'] = {
**output['data'],
**self.backend._to_numpy(outputs)
}
output['meta'] = {
**output['meta'],
'name': self.__class__.__name__,
'detector_type': 'outlier',
'online': False,
'version': __version__,
}
return output
1 change: 1 addition & 0 deletions alibi_detect/od/pytorch/__init__.py
@@ -1,6 +1,7 @@
from alibi_detect.utils.missing_optional_dependency import import_optional

KNNTorch = import_optional('alibi_detect.od.pytorch.knn', ['KNNTorch'])
LOFTorch = import_optional('alibi_detect.od.pytorch.lof', ['LOFTorch'])
MahalanobisTorch = import_optional('alibi_detect.od.pytorch.mahalanobis', ['MahalanobisTorch'])
KernelPCATorch, LinearPCATorch = import_optional('alibi_detect.od.pytorch.pca', ['KernelPCATorch', 'LinearPCATorch'])
Ensembler = import_optional('alibi_detect.od.pytorch.ensemble', ['Ensembler'])
164 changes: 164 additions & 0 deletions alibi_detect/od/pytorch/lof.py
@@ -0,0 +1,164 @@
from typing import Optional, Union, List, Tuple
from typing_extensions import Literal
import numpy as np
import torch

from alibi_detect.od.pytorch.ensemble import Ensembler
from alibi_detect.od.pytorch.base import TorchOutlierDetector


class LOFTorch(TorchOutlierDetector):
def __init__(
self,
k: Union[np.ndarray, List, Tuple, int],
kernel: Optional[torch.nn.Module] = None,
ensembler: Optional[Ensembler] = None,
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
):
"""PyTorch backend for LOF detector.

Parameters
----------
k
Number of nearest neighbors used to compute the local outlier factor. `k` can be a single
value or an array of integers. If `k` is a single value the score method computes the local
outlier factor for that `k`. If `k` is a list then it computes the local outlier factor for
each of the specified `k` values.
kernel
If a kernel is specified then instead of using `torch.cdist` the kernel defines the `k` nearest
neighbor distance.
ensembler
If `k` is an array of integers then the ensembler must not be ``None``. Should be an instance
of :py:obj:`alibi_detect.od.pytorch.ensemble.Ensembler`. Responsible for combining
multiple scores into a single score.
device
Device type used. The default tries to use the GPU and falls back on CPU if needed.
Can be specified by passing either ``'cuda'``, ``'gpu'``, ``'cpu'`` or an instance of
``torch.device``.
"""
TorchOutlierDetector.__init__(self, device=device)
self.kernel = kernel
self.ensemble = isinstance(k, (np.ndarray, list, tuple))
self.ks = torch.tensor(k) if self.ensemble else torch.tensor([k], device=self.device)
self.ensembler = ensembler

def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Detect if `x` is an outlier.

Parameters
----------
x
`torch.Tensor` with leading batch dimension.

Returns
-------
`torch.Tensor` of ``bool`` values with leading batch dimension.

Raises
------
ThresholdNotInferredError
If called before detector has had `infer_threshold` method called.
"""
raw_scores = self.score(x)
scores = self._ensembler(raw_scores)
if not torch.jit.is_scripting():
self.check_threshold_inferred()
preds = scores > self.threshold
return preds

def _make_mask(self, reachabilities: torch.Tensor):
"""Generate a mask for computing the average reachability.

If k is an array then we need to compute the average reachability for each k separately. To do
this we use a mask to weight the reachability of each k-close neighbor by 1/k and the rest to 0.
"""
mask = torch.zeros_like(reachabilities[0], device=self.device)
for i, k in enumerate(self.ks):
mask[:k, i] = torch.ones(k, device=self.device)/k
return mask
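
To make the masking trick in `_make_mask` concrete, here is a small NumPy sketch of the same idea. The standalone `make_mask` function and the reachability values are made up for illustration; the PR's method works on torch tensors and reads the shape from `reachabilities`:

```python
import numpy as np


def make_mask(ks, max_k):
    """Column i weights the first ks[i] neighbors by 1 / ks[i] and the rest by 0."""
    mask = np.zeros((max_k, len(ks)))
    for i, k in enumerate(ks):
        mask[:k, i] = 1.0 / k
    return mask


ks = [1, 3]
mask = make_mask(ks, max_k=max(ks))

# Hypothetical reachabilities with shape (n_instances, max_k, n_ks):
# one instance, three neighbors sorted by distance, two values of k.
reach = np.array([[[1.0, 2.0],
                   [9.0, 4.0],
                   [9.0, 6.0]]])

# A weighted sum over the neighbor axis gives the mean over the first k neighbors, per k.
avg_reach = (reach * mask[None, :, :]).sum(axis=1)
```

For `k=1` only the first neighbor contributes, and for `k=3` all three are averaged, so a single tensor operation handles every `k` at once.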

def _compute_K(self, x, y):
"""Compute the distance/similarity matrix between `x` and `y`."""
Contributor

@ojcobb May 26, 2023

Could we remove "/similarity" here? A similarity matrix would have entries that increase with similarity, whereas this is the opposite.

return torch.exp(-self.kernel(x, y)) if self.kernel is not None else torch.cdist(x, y)
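
A quick way to see the point made in the review comment above, that `exp(-kernel(x, y))` behaves like a distance rather than a similarity. This sketch assumes a Gaussian RBF kernel purely for illustration; the PR does not fix a particular kernel, and `rbf_kernel` and `kernel_dissimilarity` are hypothetical names:

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel: 1 for identical points, approaching 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def kernel_dissimilarity(x, y):
    """Mirror of ``exp(-kernel(x, y))``: smaller values mean more similar points."""
    return math.exp(-rbf_kernel(x, y))


same = kernel_dissimilarity([0.0, 0.0], [0.0, 0.0])
near = kernel_dissimilarity([0.0, 0.0], [0.5, 0.0])
far = kernel_dissimilarity([0.0, 0.0], [5.0, 0.0])
```

Under this assumed kernel the values grow as the points move apart (`same < near < far`), which is why taking the `k` smallest entries with `torch.topk(..., largest=False)` still selects the nearest neighbors.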

def score(self, x: torch.Tensor) -> torch.Tensor:
"""Computes the score of `x`

Parameters
----------
x
The tensor of instances. First dimension corresponds to batch.

Returns
-------
Tensor of scores for each element in `x`.

Raises
------
NotFittedError
If called before detector has been fit.
"""
self.check_fitted()

# compute the distance matrix between x and x_ref
K = self._compute_K(x, self.x_ref)

# compute k nearest neighbors for maximum k in self.ks
max_k = torch.max(self.ks)
bot_k_items = torch.topk(K, int(max_k), dim=1, largest=False)
bot_k_inds, bot_k_dists = bot_k_items.indices, bot_k_items.values

# To compute the reachabilities we get the k-distances of each object in the instance's
# k nearest neighbors. Then we take the maximum of their k-distances and the distance
# to the instance.
lower_bounds = self.knn_dists_ref[bot_k_inds]
reachabilities = torch.max(bot_k_dists[:, :, None], lower_bounds)

# Compute the average reachability for each instance. We use a mask to manage each k in
# self.ks separately.
mask = self._make_mask(reachabilities)
avg_reachabilities = (reachabilities*mask[None, :, :]).sum(1)

# Compute the LOF score for each instance. Note we don't take 1/avg_reachabilities as
# avg_reachabilities is the denominator in the LOF formula.
factors = (self.ref_inv_avg_reachabilities[bot_k_inds] * mask[None, :, :]).sum(1)
lofs = (avg_reachabilities * factors)
return lofs if self.ensemble else lofs[:, 0]

def fit(self, x_ref: torch.Tensor):
"""Fits the detector

Parameters
----------
x_ref
The Dataset tensor.
"""
# compute the distance matrix
K = self._compute_K(x_ref, x_ref)
# set diagonal to max distance to prevent torch.topk from returning the instance itself
K += torch.eye(len(K), device=self.device) * torch.max(K)

# compute k nearest neighbors for maximum k in self.ks
max_k = torch.max(self.ks)
bot_k_items = torch.topk(K, int(max_k), dim=1, largest=False)
bot_k_inds, bot_k_dists = bot_k_items.indices, bot_k_items.values

# store the k-distances for each instance for each k.
self.knn_dists_ref = bot_k_dists[:, self.ks-1]

# To compute the reachabilities we get the k-distances of each object in the instance's
# k nearest neighbors. Then we take the maximum of their k-distances and the distance
# to the instance.
lower_bounds = self.knn_dists_ref[bot_k_inds]
reachabilities = torch.max(bot_k_dists[:, :, None], lower_bounds)

# Compute the average reachability for each instance. We use a mask to manage each k in
# self.ks separately.
mask = self._make_mask(reachabilities)
avg_reachabilities = (reachabilities*mask[None, :, :]).sum(1)

# Compute the inverse average reachability for each instance.
self.ref_inv_avg_reachabilities = 1/avg_reachabilities

self.x_ref = x_ref
self._set_fitted()
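
The `fit` and `score` logic above can be followed end to end with a small NumPy sketch for a single `k`. This is a simplified illustration under stated assumptions, not the PR's code: it uses Euclidean distance only, excludes self-matches with `inf` on the diagonal rather than adding the maximum distance, and the function names and toy data are hypothetical.

```python
import numpy as np


def pairwise_dist(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix with shape (len(x), len(y))."""
    return np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(-1))


def fit_lof(x_ref: np.ndarray, k: int):
    """Compute each reference point's k-distance and inverse average reachability."""
    d = pairwise_dist(x_ref, x_ref)
    np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
    inds = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbors
    dists = np.take_along_axis(d, inds, axis=1)    # distances to those neighbors
    k_dist = dists[:, -1]                          # distance to the k-th neighbor
    reach = np.maximum(dists, k_dist[inds])        # reachability distances
    inv_avg_reach = 1.0 / reach.mean(axis=1)       # local reachability density
    return k_dist, inv_avg_reach


def score_lof(x, x_ref, k, k_dist, inv_avg_reach):
    """Local outlier factor of each row of `x` relative to the reference data."""
    d = pairwise_dist(x, x_ref)
    inds = np.argsort(d, axis=1)[:, :k]
    dists = np.take_along_axis(d, inds, axis=1)
    reach = np.maximum(dists, k_dist[inds])
    # Average reachability of x times the mean density of its reference neighbors.
    return reach.mean(axis=1) * inv_avg_reach[inds].mean(axis=1)


# Toy reference data: a 3x3 grid of points.
x_ref = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
x = np.array([[0.5, 0.5],      # sits inside the grid
              [10.0, 10.0]])   # far from every reference point

k = 3
k_dist, inv_avg_reach = fit_lof(x_ref, k)
scores = score_lof(x, x_ref, k, k_dist, inv_avg_reach)
```

The point inside the grid receives a score close to 1, meaning its density is similar to that of its neighbors, while the distant point receives a much larger score.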