Merge pull request #365 from yzhao062/development
V0.9.7 Add ECOD
yzhao062 authored Jan 4, 2022
2 parents 7aeefcf + fdb9b57 commit 13b0cd5
Showing 9 changed files with 669 additions and 14 deletions.
1 change: 1 addition & 0 deletions CHANGES.txt
@@ -147,6 +147,7 @@ v<0.9.6>, <11/05/2021> -- Minor bug fix for COPOD.
v<0.9.6>, <12/24/2021> -- Bug fix for MAD (#358).
v<0.9.6>, <12/24/2021> -- Bug fix for COPOD plotting (#337).
v<0.9.6>, <12/24/2021> -- Model persistence doc improvement.
v<0.9.7>, <01/03/2021> -- Add ECOD.



13 changes: 8 additions & 5 deletions README.rst
@@ -307,6 +307,12 @@ PyOD toolkit consists of three major functional groups:
=================== ================== ====================================================================================================== ===== ========================================
Type Abbr Algorithm Year Ref
=================== ================== ====================================================================================================== ===== ========================================
Probabilistic ECOD Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions 2021 [#Li2021ECOD]_
Probabilistic ABOD Angle-Based Outlier Detection 2008 [#Kriegel2008Angle]_
Probabilistic FastABOD Fast Angle-Based Outlier Detection using approximation 2008 [#Kriegel2008Angle]_
Probabilistic COPOD COPOD: Copula-Based Outlier Detection 2020 [#Li2020COPOD]_
Probabilistic MAD Median Absolute Deviation (MAD) 1993 [#Iglewicz1993How]_
Probabilistic SOS Stochastic Outlier Selection 2012 [#Janssens2012Stochastic]_
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 [#Shyu2003A]_
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 [#Hardin2004Outlier]_ [#Rousseeuw1999A]_
Linear Model OCSVM One-Class Support Vector Machines 2001 [#Scholkopf2001Estimating]_
@@ -322,11 +328,6 @@ Proximity-Based AvgKNN Average kNN (use the average distance t
Proximity-Based MedKNN Median kNN (use the median distance to k nearest neighbors as the outlier score) 2002 [#Angiulli2002Fast]_
Proximity-Based SOD Subspace Outlier Detection 2009 [#Kriegel2009Outlier]_
Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
Probabilistic ABOD Angle-Based Outlier Detection 2008 [#Kriegel2008Angle]_
Probabilistic COPOD COPOD: Copula-Based Outlier Detection 2020 [#Li2020COPOD]_
Probabilistic FastABOD Fast Angle-Based Outlier Detection using approximation 2008 [#Kriegel2008Angle]_
Probabilistic MAD Median Absolute Deviation (MAD) 1993 [#Iglewicz1993How]_
Probabilistic SOS Stochastic Outlier Selection 2012 [#Janssens2012Stochastic]_
Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
@@ -571,6 +572,8 @@ Reference
.. [#Li2020COPOD] Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X. COPOD: Copula-Based Outlier Detection. *IEEE International Conference on Data Mining (ICDM)*, 2020.
.. [#Li2021ECOD] Li, Z., Zhao, Y., Hu, X., Botta, N., Ionescu, C. and Chen, H. G. ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. arXiv preprint arXiv:2201.00382 (2021).
.. [#Liu2008Isolation] Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In *International Conference on Data Mining*\ , pp. 413-422. IEEE.
.. [#Liu2019Generative] Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M. and He, X., 2019. Generative adversarial active learning for unsupervised outlier detection. *IEEE Transactions on Knowledge and Data Engineering*.
16 changes: 8 additions & 8 deletions docs/index.rst
@@ -143,17 +143,22 @@ PyOD toolkit consists of three major functional groups:

**(i) Individual Detection Algorithms** :

1. Linear Models for Outlier Detection:

=================== ================ ====================================================================================================== ===== =================================================== ======================================================
Type Abbr Algorithm Year Class Ref
=================== ================ ====================================================================================================== ===== =================================================== ======================================================
Probabilistic ECOD Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions 2021 :class:`pyod.models.ecod.ECOD` :cite:`a-li2021ecod`
Probabilistic COPOD COPOD: Copula-Based Outlier Detection 2020 :class:`pyod.models.copod.COPOD` :cite:`a-li2020copod`
Probabilistic ABOD Angle-Based Outlier Detection 2008 :class:`pyod.models.abod.ABOD` :cite:`a-kriegel2008angle`
Probabilistic FastABOD Fast Angle-Based Outlier Detection using approximation 2008 :class:`pyod.models.abod.ABOD` :cite:`a-kriegel2008angle`
Probabilistic MAD Median Absolute Deviation (MAD) 1993 :class:`pyod.models.mad.MAD` :cite:`a-iglewicz1993detect`
Probabilistic SOS Stochastic Outlier Selection 2012 :class:`pyod.models.sos.SOS` :cite:`a-janssens2012stochastic`
Linear Model PCA Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) 2003 :class:`pyod.models.pca.PCA` :cite:`a-shyu2003novel`
Linear Model MCD Minimum Covariance Determinant (use the mahalanobis distances as the outlier scores) 1999 :class:`pyod.models.mcd.MCD` :cite:`a-rousseeuw1999fast,a-hardin2004outlier`
Linear Model OCSVM One-Class Support Vector Machines 2001 :class:`pyod.models.ocsvm.OCSVM` :cite:`a-scholkopf2001estimating`
Linear Model LMDD Deviation-based Outlier Detection (LMDD) 1996 :class:`pyod.models.lmdd.LMDD` :cite:`a-arning1996linear`
Proximity-Based LOF Local Outlier Factor 2000 :class:`pyod.models.lof.LOF` :cite:`a-breunig2000lof`
Proximity-Based COF Connectivity-Based Outlier Factor 2002 :class:`pyod.models.cof.COF` :cite:`a-tang2002enhancing`
Proximity-Based Incr. COF Memory Efficient Connectivity-Based Outlier Factor (slower but reduce storage complexity) 2002 :class:`pyod.models.cof.COF` :cite:`a-tang2002enhancing`
Proximity-Based CBLOF Clustering-Based Local Outlier Factor 2003 :class:`pyod.models.cblof.CBLOF` :cite:`a-he2003discovering`
Proximity-Based LOCI LOCI: Fast outlier detection using the local correlation integral 2003 :class:`pyod.models.loci.LOCI` :cite:`a-papadimitriou2003loci`
Proximity-Based HBOS Histogram-based Outlier Score 2012 :class:`pyod.models.hbos.HBOS` :cite:`a-goldstein2012histogram`
@@ -162,13 +167,8 @@ Proximity-Based AvgKNN Average kNN (use the average distance to
Proximity-Based MedKNN Median kNN (use the median distance to k nearest neighbors as the outlier score) 2002 :class:`pyod.models.knn.KNN` :cite:`a-ramaswamy2000efficient,a-angiulli2002fast`
Proximity-Based SOD Subspace Outlier Detection 2009 :class:`pyod.models.sod.SOD` :cite:`a-kriegel2009outlier`
Proximity-Based ROD Rotation-based Outlier Detection 2020 :class:`pyod.models.rod.ROD` :cite:`a-almardeny2020novel`
Probabilistic ABOD Angle-Based Outlier Detection 2008 :class:`pyod.models.abod.ABOD` :cite:`a-kriegel2008angle`
Probabilistic FastABOD Fast Angle-Based Outlier Detection using approximation 2008 :class:`pyod.models.abod.ABOD` :cite:`a-kriegel2008angle`
Probabilistic COPOD COPOD: Copula-Based Outlier Detection 2020 :class:`pyod.models.copod.COPOD` :cite:`a-li2020copod`
Probabilistic MAD Median Absolute Deviation (MAD) 1993 :class:`pyod.models.mad.MAD` :cite:`a-iglewicz1993detect`
Probabilistic SOS Stochastic Outlier Selection 2012 :class:`pyod.models.sos.SOS` :cite:`a-janssens2012stochastic`
Outlier Ensembles IForest Isolation Forest 2008 :class:`pyod.models.iforest.IForest` :cite:`a-liu2008isolation,a-liu2012isolation`
Outlier Ensembles Feature Bagging 2005 :class:`pyod.models.feature_bagging.FeatureBagging` :cite:`a-lazarevic2005feature`
Outlier Ensembles FB Feature Bagging 2005 :class:`pyod.models.feature_bagging.FeatureBagging` :cite:`a-lazarevic2005feature`
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 :class:`pyod.models.lscp.LSCP` :cite:`a-zhao2019lscp`
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 :class:`pyod.models.xgbod.XGBOD` :cite:`a-zhao2018xgbod`
Outlier Ensembles LODA Lightweight On-line Detector of Anomalies 2016 :class:`pyod.models.loda.LODA` :cite:`a-pevny2016loda`
10 changes: 10 additions & 0 deletions docs/pyod.models.rst
@@ -77,6 +77,16 @@ pyod.models.deep\_svdd module
    :show-inheritance:
    :inherited-members:

pyod.models.ecod module
------------------------

.. automodule:: pyod.models.ecod
    :members:
    :exclude-members:
    :undoc-members:
    :show-inheritance:
    :inherited-members:

pyod.models.feature\_bagging module
-----------------------------------

7 changes: 7 additions & 0 deletions docs/zreferences.bib
@@ -377,4 +377,11 @@ @inproceedings{perini2020quantifying
  pages={227--243},
  year={2020},
  publisher={Springer}
}

@article{Li2021ecod,
  title={ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions},
  author={Li, Zheng and Zhao, Yue and Hu, Xiyang and Botta, Nicola and Ionescu, Cezar and Chen, H. George},
  journal={arXiv preprint arXiv:2201.00382},
  year={2021}
}
60 changes: 60 additions & 0 deletions examples/ecod_example.py
@@ -0,0 +1,60 @@
# -*- coding: utf-8 -*-
"""Example of using ECOD for outlier detection
"""
# Author: Yue Zhao <[email protected]>
# License: BSD 2 clause

from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from pyod.models.ecod import ECOD
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
    contamination = 0.1  # percentage of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    # Generate sample data
    X_train, y_train, X_test, y_test = \
        generate_data(n_train=n_train,
                      n_test=n_test,
                      n_features=2,
                      contamination=contamination,
                      random_state=42)

    # train ECOD detector
    clf_name = 'ECOD'
    clf = ECOD()

    # you could try parallel version as well.
    # clf = ECOD(n_jobs=2)
    clf.fit(X_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

    # get the prediction on the test data
    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test)  # outlier scores

    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)

    # visualize the results
    visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
              y_test_pred, show_figure=True, save_figure=False)
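
Note on what the example is scoring: the script above uses ECOD as a black box. The following is a minimal NumPy sketch of the idea named in the paper title, scoring each sample by how far into the tails of the per-feature empirical CDFs it falls. It is a simplification written for illustration, not the code this commit adds in pyod/models/ecod.py: it omits the skewness-corrected term described in the paper, and the helper names ecdf_tail_scores and label_by_contamination are hypothetical. The percentile cut-off is likewise only an assumed way of turning a contamination rate into 0/1 labels.

```python
import numpy as np


def _ecdf(col):
    """Empirical CDF of a 1-D array, evaluated at its own points."""
    ordered = np.sort(col)
    # side="right" counts how many samples are <= each value
    return np.searchsorted(ordered, col, side="right") / len(col)


def ecdf_tail_scores(X):
    """Simplified ECDF tail score per sample (higher = more outlying).

    Assumption: only the left-tail and right-tail aggregates are used;
    the skewness-corrected aggregate from the ECOD paper is left out.
    """
    X = np.asarray(X, dtype=float)
    left = np.apply_along_axis(_ecdf, 0, X)    # P(feature <= x)
    right = np.apply_along_axis(_ecdf, 0, -X)  # P(feature >= x)
    # Tail probabilities are at least 1/n, so the logs are finite.
    left_score = -np.log(left).sum(axis=1)
    right_score = -np.log(right).sum(axis=1)
    return np.maximum(left_score, right_score)


def label_by_contamination(scores, contamination=0.1):
    """Hypothetical helper: flag scores above the (1 - contamination) percentile."""
    threshold = np.percentile(scores, 100 * (1 - contamination))
    return (scores > threshold).astype(int)
```

Calling ecdf_tail_scores(X_train) on the generated data gives one score per row, and label_by_contamination(scores, contamination) gives the corresponding 0/1 labels. Because the sketch drops the skewness term, its scores are not expected to match clf.decision_scores_ exactly.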