
Commit

Merge pull request #530 from yzhao062/development
v1.1.1
yzhao062 authored Oct 25, 2023
2 parents 1e15311 + f0bfce8 commit b95b82a
Showing 15 changed files with 703 additions and 71 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/testing-cron.yml
@@ -28,7 +28,7 @@ jobs:
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements_ci.txt
        pip install -r docs/requirements.txt
        pip install pytest
        pip install coverage
        pip install coveralls
22 changes: 22 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,22 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
37 changes: 0 additions & 37 deletions .travis.yml

This file was deleted.

3 changes: 2 additions & 1 deletion CHANGES.txt
@@ -178,4 +178,5 @@ v<1.0.8>, <03/08/2023> -- Add QMCD detector (#452).
v<1.0.8>, <03/08/2023> -- Optimized ECDF and drop Statsmodels dependency (#467).
v<1.0.9>, <03/19/2023> -- Hot fix for errors in ECOD and COPOD due to the issue of scipy.
v<1.1.0>, <06/19/2023> -- Further integration of PyThresh.
v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
v<1.1.1>, <10/24/2023> -- Add deep isolation forest (#506)
10 changes: 7 additions & 3 deletions README.rst
@@ -58,7 +58,7 @@ Python Outlier Detection (PyOD)

-----

**News**: We just released a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
**News**: We have a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

**For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
@@ -70,7 +70,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic researches and
the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic researches and
commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -199,9 +199,10 @@ Alternatively, you could clone and run setup.py file:
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.20.0
* scikit_learn>=0.22.0
* six


**Optional Dependencies (see details below)**\ :

* combo (optional, required for models/combination.py and FeatureBagging)
@@ -392,6 +393,7 @@ Proximity-Based SOD Subspace Outlier Detection
Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 [#Xu2023Deep]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
@@ -684,6 +686,8 @@ Reference
.. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.
.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.
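
The DIF table row and the Xu et al. reference added above are exercised end-to-end by examples/dif_example.py later in this commit. As a quick orientation only, a minimal sketch of the same detector through the standard PyOD fit/score API (default DIF constructor arguments assumed) might look like this:

# Editor's sketch (not part of this commit): the new DIF detector via the
# standard PyOD API, with default arguments; see examples/dif_example.py
# later in this diff for the full, evaluated version.
from pyod.models.dif import DIF
from pyod.utils.data import generate_data

X_train, X_test, _, _ = generate_data(n_train=1000, n_test=200,
                                      n_features=30, contamination=0.1,
                                      random_state=42)

clf = DIF()                                   # deep isolation forest, defaults
clf.fit(X_train)                              # unsupervised fit
train_scores = clf.decision_scores_           # raw outlier scores on training data
test_scores = clf.decision_function(X_test)   # higher score = more anomalous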
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -76,7 +76,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
academic researches and commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -209,6 +209,7 @@ Proximity-Based SOD Subspace Outlier Detection
Proximity-Based ROD Rotation-based Outlier Detection 2020 :class:`pyod.models.rod.ROD` :cite:`a-almardeny2020novel`
Outlier Ensembles IForest Isolation Forest 2008 :class:`pyod.models.iforest.IForest` :cite:`a-liu2008isolation,a-liu2012isolation`
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 :class:`pyod.models.inne.INNE` :cite:`a-bandaragoda2018isolation`
Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 :class:`pyod.models.dif.DIF` :cite:`a-Xu2023Deep`
Outlier Ensembles FB Feature Bagging 2005 :class:`pyod.models.feature_bagging.FeatureBagging` :cite:`a-lazarevic2005feature`
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 :class:`pyod.models.lscp.LSCP` :cite:`a-zhao2019lscp`
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 :class:`pyod.models.xgbod.XGBOD` :cite:`a-zhao2018xgbod`
11 changes: 11 additions & 0 deletions docs/pyod.models.rst
@@ -105,6 +105,17 @@ pyod.models.deep\_svdd module
    :show-inheritance:
    :inherited-members:

pyod.models.dif module
-----------------------------

.. automodule:: pyod.models.dif
    :members:
    :exclude-members:
    :undoc-members:
    :show-inheritance:
    :inherited-members:


pyod.models.ecod module
------------------------

5 changes: 3 additions & 2 deletions docs/requirements.txt
@@ -6,17 +6,18 @@ keras
matplotlib
nose
numpy>=1.19
numba==0.53 # need to lift this later see github for issue
numba>=0.51
pyclustering
pytest
pythresh>=0.3.1
ruptures
scipy>=1.5.1
scikit_learn>=0.20.0
scikit_learn>=0.22.0
scikit-lego
six
sphinx-rtd-theme
sphinxcontrib-bibtex
statsmodels
suod
tensorflow
torch
11 changes: 11 additions & 0 deletions docs/zreferences.bib
@@ -489,4 +489,15 @@ @article{fang2001wrap
pages={608--624},
year={2001},
publisher={Elsevier}
}

@article{xu2023dif,
author={Xu, Hongzuo and Pang, Guansong and Wang, Yijie and Wang, Yongjun},
journal={IEEE Transactions on Knowledge and Data Engineering},
title={Deep Isolation Forest for Anomaly Detection},
year={2023},
volume={},
number={},
pages={1-14},
doi={10.1109/TKDE.2023.3270293}
}
53 changes: 53 additions & 0 deletions examples/dif_example.py
@@ -0,0 +1,53 @@
# -*- coding: utf-8 -*-
"""Example of using Deep Isolation Forest for
outlier detection"""
# Author: Hongzuo Xu <[email protected]>
# License: BSD 2 clause

from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))

from pyod.models.dif import DIF
from pyod.utils.data import generate_data
from pyod.utils.data import evaluate_print

if __name__ == "__main__":
    contamination = 0.1  # percentage of outliers
    n_train = 1000  # number of training points
    n_test = 200  # number of testing points
    n_features = 30  # number of features

    # Generate sample data
    X_train, X_test, y_train, y_test = \
        generate_data(n_train=n_train,
                      n_test=n_test,
                      n_features=n_features,
                      contamination=contamination,
                      random_state=42)

    # train deep isolation forest detector
    clf_name = 'DIF'
    clf = DIF()
    clf.fit(X_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

    # get the prediction on the test data
    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test)  # outlier scores

    # evaluate and print the results
    print("\nOn Training Data:")
    evaluate_print(clf_name, y_train, y_train_scores)
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)
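
As a possible follow-up to the script above (not part of this commit), the raw test scores can also be turned into an explicit ROC AUC with scikit-learn, which is already a PyOD dependency; this sketch assumes y_test and y_test_scores from the example are still in scope:

# Editor's sketch, continuing from the example above: explicit ROC AUC on the
# test split (y_test and y_test_scores come from the script).
from sklearn.metrics import roc_auc_score

test_roc = roc_auc_score(y_test, y_test_scores)
print('DIF test ROC AUC: {:.4f}'.format(test_roc))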