From a9b055079bbbc16a4fc7ab5fa0af287e5b8de19f Mon Sep 17 00:00:00 2001
From: Yue Zhao
Date: Mon, 4 Jun 2018 11:22:39 -0400
Subject: [PATCH] Improve documentation

---
 README.md                        | 69 +++++++++++++-------------------
 docs/index.rst                   | 19 +++------
 docs/install.rst                 |  9 ++---
 docs/pyod.models.rst             | 10 ++++-
 examples/feat_bagging_example.py |  2 +-
 5 files changed, 47 insertions(+), 62 deletions(-)

diff --git a/README.md b/README.md
index e5018c867..5bbb7a569 100644
--- a/README.md
+++ b/README.md
@@ -5,17 +5,19 @@
 
 --------------------------
 
-PyOD is a comprehensive **Python toolkit** to **identify outlying objects** in data with both unsupervised and supervised approaches.
-This exciting yet challenging field is commonly referred as ***[Outlier Detection](https://en.wikipedia.org/wiki/Anomaly_detection)*** or ***[Anomaly Detection](https://en.wikipedia.org/wiki/Anomaly_detection)*** .
+PyOD is a comprehensive **Python toolkit** to **identify outlying objects** in
+multivariate data with both unsupervised and supervised approaches.
+This exciting yet challenging field is commonly referred as
+***[Outlier Detection](https://en.wikipedia.org/wiki/Anomaly_detection)***
+or ***[Anomaly Detection](https://en.wikipedia.org/wiki/Anomaly_detection)***.
+The toolkit has been successfully used in various academic researches [4, 8] and commercial products.
 
 Unlike existing libraries, PyOD provides:
 
-- **Unified and consistent APIs** across various anomaly detection algorithms for easy use.
-- **Compatibility with Python 2 and 3**. All implemented algorithms are **scikit-learn compatible** as well.
-- Additional functionalities, e.g., **Detector Combination Frameworks** for ensemble learning.
-- **Detailed API Reference, Examples and Tests** for better readability and reliability.
+- **Unified and consistent APIs** across various anomaly detection algorithms.
+- **Compatibility with both Python 2 and 3**. All implemented algorithms are also **scikit-learn compatible**.
+- **Advanced functions**, e.g., **Outlier Ensemble Frameworks** to combine multiple detectors.
+- **Detailed API Reference, Examples and Tests** for better reliability.
-**The toolbox has been successfully used in various academic researches [4, 8] and commercial products. It is currently under active development**. However,
-the primary purpose of the toolkit is quick exploration. Using it as the final output should be cautious; fine-tunning may be needed to generate meaningful results. The authors can be reached out at yuezhao@cs.toronto.edu; comments, questions, pull requests and issues are welcome. **Enjoy catching outliers!**
 
 **Table of Contents**:
 
@@ -51,16 +53,16 @@ PyOD toolkit consists of three major groups of functionalities: (i) **outlier de
   1. **Local Outlier Factor, LOF** [1]
   2. **Isolation Forest, iForest** [2]
   3. **One-Class Support Vector Machines** [3]
-  4. **k Nearest Neighbors Detector** (use the distance to the kth nearst neighbor as the outlier score)
-  5. **Average kNN** Outlier Detection (use the average distance to k nearst neighbors as the outlier score)
-  6. **Median kNN** Outlier Detection (use the median distance to k nearst neighbors as the outlier score)
+  4. **k Nearest Neighbors Detector (kNN)** (use the distance to the kth nearest neighbor as the outlier score)
+  5. **Average kNN** Outlier Detection (use the average distance to k nearest neighbors as the outlier score)
+  6. **Median kNN** Outlier Detection (use the median distance to k nearest neighbors as the outlier score)
   7. **Histogram-based Outlier Score, HBOS** [5]
   8. **Angle-Based Outlier Detection, ABOD** [7]
   9. **Fast Angle-Based Outlier Detection, FastABOD** [7]
   10. More to add...
 
 - Outlier Ensemble Framework (Outlier Score Combination Frameworks)
-  1. **Feature bagging**
+  1. **Feature bagging** [9]
   2. **Average** & **Weighted Average** [6]
   3. **Maximization** [6]
   4. **Average of Maximum (AOM)** [6]
@@ -96,11 +98,10 @@ Python Version:
 
 Library Dependency:
 ````cmd
-matplotlib # needed for running examples
-nose # needed for running tests
+matplotlib # optional. Only needed for running examples
+nose # optional. Only needed for running tests
 numpy>=1.13
-pathlib2 ; python_version < '3' # needed if python 2.7
-pytest # needed for running tests
+pytest # optional. Only needed for running tests
 scipy>=0.19.1
 scikit_learn>=0.19.1
 ````
@@ -124,31 +125,14 @@ Full package structure can be found below:
 ------------
 
 ### Quick Start for Outlier Detection
-See **examples folder** for more demos. "examples/knn_example.py" demonstrates
-the basic APIs of PyOD using kNN detector. **It is noted the APIs for other detectors are similar**.
+See **examples directory** for more demos. ["examples/knn_example.py"](https://github.com/yzhao062/Pyod/blob/master/examples/knn_example.py)
+demonstrates the basic APIs of PyOD using kNN detector. **It is noted the APIs for other detectors are similar**.
 
-0. Import models
-   ````python
+1. Initialize a kNN detector, fit the model, and make the prediction.
+   ```python
+
    from pyod.models.knn import KNN # kNN detector
 
-   from pyod.utils.data import generate_data
-   from pyod.utils.data import evaluate_print
-   from pyod.utils.data import visualize
-   ````
-
-1. Generate sample data first; normal data is generated by a 2-d Gaussian
-distribution, and outliers are generated by a 2-d uniform distribution.
-   ````python
-   contamination = 0.1 # percentage of outliers
-   n_train = 200 # number of training points
-   n_test = 100 # number of testing points
-
-   X_train, y_train, X_test, y_test = generate_data(
-       n_train=n_train, n_test=n_test, contamination=contamination)
-   ````
-
-2. Initialize a kNN detector, fit the model, and make the prediction.
-   ```python
    # train kNN detector
    clf_name = 'KNN'
    clf = KNN()
@@ -162,7 +146,7 @@ distribution, and outliers are generated by a 2-d uniform distribution.
    y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
    y_test_scores = clf.decision_function(X_test) # outlier scores
    ```
-3. Evaluate the prediction by ROC and Precision@rank *n* (p@n):
+2. Evaluate the prediction by ROC and Precision@rank *n* (p@n):
    ```python
    # evaluate and print the results
    print("\nOn Training Data:")
@@ -170,7 +154,7 @@ distribution, and outliers are generated by a 2-d uniform distribution.
    print("\nOn Test Data:")
    evaluate_print(clf_name, y_test, y_test_scores)
    ```
- 4. See a sample output & visualization
+ 3. See a sample output & visualization
    ````python
    On Training Data:
    KNN ROC:1.0, precision @ rank n:1.0
@@ -183,7 +167,7 @@ distribution, and outliers are generated by a 2-d uniform distribution.
        y_test_pred, show_figure=True, save_figure=False)
    ````
 
-To check the result of the classification visually ([knn_figure](https://github.com/yzhao062/Pyod/blob/master/examples/KNN.png)):
+Visualization ([knn_figure](https://github.com/yzhao062/Pyod/blob/master/examples/KNN.png)):
 ![kNN example figure](https://github.com/yzhao062/Pyod/blob/master/examples/KNN.png)
 
 ---
@@ -241,7 +225,7 @@ The walkthrough of the code example is provided:
    comb_by_aom = aom(test_scores_norm, 5) # 5 groups
    comb_by_moa = moa(test_scores_norm, 5)) # 5 groups
    ```
-4. Finally, all four combination methods are evaluated with 20 iterations:
+4. Finally, all four combination methods are evaluated with 10 iterations:
    ````bash
    Combining 20 kNN detectors
    ite 1 comb by average, ROC: 0.9014 precision@n_train: 0.4531
@@ -275,3 +259,4 @@ The walkthrough of the code example is provided:
 
 [8] Y. Zhao and M.K. Hryniewicki, "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning," *IEEE International Joint Conference on Neural Networks*, 2018.
 
+[9] Lazarevic, A. and Kumar, V., 2005, August. Feature bagging for outlier detection. In *KDD '05*. 2005.
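Review note on the README hunks above: the detector list distinguishes three kNN scores (distance to the kth nearest neighbor, average distance to the k nearest neighbors, median distance to the k nearest neighbors). The following is a minimal pure-Python sketch of those three definitions, offered only to make the wording concrete. It is not PyOD's implementation; the function name `knn_outlier_scores` and the `method` labels are hypothetical, and the sketch assumes Python 3.8+ for `math.dist` and `statistics.fmean`.

```python
import math
import statistics


def knn_outlier_scores(points, k=2, method="largest"):
    """Score each point from its distances to its k nearest neighbors.

    Hypothetical illustration, not PyOD code.
    method="largest": distance to the kth nearest neighbor
    method="mean":    average distance to the k nearest neighbors
    method="median":  median distance to the k nearest neighbors
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if method == "largest":
            scores.append(nearest[-1])
        elif method == "mean":
            scores.append(statistics.fmean(nearest))
        elif method == "median":
            scores.append(statistics.median(nearest))
        else:
            raise ValueError("unknown method: %s" % method)
    return scores


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2, method="largest")
print(scores.index(max(scores)))  # index of the highest-scoring point
```

In this toy data the far-away point receives the largest score under all three variants, which is the behavior the README entries describe. The brute-force loop is quadratic in the number of points and is meant for reading, not for real workloads.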
diff --git a/docs/index.rst b/docs/index.rst
index 34902f3b3..d64c8e95b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -25,19 +25,13 @@ PyOD Documentation
 to **identify outlying objects** in data with both unsupervised and supervised approaches.
 This exciting yet challenging field is commonly referred as
 `Outlier Detection `_ or `Anomaly Detection `_.
+The toolkit has been successfully used in various academic researches [4, 8] and commercial products.
 Unlike existing libraries, PyOD provides:
 
-- **Unified and consistent APIs** across various anomaly detection algorithms for easy use.
-- **Compatibility with Python 2 and 3**. All implemented algorithms are **scikit-learn compatible** as well.
-- Additional functionalities, e.g., **Detector Combination Frameworks** for ensemble learning.
-- **Detailed API Reference, Examples and Tests** for better readability and reliability.
-
-**The toolbox has been successfully used in various academic researches [4, 8] and commercial products.
-It is currently under active development**. However,
-the primary purpose of the toolkit is quick exploration. Using it as the final output should be cautious;
-fine-tunning may be needed to generate meaningful results.
-The authors can be reached out at yuezhao@cs.toronto.edu; comments, questions, pull requests and issues are welcome.
-**Enjoy catching outliers!**
+- **Unified and consistent APIs** across various anomaly detection algorithms.
+- **Compatibility with both Python 2 and 3**. All implemented algorithms are also **scikit-learn compatible**.
+- **Advanced functions**, e.g., **Outlier Ensemble Frameworks** to combine multiple detectors.
+- **Detailed API Reference, Examples and Tests** for better reliability.
 
 **Key Links**:
 
@@ -62,12 +56,11 @@ The toolkit consists of three major groups of functionalities:
    * Fast Angle-Based Outlier Detection, FastABOD [7]: :class:`pyod.models.abod.ABOD`
 
 2. **Outlier ensemble frameworks**, see :mod:`pyod.models.combination`.
-   * Feature bagging
+   * Feature bagging: :class:`pyod.models.feature_bagging.FeatureBagging`
    * Average and Weighted Average: :func:`pyod.models.combination.average`
    * Maximization: :func:`pyod.models.combination.maximization`
    * Average of Maximum (AOM): [6] :func:`pyod.models.combination.aom`
    * Maximum of Average (MOA): [6] :func:`pyod.models.combination.moa`
-   * Threshold Sum (Thresh) [6]
 
 3. **Outlier detection utility functions**, see :mod:`pyod.utils`.
 
diff --git a/docs/install.rst b/docs/install.rst
index da8407341..5ed7768e0 100644
--- a/docs/install.rst
+++ b/docs/install.rst
@@ -32,10 +32,9 @@ Library Dependency:
 
 .. code-block:: bash
 
-   - matplotlib # needed for running examples
-   - nose # needed for running tests
+   - matplotlib # optional. Only needed for running examples
+   - nose # optional. Only needed for running tests
    - numpy>=1.13
-   - pathlib2 ; python_version < '3' # needed if python 2.7
-   - pytest # needed for running tests
+   - pytest # optional. Only needed for running tests
    - scipy>=0.19.1
-   - scikit_learn>=0.19.1
\ No newline at end of file
+   - scikit_learn>=0.19.1
diff --git a/docs/pyod.models.rst b/docs/pyod.models.rst
index d27f2e029..da28be82e 100644
--- a/docs/pyod.models.rst
+++ b/docs/pyod.models.rst
@@ -40,6 +40,15 @@ pyod.models.hbos module
     :show-inheritance:
     :inherited-members:
 
+pyod.models.feature\_bagging module
+-----------------------
+
+.. automodule:: pyod.models.feature_bagging
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :inherited-members:
+
 pyod.models.iforest module
 --------------------------
 
@@ -76,7 +85,6 @@ pyod.models.ocsvm module
     :show-inheritance:
     :inherited-members:
 
-
 Module contents
 ---------------
 
diff --git a/examples/feat_bagging_example.py b/examples/feat_bagging_example.py
index e80c86f77..ca9e0db45 100644
--- a/examples/feat_bagging_example.py
+++ b/examples/feat_bagging_example.py
@@ -22,7 +22,7 @@ from matplotlib.lines import Line2D
 
 from sklearn.metrics import roc_auc_score
 
-from pyod.models.feat_bagging import FeatureBagging
+from pyod.models.feature_bagging import FeatureBagging
 from pyod.models.knn import KNN
 from pyod.models.lof import LOF
 from pyod.models.iforest import IForest
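
Review note on the last hunk: the import path changes from `pyod.models.feat_bagging` to `pyod.models.feature_bagging`. A script that has to run against checkouts on either side of this patch could, as one possible approach, try the new path first and fall back to the old one. The sketch below assumes nothing about PyOD beyond the two module paths shown in the diff; the helper name `import_first_available` is hypothetical, and the runnable demonstration uses a standard-library module so that it works without PyOD installed.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that exists.

    Hypothetical helper for illustration; it is not part of PyOD.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # The module (or one of its parent packages) is missing;
            # move on to the next candidate.
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(module_names))


# Runnable demonstration using only the standard library.
demo = import_first_available(["no_such_module_for_demo", "json"])
print(demo.__name__)

# With PyOD installed, the same pattern would cover both sides of the rename:
# mod = import_first_available(
#     ["pyod.models.feature_bagging", "pyod.models.feat_bagging"])
# FeatureBagging = mod.FeatureBagging
```

One trade-off of this pattern: catching `ImportError` also hides import failures raised from inside a module that does exist, so it is best limited to short transition periods.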