
Code for the PIDForest algorithm for anomaly detection


janfrancu/pidforest

 
 


PIDforest

Code for the PIDForest algorithm for anomaly detection.

The PIDForest algorithm is based on the Partial Identification framework for anomaly detection. Partial Identification captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. PIDScore is a geometric anomaly score based on this framework: it measures the minimum density of data points over all subcubes containing the point. PIDForest is a random-forest-based algorithm that finds anomalies based on PIDScore.
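To make the "minimum density over all subcubes" idea concrete, here is a minimal brute-force sketch. It is not the PIDForest algorithm and not this repository's API: the function name `min_density`, the candidate cut positions in `edges`, and the toy data are all assumptions made for illustration. It enumerates every axis-aligned box built from the candidate cuts that contains the query point and returns the smallest ratio of point count to box volume.

```python
from itertools import combinations, product

def min_density(point, data, edges):
    """Brute-force minimum density over axis-aligned boxes containing `point`.

    `edges[j]` is a sorted list of candidate cut positions for feature j
    (a hypothetical discretization); a box is a product of half-open
    intervals [lo, hi) whose endpoints come from `edges`.
    """
    per_dim = []
    for j, cuts in enumerate(edges):
        # Keep only the intervals that contain the point's j-th coordinate.
        per_dim.append([(lo, hi) for lo, hi in combinations(cuts, 2)
                        if lo <= point[j] < hi])
    best = float("inf")
    for box in product(*per_dim):
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo
        # Count the data points that fall inside the box.
        count = sum(all(lo <= row[j] < hi for j, (lo, hi) in enumerate(box))
                    for row in data)
        best = min(best, count / volume)
    return best

# Toy data: a tight cluster near (1.5, 1.5) and one point far away in x.
data = [(1.1, 1.2), (1.3, 1.4), (1.5, 1.1), (1.2, 1.8), (1.7, 1.6), (8.5, 1.5)]
edges = [[0, 2, 4, 6, 8, 10], [0, 2, 4, 6, 8, 10]]
scores = [min_density(p, data, edges) for p in data]
```

On this toy data the isolated point gets the lowest score (it sits alone in the box x in [2, 10), y in [0, 10)), while every cluster point shares its boxes with its neighbors. The brute-force search is exponential in the number of features, which is why a practical method needs a heuristic such as a forest of trees.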

The accompanying paper shows that PIDForest compares favorably with several popular anomaly detection methods across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by reporting a set of features, and ranges for them, that are relatively uncommon in the dataset.
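The kind of explanation described above can be sketched as follows. This is an illustrative brute-force search, not the code in this repository: `sparsest_box`, `explain`, the candidate cuts in `edges`, and the feature names are hypothetical. The idea is to find the lowest-density box containing a point and then report only the features whose interval is narrower than the full candidate range.

```python
from itertools import combinations, product

def sparsest_box(point, data, edges):
    """Return (density, box) for the lowest-density candidate box containing `point`."""
    per_dim = [[(lo, hi) for lo, hi in combinations(cuts, 2)
                if lo <= point[j] < hi]
               for j, cuts in enumerate(edges)]
    best = (float("inf"), None)
    for box in product(*per_dim):
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo
        count = sum(all(lo <= row[j] < hi for j, (lo, hi) in enumerate(box))
                    for row in data)
        if count / volume < best[0]:
            best = (count / volume, box)
    return best

def explain(box, edges, names):
    """List only the features whose interval is narrower than the full range."""
    return [f"{names[j]} in [{lo}, {hi})"
            for j, (lo, hi) in enumerate(box)
            if (lo, hi) != (edges[j][0], edges[j][-1])]

# Toy data (assumed): a cluster with small x and one point with large x.
data = [(1.1, 1.2), (1.3, 1.4), (1.5, 1.1), (1.2, 1.8), (1.7, 1.6), (8.5, 1.5)]
edges = [[0, 2, 4, 6, 8, 10], [0, 2, 4, 6, 8, 10]]
density, box = sparsest_box(data[-1], data, edges)
reasons = explain(box, edges, ["x", "y"])  # ["x in [2, 10)"]
```

Here the explanation mentions only `x`: the point is unusual because of its x value alone, and the y range is left out because it spans the whole candidate range.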

The associated data files in .mat format are also attached. Many of these datasets have additional citation requests if they are useful in your research.

The current implementation is in Python; we are working on releasing a much faster C++-based implementation.

4th Sep 2020: janfrancu

I have made the changes needed to convert pidforest into an installable Python module.

