Add documentation of distribution class (#24)
* Add documentation of distribution class

* Update distribution.rst

Put each sentence on a new line (semantic line breaks)

* semantic line breaks

* Extend documentation of distribution

* Extend documentation for distribution

---------

Co-authored-by: David Hägele <[email protected]>
marinaevers and hageldave authored Sep 18, 2024
1 parent e73672c commit d0cb18e
Showing 9 changed files with 167 additions and 39 deletions.
2 changes: 2 additions & 0 deletions docs/changelog.rst
@@ -4,7 +4,9 @@ Changelog

0.0.2 (in preparation)
----------------------

* Reorganization of imports
* Documentation of distribution class

0.0.1
-----
3 changes: 3 additions & 0 deletions docs/conf.py
@@ -7,6 +7,9 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

import sphinx_rtd_theme
import sys
import os
sys.path.insert(0, os.path.abspath('..'))

project = 'UADAPy'
copyright = '2024, Ruben Bauer, Marina Evers, David Hägele, Patrick Paetzold'
52 changes: 52 additions & 0 deletions docs/distribution.rst
@@ -0,0 +1,52 @@
============
Distribution
============

The ``Distribution`` class serves as a core component for handling probability distributions, both parametric and non-parametric.
It allows you to create a distribution either from a statistical model (such as one from ``scipy.stats``) or directly from a dataset (samples).
The class also supports multivariate distributions and automatically distinguishes between univariate and multivariate cases.
This class abstracts away the complexity of working with different types of distributions while providing a uniform interface for statistical operations.
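As a rough illustration of what such a uniform interface has to smooth over, the sketch below (assuming NumPy and SciPy are installed) normalizes one quantity, the mean, across three kinds of input.
The helper ``uniform_mean`` is hypothetical and not part of UADAPy; it only mirrors the dispatch order visible in ``Distribution.mean()``: raw samples first, then any object with a ``mean`` member.
A frozen ``scipy.stats.norm`` exposes ``mean()`` as a method, whereas a frozen ``scipy.stats.multivariate_normal`` stores ``mean`` as an array attribute, so both cases are handled.

```python
import numpy as np
from scipy import stats


def uniform_mean(model):
    """Hypothetical helper: mean of a SciPy model or of row-wise samples."""
    if isinstance(model, np.ndarray):
        # Raw samples of shape (n_samples, n_dims): average over the samples.
        return np.mean(model, axis=0)
    mean = getattr(model, "mean", None)
    if mean is None:
        raise AttributeError(f"No mean available for {type(model).__name__}")
    # Frozen univariate models expose mean() as a method; a frozen
    # multivariate_normal stores mean as an array attribute.
    return mean() if callable(mean) else np.asarray(mean)


univariate = stats.norm(loc=2.0, scale=0.5)
multivariate = stats.multivariate_normal(mean=[1.0, -1.0], cov=np.eye(2))
samples = np.array([[0.0, 1.0], [2.0, 3.0]])

for m in (univariate, multivariate, samples):
    print(uniform_mean(m))
```

Callers then see a single method regardless of how the underlying model stores its parameters.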

Creating a distribution
-----------------------
.. code-block:: python

   def __init__(self, model, name="", n_dims=1)

:Parameters:
   - **model**: A ``scipy.stats`` distribution object or an array of samples.
   - **name** *(str)*: The name of the distribution (optional; by default it is inferred from the model).
   - **n_dims** *(int)*: Dimensionality of the distribution (optional; default is ``1``).

If an array of samples is passed instead of a statistical model, no assumptions about the distribution are made, and a Kernel Density Estimate (KDE) is used both for evaluating the probability density function (PDF) and for sampling.
If the name is ``"Normal"``, the samples are instead treated as samples of a normal distribution, and a multivariate normal model is fitted to the data.
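A minimal sketch of these two sample-based paths, written directly against SciPy rather than UADAPy.
``build_model`` is a hypothetical helper, and fitting the normal model via the sample mean and sample covariance is an assumption about how the fit is done.
Samples are assumed to be stored row-wise, i.e. with shape ``(n_samples, n_dims)``, which is why they are transposed before being handed to ``scipy.stats.gaussian_kde`` (the diff shows the same ``gaussian_kde(self.model.T)`` call).

```python
import numpy as np
from scipy import stats


def build_model(samples: np.ndarray, name: str = ""):
    """Hypothetical helper: turn row-wise samples of shape (n_samples, n_dims) into a SciPy model."""
    if name == "Normal":
        # Parametric path (assumed fitting method): sample mean and sample covariance.
        mean = np.mean(samples, axis=0)
        cov = np.atleast_2d(np.cov(samples.T))  # np.cov expects variables in rows
        return stats.multivariate_normal(mean=mean, cov=cov)
    # Non-parametric path: gaussian_kde expects data of shape (n_dims, n_samples).
    return stats.gaussian_kde(samples.T)


rng = np.random.default_rng(0)
samples = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(500, 2))

normal_model = build_model(samples, name="Normal")  # frozen multivariate normal
kde_model = build_model(samples)                    # Gaussian KDE over the samples
print(normal_model.mean, kde_model.d, kde_model.n)
```

The KDE keeps every sample and makes no shape assumption, while the normal fit compresses the data into a mean vector and a covariance matrix.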

Working with distributions
--------------------------
**Distribution Properties**: The class provides methods for computing key statistical properties:

- **mean() -> np.ndarray | float**:

  Returns the mean of the distribution.

- **cov() -> np.ndarray | float**:

  Returns the covariance matrix of the distribution.

- **skew() -> np.ndarray | float**:

  Returns the skewness of the distribution.

- **kurt() -> np.ndarray | float**:

  Returns the kurtosis of the distribution.
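For sample-based distributions, the visible parts of ``uadapy/distribution.py`` compute these properties with ``np.mean(..., axis=0)``, ``np.cov(samples.T)``, ``stats.skew`` and ``stats.kurtosis``.
The sketch below reproduces those calls on synthetic data (NumPy and SciPy assumed) and shows the resulting shapes.
Note that ``scipy.stats.kurtosis`` returns the *excess* (Fisher) kurtosis by default, so normally distributed data gives values near 0 rather than 3.
Frozen SciPy models report skewness and excess kurtosis through ``stats(moments="sk")``; whether the collapsed branches of ``skew()`` and ``kurt()`` query exactly this is an assumption, since that part of the code is not shown in the diff.

```python
import numpy as np
from scipy import stats

# Synthetic row-wise samples: 10,000 draws from a 3-D standard normal.
rng = np.random.default_rng(42)
samples = rng.normal(size=(10_000, 3))

mean = np.mean(samples, axis=0)  # per-dimension mean, shape (3,)
cov = np.cov(samples.T)          # np.cov treats rows as variables, hence the transpose; shape (3, 3)
skew = stats.skew(samples)       # per-dimension skewness along axis 0, shape (3,)
kurt = stats.kurtosis(samples)   # per-dimension excess (Fisher) kurtosis, shape (3,)

# Frozen SciPy models report skewness and excess kurtosis directly.
model_skew, model_kurt = stats.norm(loc=0.0, scale=1.0).stats(moments="sk")

print(mean.shape, cov.shape, skew.shape, kurt.shape)
print(float(model_skew), float(model_kurt))
```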

**Sampling and PDF Evaluation**:

- **sample(n: int, seed: int = None) -> np.ndarray**:

  Generates ``n`` random samples from the distribution; passing ``seed`` makes the draws reproducible.

- **pdf(x: np.ndarray | float) -> np.ndarray | float**:

  Evaluates the probability density function (PDF) at the given point(s) ``x``.
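The diff of ``sample()`` shows that ``seed`` is forwarded differently depending on the model: ``gaussian_kde.resample`` receives it as its ``seed`` argument, while frozen SciPy distributions receive it as ``random_state``.
The sketch below (NumPy and SciPy assumed; all variable names are illustrative) reproduces the KDE path, including the transposes that convert between the row-wise sample layout and the column-wise layout ``gaussian_kde`` expects.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 2))  # row-wise samples, shape (200, 2)
kde = stats.gaussian_kde(samples.T)  # gaussian_kde wants shape (n_dims, n_samples)

# resample() returns shape (n_dims, n); transposing restores the row-wise layout.
draws_a = kde.resample(5, seed=7).T
draws_b = kde.resample(5, seed=7).T  # same seed, identical draws

# pdf() also expects points as columns, so row-wise query points are transposed.
points = np.array([[0.0, 0.0], [3.0, 3.0]])
density = kde.pdf(points.T)  # one density value per query point, shape (2,)

# Frozen SciPy distributions take the seed through random_state instead.
normal_draws = stats.norm(loc=0.0, scale=1.0).rvs(size=5, random_state=7)
```

This row-wise convention matches the ``self.kde.pdf(x.T)`` call in the diff, so query points passed to ``pdf`` are presumably expected with one point per row.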
4 changes: 2 additions & 2 deletions docs/examples.rst
@@ -10,11 +10,11 @@ Uncertainty-aware multidimensional scaling
`Load data, reduce the dimensionality with UAMDS, visualize the output <https://github.com/UniStuttgart-VISUS/uadapy/blob/main/examples/uamds.ipynb>`_

Uncertainty-aware principal component analysis
------------------------------------------
----------------------------------------------

`Load data, reduce the dimensionality with UAPCA, visualize the output <https://github.com/UniStuttgart-VISUS/uadapy/blob/main/examples/uapca.ipynb>`_

Working with own data
------------------------------------------
---------------------

`Load data, create a distribution, visualize it <https://github.com/UniStuttgart-VISUS/uadapy/blob/main/examples/ownData.ipynb>`_
13 changes: 12 additions & 1 deletion docs/index.rst
@@ -1,7 +1,8 @@
==================================================
UADAPy - Uncertainty-aware Data Analysis in Python
==================================================
UADAPy is a Python library to support an easy analysis of uncertain data. Here you find the most important information to get started.
UADAPy is a Python library to support an easy analysis of uncertain data.
Here you find the most important information to get started.

.. toctree::
   :maxdepth: 1
@@ -10,6 +11,16 @@ UADAPy is a Python library to support an easy analysis of uncertain data. Here y
   installation.rst
   examples.rst

Classes
=======
In the following, we describe the most important data structures and explain some of the underlying concepts in detail.
This section is currently a work in progress and will be extended over time.

.. toctree::
   :maxdepth: 1

   distribution.rst

Indices and tables
==================
* :ref:`genindex`
10 changes: 10 additions & 0 deletions docs/uadapy.data.rst
@@ -0,0 +1,10 @@
uadapy.data package
=======================

uadapy.data.data module
-----------------------------------------

.. automodule:: uadapy.data.data
   :members:
   :undoc-members:
   :show-inheritance:
2 changes: 0 additions & 2 deletions docs/uadapy.dr.rst
@@ -1,8 +1,6 @@
uadapy.dr package
=================

Submodules
----------

uadapy.dr.uamds module
----------------------
28 changes: 1 addition & 27 deletions docs/uadapy.rst
@@ -9,33 +9,7 @@ Subpackages

   uadapy.dr
   uadapy.plotting

Submodules
----------

uadapy.data module
------------------

.. automodule:: uadapy.data
   :members:
   :undoc-members:
   :show-inheritance:

uadapy.distribution module
--------------------------

.. automodule:: uadapy.distribution
   :members:
   :undoc-members:
   :show-inheritance:

uadapy.test\_distrib module
---------------------------

.. automodule:: uadapy.test_distrib
   :members:
   :undoc-members:
   :show-inheritance:
   uadapy.data

Module contents
---------------
92 changes: 85 additions & 7 deletions uadapy/distribution.py
@@ -5,16 +5,34 @@


class Distribution:
    """
    The Distribution class provides a consistent interface to a variety of distributions.
    Attributes
    ----------
    model : scipy.stats distribution or np.ndarray
        The underlying concrete distribution model, a `scipy.stats` distribution object or an array of samples.
    name : str
        Name of the distribution type, e.g. 'Normal'.
    n_dims : int
        Dimensionality of the distribution.
    """

    def __init__(self, model, name="", n_dims=1):
        """
        Creates a distribution. If samples are passed as the first parameter,
        no assumptions about the distribution are made. For the pdf and the sampling,
        a KDE is used. If the name is "Normal", the samples
        are treated as samples of a normal distribution.
        :param model: A scipy.stats distribution or samples
        :param name: The name of the distribution
        :param n_dims: The dimensionality of the distribution
        Parameters
        ----------
        model : scipy.stats distribution or np.ndarray
            A scipy.stats distribution or an array of samples.
        name : str, optional
            The name of the distribution.
        n_dims : int, optional
            The dimensionality of the distribution (default is 1).
        """
        if name:
            self.name = name
@@ -35,15 +53,43 @@ def __init__(self, model, name="", n_dims=1):
        if isinstance(self.model, np.ndarray):
            self.kde = stats.gaussian_kde(self.model.T)

    def sample(self, n: int, random_state: int = None) -> np.ndarray:
    def sample(self, n: int, seed: int = None) -> np.ndarray:
        """
        Creates samples from the distribution.
        Parameters
        ----------
        n : int
            Number of samples.
        seed : int, optional
            Seed for the random number generator for reproducibility, default is None.
        Returns
        -------
        np.ndarray
            Samples of the distribution.
        """
        if isinstance(self.model, np.ndarray):
            return self.kde.resample(n, random_state).T
            return self.kde.resample(n, seed).T
        if hasattr(self.model, 'rvs') and callable(self.model.rvs):
            return self.model.rvs(size=n, random_state=random_state)
            return self.model.rvs(size=n, random_state=seed)
        if hasattr(self.model, 'resample') and callable(self.model.resample):
            return self.model.resample(size=n, seed=random_state)
            return self.model.resample(size=n, seed=seed)

    def pdf(self, x: np.ndarray | float) -> np.ndarray | float:
        """
        Computes the probability density function.
        Parameters
        ----------
        x : np.ndarray or float
            The position where the pdf should be evaluated.
        Returns
        -------
        np.ndarray or float
            Values of the pdf at the given position(s).
        """
        if isinstance(self.model, np.ndarray):
            return self.kde.pdf(x.T)
        if not hasattr(self.model, 'pdf'):
@@ -52,6 +98,14 @@ def pdf(self, x: np.ndarray | float) -> np.ndarray | float:
        return self.model.pdf(x)

    def mean(self) -> np.ndarray | float:
        """
        Expected value of the distribution.
        Returns
        -------
        np.ndarray or float
            Expected value of the distribution.
        """
        if isinstance(self.model, np.ndarray):
            return np.mean(self.model, axis=0)
        if hasattr(self.model, 'mean'):
@@ -66,6 +120,14 @@ def mean(self) -> np.ndarray | float:
        raise AttributeError(f"Mean not implemented yet! {self.model.__class__.__name__}")

    def cov(self) -> np.ndarray | float:
        """
        Covariance of the distribution.
        Returns
        -------
        np.ndarray or float
            Covariance of the distribution.
        """
        if isinstance(self.model, np.ndarray):
            return np.cov(self.model.T)
        if hasattr(self.model, 'cov'):
@@ -86,6 +148,14 @@ def cov(self) -> np.ndarray | float:


    def skew(self) -> np.ndarray | float:
        """
        Skewness of the distribution.
        Returns
        -------
        np.ndarray or float
            Skewness of the distribution.
        """
        if isinstance(self.model, np.ndarray):
            return stats.skew(self.model)
        if hasattr(self.model, 'stats') and callable(self.model.stats):
@@ -94,6 +164,14 @@ def skew(self) -> np.ndarray | float:
        return 0

    def kurt(self) -> np.ndarray | float:
        """
        Kurtosis of the distribution.
        Returns
        -------
        np.ndarray or float
            Kurtosis of the distribution.
        """
        if isinstance(self.model, np.ndarray):
            return stats.kurtosis(self.model)
        if hasattr(self.model, 'stats') and callable(self.model.stats):