Commit

version 0.6.0 (#7)
* version 0.6.0

* `datasets` module with toy datasets for causal analysis
* `contrib` module for new state-of-the-art outside contributions
* New implementation for MarginalOutcomeEstimator
  (formerly UncorrectedEstimator) using WeightEstimator API
* Additional Jupyter Notebook examples
* Additional bug fix and documentation
ehudkr authored Feb 13, 2020
1 parent e5ec1af commit de64cf6
Showing 70 changed files with 59,294 additions and 397 deletions.
3 changes: 3 additions & 0 deletions .travis.yml
@@ -1,6 +1,8 @@
language: python
python:
- "3.6"
- "3.7"
- "3.8"
cache: pip
before_script:
- curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
@@ -12,6 +14,7 @@ install:
script:
- pip install -e . # test that install is running properly
- pip freeze
- pytest causallib/contrib/tests
- pytest --cov-report= --cov=causallib causallib/tests
after_success:
- coverage xml
93 changes: 78 additions & 15 deletions README.md
@@ -2,36 +2,99 @@
[![Test Coverage](https://api.codeclimate.com/v1/badges/db2562e44c4a9f7280dc/test_coverage)](https://codeclimate.com/github/IBM/causallib/test_coverage)
[![PyPI version](https://badge.fury.io/py/causallib.svg)](https://badge.fury.io/py/causallib)
[![Documentation Status](https://readthedocs.org/projects/causallib/badge/?version=latest)](https://causallib.readthedocs.io/en/latest/)
# IBM Causal Inference Library
A Python package for computational inference of causal effect.
# Causal Inference 360
A Python package for inferring causal effects from observational data.

## Description
Causal inference analysis allows estimating of the effect of intervention
on some outcome from observational data.
It deals with the selection bias that is inherent to such data.
Causal inference analysis enables estimating the causal effect of
an intervention on some outcome from real-world non-experimental observational data.

This python package allows creating modular causal inference models
that internally utilize machine learning models of choice,
and can estimate either individual or average outcome given an intervention.
The package also provides the means to evaluate the performance of the
machine learning models and their predictions.
This package provides a suite of causal methods,
under a unified scikit-learn-inspired API.
It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models.
This modular approach supports highly-flexible causal modelling.
The fit-and-predict-like API makes it possible to train on one set of examples
and estimate an effect on the other (out-of-bag),
which allows for a more "honest"<sup>1</sup> effect estimation.

The package also includes an evaluation suite.
Since most causal models rely on machine learning models internally,
we can diagnose poorly performing models by reinterpreting known ML evaluations from a causal perspective.
See [arXiv:1906.00442](https://arxiv.org/abs/1906.00442) for more details on how.


-------------
<sup>1</sup> Borrowing the terminology of [Wager & Athey](https://arxiv.org/abs/1510.04342) for avoiding overfitting.

The machine learning models must comply with scikit-learn's api
and contain `fit()` and `predict()` functions.
Categorical models must also implement `predict_proba()`.

## Installation
```bash
pip install causallib
```

## Usage
In general, the package is imported using the name `causallib`.
For example, use
In general, the package is imported using the name `causallib`.
Every causal model requires an internal machine-learning model.
`causallib` supports any model that has a sklearn-like fit-predict API
(note some models might require a `predict_proba` implementation).

For example:
```Python
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

data = load_nhefs()
ipw = IPW(LogisticRegression())
ipw.fit(data.X, data.a)
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
```
Comprehensive Jupyter Notebook examples can be found in the [examples directory](examples).

### Approach to causal-inference
Some key points on how we address causal-inference estimation:

##### 1. Emphasis on potential outcome prediction
Causal effect may be the desired outcome.
However, every effect is defined by two potential (counterfactual) outcomes.
We adopt this two-step approach by separating the effect-estimating step
from the potential-outcome-prediction step.
A beneficial consequence of this approach is that it better supports
multi-treatment problems where "effect" is not well-defined.
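
The two-step idea can be illustrated with a minimal, library-independent sketch. The helper `estimate_effects` and the numbers below are hypothetical and are not part of `causallib`; the sketch only assumes that step one has already produced one population outcome per treatment value, and shows how effects can be derived afterwards for any pair of treatments.

```python
from itertools import combinations


def estimate_effects(potential_outcomes):
    """Step 2: derive pairwise effects from already-estimated potential outcomes.

    `potential_outcomes` maps each treatment value to its estimated
    population outcome (the result of step 1).
    """
    effects = {}
    for a, b in combinations(sorted(potential_outcomes), 2):
        # Effect of treatment `b` relative to treatment `a`, as a difference.
        effects[(b, a)] = potential_outcomes[b] - potential_outcomes[a]
    return effects


# Hypothetical potential outcomes for three treatment arms (made-up numbers).
potential_outcomes = {0: 1.5, 1: 4.0, 2: 3.0}
effects = estimate_effects(potential_outcomes)
```

With more than two treatment values there is no single "effect", which is why keeping the potential outcomes as the primary result and computing contrasts on demand can be convenient.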

##### 2. Stratified average treatment effect
The causal inference literature devotes special attention to the population
on which the effect is estimated.
For example, ATE (average treatment effect on the entire sample),
ATT (average treatment effect on the treated), etc.
By allowing out-of-bag estimation, we leave this specification to the user.
For example, ATE is achieved by `model.estimate_population_outcome(X, a)`
and ATT is done by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`
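
As a rough illustration of how the choice of population determines the estimand, here is a plain-Python sketch. It does not use `causallib`: `predict_outcome` stands in for an already-fitted outcome model and is purely an assumption, as are the toy values. For brevity, the helper takes only the covariates, unlike the `estimate_population_outcome(X, a)` call quoted above.

```python
def predict_outcome(x, a):
    # Hypothetical, already-fitted outcome model (an assumption for this sketch).
    # The treatment effect is 1 + x, so it differs between individuals.
    return 1.0 + 2.0 * x + (1.0 + x) * a


def estimate_population_outcome(xs, treatment_values=(0, 1)):
    """Average predicted outcome over `xs` under each treatment value."""
    return {
        a: sum(predict_outcome(x, a) for x in xs) / len(xs)
        for a in treatment_values
    }


# Toy data: covariate x and observed treatment a (made-up values).
X = [0, 0, 0, 1, 1, 1, 1, 1]
a = [0, 0, 1, 0, 1, 1, 1, 1]

# ATE: average over the entire sample.
po_all = estimate_population_outcome(X)
ate = po_all[1] - po_all[0]

# ATT: average over the treated individuals only.
X_treated = [x for x, a_i in zip(X, a) if a_i == 1]
po_treated = estimate_population_outcome(X_treated)
att = po_treated[1] - po_treated[0]
```

With these made-up numbers the ATE is 1.625 and the ATT is about 1.8. The treated group contains relatively more individuals with `x == 1`, for whom the assumed effect is larger.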

##### 3. Families of causal inference models
We distinguish between two types of models:
* *Weight models*: weight the data to balance between the treatment and control groups,
and then estimate the potential outcome by using a weighted average of the observed outcome.
Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such models.
* *Direct outcome models*: use the covariates (features) and treatment assignment to build a
model that predicts the outcome directly. The model can then be used to predict the outcome
under any assignment of treatment values, specifically the potential outcome under assignment of
all controls or all treated.
These models are usually known as *Standardization* models, and it should be noted that, currently,
they are the only ones able to generate *individual effect estimation* (otherwise known as CATE).
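
The two families can be contrasted with a minimal sketch in plain Python. This is not `causallib`'s implementation: the function names are hypothetical, the data are made up, and both the propensity and the outcome "models" are simple empirical frequencies and means over a single binary confounder.

```python
from collections import defaultdict

# Toy observational data: (x, a, y) = (confounder, treatment, outcome).
# Generated as y = 1 + 2*x + a, so the true effect of treatment is 1.
data = [
    (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0), (0, 1, 2.0),
    (1, 0, 3.0), (1, 1, 4.0), (1, 1, 4.0), (1, 1, 4.0),
]


def ipw_population_outcomes(data):
    """Weight model: weight each unit by 1 / P(a_i | x_i), then take the
    weighted average of the observed outcome within each treatment group."""
    n_x = defaultdict(int)   # number of units per covariate stratum
    n_xa = defaultdict(int)  # number of units per (stratum, treatment) cell
    for x, a, _ in data:
        n_x[x] += 1
        n_xa[(x, a)] += 1

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for x, a, y in data:
        propensity = n_xa[(x, a)] / n_x[x]  # empirical P(A=a | X=x)
        w = 1.0 / propensity
        weighted_sum[a] += w * y
        weight_total[a] += w
    return {a: weighted_sum[a] / weight_total[a] for a in weight_total}


def standardization_population_outcomes(data):
    """Direct outcome model: predict the outcome from (x, a) and average the
    predictions over everyone, once per treatment value."""
    y_sum = defaultdict(float)
    n_xa = defaultdict(int)
    for x, a, y in data:
        y_sum[(x, a)] += y
        n_xa[(x, a)] += 1

    outcomes = {}
    for a in sorted({a for _, a, _ in data}):
        # The "model" is the mean outcome per (x, a) cell.
        # This assumes every cell was observed at least once.
        predictions = [y_sum[(x, a)] / n_xa[(x, a)] for x, _, _ in data]
        outcomes[a] = sum(predictions) / len(predictions)
    return outcomes


ipw = ipw_population_outcomes(data)
std = standardization_population_outcomes(data)
ipw_effect = ipw[1] - ipw[0]
std_effect = std[1] - std[0]
```

On this toy data both approaches recover an effect of approximately 1, whereas the raw difference in group means (3.5 for the treated versus 1.5 for the controls) is 2, because treated individuals are more likely to have `x == 1`.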

##### 4. Confounders and DAGs
One of the most important steps in causal inference analysis is to have
proper selection on both dimensions of the data to avoid introducing bias:
* On rows: thoughtfully choosing the right inclusion/exclusion criteria
for individuals in the data.
* On columns: thoughtfully choosing what covariates (features) act as confounders
and should be included in the analysis.

This is a place where domain expert knowledge is required and cannot be fully and truly automated
by algorithms.
This package assumes that the data provided to the model fit the criteria.
However, filtering can be applied in real-time using a scikit-learn pipeline estimator
that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.

5 changes: 4 additions & 1 deletion causallib/README.md
@@ -47,7 +47,10 @@ These can be used within a pipeline framework together with the models.
### `datasets`
Several datasets are provided within the package in the `datasets` module:
* NHEFS study data on the effect of smoking cessation on weight gain.
* simulation module allows creating simulated data based on a causal graph
Adapted from [Hernán and Robins' Causal Inference Book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)
* A handful of simulation sets from the [2016 Atlantic Causal Inference
Conference (ACIC) data challenge](https://jenniferhill7.wixsite.com/acic-2016/competition).
* Simulation module allows creating simulated data based on a causal graph
depicting the connection between covariates, treatment assignment and outcomes.

### Additional folders
1 change: 1 addition & 0 deletions causallib/__init__.py
@@ -0,0 +1 @@
__version__ = "0.6.0"
29 changes: 29 additions & 0 deletions causallib/contrib/README.md
@@ -0,0 +1,29 @@
# Module `causallib.contrib`
This module currently includes additional causal methods contributed to the package
by causal inference researchers other than `causallib`'s core developers.

The causal models in this module can be slightly more novel than the ones in the `estimation` module.
However, they should largely adhere to the `causallib` API
(e.g., `IndividualOutcomeEstimator` or `WeightEstimator`).
Since code here is more experimental,
models might also require additional (and less trivial) package dependencies,
or have less test coverage.
Well-integrated models could be transferred into the main `estimation` module in the future.

## Contributed Methods
Currently contributed methods are:

1. Adversarial Balancing: implementing the algorithm described in
[Adversarial Balancing for Causal Inference](https://arxiv.org/abs/1810.07406).
```python
from causallib.contrib.adversarial_balancing import AdversarialBalancing
```

## Dependencies
Each model might have slightly different requirements.
Refer to the documentation of each model for the additional packages it requires.

Requirements for `contrib` models will be concentrated in `contrib/requirements.txt` and should be
automatically installed using the extra-requirements `contrib` flag:
```shell script
pip install causallib[contrib]
```
Empty file added causallib/contrib/__init__.py
Empty file.
1 change: 1 addition & 0 deletions causallib/contrib/adversarial_balancing/__init__.py
@@ -0,0 +1 @@
from .adversarial_balancing import AdversarialBalancing