version 0.6.0 (#7)

* version 0.6.0
* `datasets` module with toy datasets for causal analysis
* `contrib` module for new state-of-the-art outside contributions
* New implementation for MarginalOutcomeEstimator (formerly UncorrectedEstimator) using WeightEstimator API
* Additional Jupyter Notebook examples
* Additional bug fixes and documentation
BiomedSciAI · Feb 13, 2020 · de64cf6
1 parent e5ec1af

Showing 70 changed files with 59,294 additions and 397 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -1,6 +1,8 @@
 language: python
 python:
   - "3.6"
+  - "3.7"
+  - "3.8"
 cache: pip
 before_script:
   - curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
@@ -12,6 +14,7 @@ install:
 script:
   - pip install -e .  # test that install is running properly
   - pip freeze
+  - pytest causallib/contrib/tests
   - pytest --cov-report= --cov=causallib causallib/tests
 after_success:
   - coverage xml

diff --git a/README.md b/README.md
@@ -2,36 +2,99 @@
 [![Test Coverage](https://api.codeclimate.com/v1/badges/db2562e44c4a9f7280dc/test_coverage)](https://codeclimate.com/github/IBM/causallib/test_coverage)
 [![PyPI version](https://badge.fury.io/py/causallib.svg)](https://badge.fury.io/py/causallib)
 [![Documentation Status](https://readthedocs.org/projects/causallib/badge/?version=latest)](https://causallib.readthedocs.io/en/latest/)
-# IBM Causal Inference Library
-A Python package for computational inference of causal effect.
+# Causal Inference 360
+A Python package for inferring causal effects from observational data.
 
 ## Description
-Causal inference analysis allows estimating of the effect of intervention
-on some outcome from observational data.
-It deals with the selection bias that is inherent to such data.  
+Causal inference analysis enables estimating the causal effect of 
+an intervention on some outcome from real-world non-experimental observational data.  
 
-This python package allows creating modular causal inference models
-that internally utilize machine learning models of choice,
-and can estimate either individual or average outcome given an intervention.
-The package also provides the means to evaluate the performance of the 
-machine learning models and their predictions.
+This package provides a suite of causal methods, 
+under a unified scikit-learn-inspired API.  
+It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models. 
+This modular approach supports highly-flexible causal modelling.    
+The fit-and-predict-like API makes it possible to train on one set of examples 
+and estimate an effect on another (out-of-bag),
+which allows for a more "honest"<sup>1</sup> effect estimation.
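A minimal, dependency-free sketch of the train-on-one-set, estimate-on-another idea. It does not use `causallib`: the toy data generator, the `CellMeanOutcomeModel` class, and its `estimate_population_outcome(rows, treatment)` method are hypothetical stand-ins (the method name only echoes `causallib`'s), assuming a single binary confounder `x` and a true effect of +2.

```python
import random
from collections import defaultdict


def simulate(n, seed):
    """Toy data: a binary confounder x drives both treatment a and outcome y.

    The true effect of a on y is +2 for every individual (an assumption of
    this sketch, not a property of any causallib dataset).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if x else 0.2))  # x raises P(treatment)
        y = 1.0 + 3.0 * x + 2.0 * a + rng.gauss(0, 1)
        rows.append((x, a, y))
    return rows


class CellMeanOutcomeModel:
    """Hypothetical direct-outcome model: E[y | x, a] as a per-cell mean."""

    def fit(self, rows):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, a, y in rows:
            sums[x, a] += y
            counts[x, a] += 1
        self.means_ = {cell: sums[cell] / counts[cell] for cell in sums}
        return self

    def estimate_population_outcome(self, rows, treatment):
        # Assign `treatment` to every row and average the predicted outcomes;
        # the observed a and y of `rows` are not used here.
        preds = [self.means_[x, treatment] for x, _, _ in rows]
        return sum(preds) / len(preds)


rows = simulate(4000, seed=0)
train, test = rows[:2000], rows[2000:]
model = CellMeanOutcomeModel().fit(train)  # fit on one set...
effect = (model.estimate_population_outcome(test, 1)
          - model.estimate_population_outcome(test, 0))  # ...estimate on another
print(f"out-of-bag effect estimate: {effect:.2f}")
```

Because the cell means are fitted on `train` only, the effect is evaluated on rows the model never saw; under the assumed data-generating process the estimate should land near +2.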
+
+The package also includes an evaluation suite. 
+Since most causal models use machine learning models internally, 
+we can diagnose poorly performing models by re-interpreting known ML evaluations from a causal perspective.
+See [arXiv:1906.00442](https://arxiv.org/abs/1906.00442) for more details.
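As an illustration of a causal diagnostic built on familiar statistics, the sketch below measures covariate balance as the absolute standardized mean difference (ASMD) between treatment arms, before and after inverse-propensity weighting. It is plain Python on assumed toy data, not `causallib`'s evaluation API; `asmd` and `weighted_mean_var` are hypothetical helpers.

```python
import math
import random


def weighted_mean_var(values, weights):
    """Weighted mean and (population) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def asmd(x, a, w=None):
    """Absolute standardized mean difference of covariate x between arms."""
    if w is None:
        w = [1.0] * len(x)
    arms = {}
    for t in (0, 1):
        xs = [xi for xi, ai in zip(x, a) if ai == t]
        ws = [wi for wi, ai in zip(w, a) if ai == t]
        arms[t] = weighted_mean_var(xs, ws)
    (m0, v0), (m1, v1) = arms[0], arms[1]
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2)


rng = random.Random(1)
x = [int(rng.random() < 0.5) for _ in range(5000)]
a = [int(rng.random() < (0.8 if xi else 0.2)) for xi in x]

# Propensity "model": the treated fraction within each stratum of x.
p = {v: sum(ai for xi, ai in zip(x, a) if xi == v) / x.count(v) for v in (0, 1)}
# Inverse probability of treatment weights.
w = [1 / p[xi] if ai == 1 else 1 / (1 - p[xi]) for xi, ai in zip(x, a)]

print(f"ASMD unweighted: {asmd(x, a):.3f}")
print(f"ASMD weighted:   {asmd(x, a, w):.3f}")
```

With a propensity model that is just the treated fraction per stratum of the binary covariate, weighting balances that covariate exactly, so the weighted ASMD drops to (numerically) zero; with a real ML propensity model, residual imbalance is what this diagnostic is meant to expose.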
+
+
+-------------
+<sup>1</sup> Borrowing [Wager & Athey](https://arxiv.org/abs/1510.04342) terminology of avoiding overfit.  
 
-The machine learning models must comply with scikit-learn's api 
-and contain `fit()` and `predict()` functions. 
-Categorical models must also implement `predict_proba()`. 
 
 ## Installation
 ```bash
 pip install causallib
 ```
 
 ## Usage
-In general, the package is imported using the name `causallib`. 
-For example, use 
+In general, the package is imported using the name `causallib`.  
+Every causal model requires an internal machine-learning model. 
+`causallib` supports any model that has a sklearn-like fit-predict API
+(note some models might require a `predict_proba` implementation).  
+
+For example:
 ```Python
 from sklearn.linear_model import LogisticRegression
 from causallib.estimation import IPW 
+from causallib.datasets import load_nhefs
+
+data = load_nhefs()
 ipw = IPW(LogisticRegression())
+ipw.fit(data.X, data.a)
+potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
+effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
 ```
 Comprehensive Jupyter Notebooks examples can be found in the [examples directory](examples).
 
+### Approach to causal-inference
+Some key points on how we approach causal-inference estimation:
+
+##### 1. Emphasis on potential outcome prediction  
+The causal effect may be the ultimate quantity of interest. 
+However, every effect is defined by two potential (counterfactual) outcomes.  
+We adopt this two-step approach by separating the effect-estimating step 
+from the potential-outcome-prediction step.  
+A beneficial consequence of this approach is that it better supports 
+multi-treatment problems where "effect" is not well-defined.
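A sketch of the two-step separation in plain Python: potential outcomes come first (here hard-coded, made-up numbers standing in for any model's output), and effects are derived afterwards as contrasts. The `estimate_effect` helper and the `"diff"`/`"ratio"` names are illustrative assumptions, not a documented signature.

```python
from itertools import combinations

# Hypothetical contrast functions between two population-level outcomes.
CONTRASTS = {
    "diff": lambda y1, y0: y1 - y0,
    "ratio": lambda y1, y0: y1 / y0,
}


def estimate_effect(outcome_1, outcome_0, effect_types=("diff",)):
    """Step 2: turn two potential outcomes into one or more effect measures."""
    return {t: CONTRASTS[t](outcome_1, outcome_0) for t in effect_types}


# Step 1 (any causal model): one potential outcome per treatment value.
# These numbers are made up for illustration.
potential_outcomes = {"placebo": 10.0, "drug_A": 12.5, "drug_B": 11.0}

# With three arms there is no single "effect", only pairwise contrasts.
effects = {}
for t0, t1 in combinations(potential_outcomes, 2):
    effects[t1, t0] = estimate_effect(
        potential_outcomes[t1], potential_outcomes[t0], ("diff", "ratio"))
    print(f"{t1} vs {t0}: {effects[t1, t0]}")
```

Keeping step 1 model-agnostic means the same potential outcomes can be summarized as a difference, a ratio, or any other contrast without refitting.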
+
+##### 2. Stratified average treatment effect
+The causal inference literature devotes special attention to the population 
+on which the effect is estimated.
+For example, ATE (average treatment effect on the entire sample),
+ATT (average treatment effect on the treated), etc.  
+By allowing out-of-bag estimation, we leave this specification to the user.
+For example, ATE is achieved by `model.estimate_population_outcome(X, a)`
+and ATT is done by stratifying on the treated: `model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])`
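A plain-Python sketch of the same stratification idea, assuming a toy dataset whose effect is +1 when `x == 0` and +3 when `x == 1`, so ATE and ATT differ (about 2.0 vs. 2.6, because treated individuals are mostly `x == 1`). The outcome model is a hypothetical cell-mean standardization model; only the set of rows it is averaged over changes between ATE and ATT.

```python
import random
from collections import defaultdict

rng = random.Random(2)
rows = []
for _ in range(10000):
    x = int(rng.random() < 0.5)
    a = int(rng.random() < (0.8 if x else 0.2))
    # Assumed heterogeneous effect: +1 when x == 0, +3 when x == 1.
    y = 1.0 + 3.0 * x + a * (1.0 + 2.0 * x) + rng.gauss(0, 1)
    rows.append((x, a, y))

# Hypothetical standardization model: E[y | x, a] estimated as cell means.
sums, counts = defaultdict(float), defaultdict(int)
for x, a, y in rows:
    sums[x, a] += y
    counts[x, a] += 1
mu = {cell: sums[cell] / counts[cell] for cell in sums}


def average_effect(sample):
    """Mean predicted y(1) - y(0) over the covariates of the given rows."""
    return sum(mu[x, 1] - mu[x, 0] for x, _, _ in sample) / len(sample)


ate = average_effect(rows)                            # whole sample
att = average_effect([r for r in rows if r[1] == 1])  # treated only
print(f"ATE ~ {ate:.2f}, ATT ~ {att:.2f}")
```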
+
+##### 3. Families of causal inference models
+We distinguish between two types of models:
+* *Weight models*: weight the data to balance between the treatment and control groups, 
+   and then estimate the potential outcome using a weighted average of the observed outcomes.  
+   Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such models. 
+* *Direct outcome models*: use the covariates (features) and treatment assignment to build a
+   model that predicts the outcome directly. The model can then be used to predict the outcome
+   under any assignment of treatment values, specifically the potential-outcome under assignment of
+   all controls or all treated.  
+   These models are usually known as *Standardization* models, and it should be noted that, currently,
+   they are the only ones able to generate *individual effect estimation* (otherwise known as CATE).
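The two families can be contrasted on one toy dataset (assumed: binary confounder `x`, true effect +2). The weight model below is IPW with propensities estimated as treated fractions per stratum of `x`, followed by a normalized weighted average of observed outcomes per arm; the direct outcome model is a cell-mean standardization model, which also yields one effect prediction per individual. Both are hypothetical minimal implementations, not `causallib` classes.

```python
import random
from collections import defaultdict

rng = random.Random(3)
rows = []
for _ in range(5000):
    x = int(rng.random() < 0.5)
    a = int(rng.random() < (0.8 if x else 0.2))
    y = 1.0 + 3.0 * x + 2.0 * a + rng.gauss(0, 1)
    rows.append((x, a, y))

# Weight model (IPW): propensity P(a=1 | x) from stratum frequencies, then a
# normalized weighted average of observed outcomes within each arm.
n_x, n_x1 = defaultdict(int), defaultdict(int)
for x, a, _ in rows:
    n_x[x] += 1
    n_x1[x] += a
propensity = {x: n_x1[x] / n_x[x] for x in n_x}


def ipw_outcome(t):
    pairs = [(1 / (propensity[x] if t == 1 else 1 - propensity[x]), y)
             for x, a, y in rows if a == t]
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


# Direct outcome model (standardization): predict E[y | x, a] for every
# individual under each treatment value.
sums, counts = defaultdict(float), defaultdict(int)
for x, a, y in rows:
    sums[x, a] += y
    counts[x, a] += 1
mu = {cell: sums[cell] / counts[cell] for cell in sums}
individual_effects = [mu[x, 1] - mu[x, 0] for x, _, _ in rows]  # one per row

ipw_ate = ipw_outcome(1) - ipw_outcome(0)
std_ate = sum(individual_effects) / len(individual_effects)
print(f"IPW ATE ~ {ipw_ate:.3f}, standardization ATE ~ {std_ate:.3f}")
```

With saturated models like these (every combination of `x` and `a` gets its own parameter), the normalized IPW estimate and the standardization estimate coincide algebraically; with flexible ML models plugged in, they generally differ. Only the standardization model produces the per-individual `individual_effects`.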
+
+##### 4. Confounders and DAGs
+One of the most important steps in causal inference analysis is to have 
+proper selection on both dimensions of the data to avoid introducing bias:
+* On rows: thoughtfully choosing the right inclusion/exclusion criteria 
+  for individuals in the data. 
+* On columns: thoughtfully choosing what covariates (features) act as confounders 
+  and should be included in the analysis.
+
+This is a place where domain expert knowledge is required and cannot be fully and truly automated
+by algorithms. 
+This package assumes that the data provided to the model fit the criteria.   
+However, filtering can be applied on the fly using a scikit-learn pipeline estimator
+that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.
+
diff --git a/causallib/README.md b/causallib/README.md
@@ -47,7 +47,10 @@ These can be used within a pipeline framework together with the models.
 ### `datasets`
 Several datasets are provided within the package in the `datasets` module:  
 * NHEFS study data on the effect of smoking cessation on weight gain.
-* simulation module allows creating simulated data based on a causal graph
+  Adapted from [Hernán and Robins' Causal Inference Book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/)
+* A handful of simulation sets from the [2016 Atlantic Causal Inference 
+  Conference (ACIC) data challenge](https://jenniferhill7.wixsite.com/acic-2016/competition). 
+* Simulation module allows creating simulated data based on a causal graph
   depicting the connection between covariates, treatment assignment and outcomes.
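A minimal sketch of simulating data from a causal graph, assuming each node is specified by its parents plus a structural function; nodes are sampled in topological order so every parent exists before its children. The `GRAPH` format, `topological_order`, and `simulate` are hypothetical and are not the interface of `causallib`'s simulation module. The graph is assumed to be acyclic.

```python
import random


def sample_x(parents, rng):
    return int(rng.random() < 0.5)


def sample_a(parents, rng):
    # Treatment assignment depends on the confounder x.
    return int(rng.random() < (0.8 if parents["x"] else 0.2))


def sample_y(parents, rng):
    # Outcome depends on both the covariate x and the treatment a.
    return 1.0 + 3.0 * parents["x"] + 2.0 * parents["a"] + rng.gauss(0, 1)


# Hypothetical DAG: x -> a, x -> y, a -> y. Listed out of order on purpose.
GRAPH = {
    "y": (("x", "a"), sample_y),
    "a": (("x",), sample_a),
    "x": ((), sample_x),
}


def topological_order(graph):
    """Depth-first ordering in which each node follows all of its parents."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node][0]:
            visit(parent)
        order.append(node)

    for node in graph:
        visit(node)
    return order


def simulate(graph, n, seed=0):
    rng = random.Random(seed)
    order = topological_order(graph)
    samples = []
    for _ in range(n):
        values = {}
        for node in order:
            parents, func = graph[node]
            values[node] = func({p: values[p] for p in parents}, rng)
        samples.append(values)
    return samples


data = simulate(GRAPH, 5)
print(data[0])
```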
 
 ### Additional folders

diff --git a/causallib/__init__.py b/causallib/__init__.py
@@ -0,0 +1 @@
+__version__ = "0.6.0"
diff --git a/causallib/contrib/README.md b/causallib/contrib/README.md
@@ -0,0 +1,29 @@
+# Module `causallib.contrib`
+This module currently includes additional causal methods contributed to the package 
+by causal inference researchers other than `causallib`'s core developers.
+
+The causal models in this module can be slightly more novel than the ones in the `estimation` module. 
+However, they should largely adhere to `causallib` API 
+(e.g., `IndividualOutcomeEstimator` or `WeightEstimator`).
+Since code here is more experimental, 
+models might also require additional (and less trivial) package dependencies, 
+or have less test coverage.  
+Well-integrated models could be transferred into the main `estimation` module in the future.
+
+## Contributed Methods
+Currently contributed methods are:
+
+1. Adversarial Balancing: implementing the algorithm described in 
+   [Adversarial Balancing for Causal Inference](https://arxiv.org/abs/1810.07406).
+   ```python
+   from causallib.contrib.adversarial_balancing import AdversarialBalancing
+   ```
+
+## Dependencies
+Each model might have slightly different requirements.  
+Refer to the documentation of each model for the additional packages it requires.
+
+Requirements for `contrib` models will be concentrated in `contrib/requirements.txt` and should be
+automatically installed using the extra-requirements `contrib` flag:  
+```shell script
+pip install causallib[contrib]
+```   
diff --git a/causallib/contrib/__init__.py b/causallib/contrib/__init__.py
diff --git a/causallib/contrib/adversarial_balancing/__init__.py b/causallib/contrib/adversarial_balancing/__init__.py
@@ -0,0 +1 @@
+from .adversarial_balancing import AdversarialBalancing