This repository contains the implementation of the DeepCoDA model from the paper "DeepCoDA: personalized interpretability for compositional health data" (ICML 2020): https://arxiv.org/abs/2006.01392
Interpretability allows the domain expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing.
Some health data, especially those generated by high-throughput sequencing experiments, have nuances that compromise precision health models and their interpretation. These data are compositional, meaning that each feature is conditionally dependent on all other features.
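To make the compositional point concrete, here is a small standalone illustration (not taken from this repository): rescaling a sample, e.g. closing it to proportions, changes every raw feature value, but leaves log-ratios between features unchanged.

```python
import numpy as np

# A hypothetical sample of 4 relative abundances.
x = np.array([10.0, 20.0, 30.0, 40.0])

# Closing the sample to proportions (or rescaling by any positive constant)
# changes every raw value, so no feature is interpretable in isolation...
p = x / x.sum()

# ...but log-ratios between features are unchanged by the rescaling,
# which is why log-contrast representations suit compositional inputs.
print(np.log(x[0] / x[1]))   # -0.693...
print(np.log(p[0] / p[1]))   # identical
```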
We propose the DeepCoDA framework to extend precision health modelling to high-dimensional compositional data, and to provide personalized interpretability through patient-specific weights. Our architecture maintains state-of-the-art performance across 25 real-world data sets, all while producing interpretations that are both personalized and fully coherent for compositional data.
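As a rough sketch of the log-bottleneck idea (an illustration under stated assumptions, not the released implementation): each bottleneck can be viewed as a sparse linear combination of log-transformed features, with the `--l1` flag controlling the sparsity penalty. The sizes `D` and `B` below are hypothetical, and the sketch omits details such as the zero-sum (log-contrast) constraint and the self-attention mechanism that produces the patient-specific weights.

```python
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import regularizers
import keras.backend as K

D, B = 100, 5   # hypothetical: number of input features, number of log-bottlenecks

# Inputs are assumed strictly positive (compositions with zeros replaced).
inputs = Input(shape=(D,))
log_x = Lambda(lambda t: K.log(t))(inputs)

# Each of the B bottleneck units is a linear combination of log-features;
# the L1 penalty (the --l1 flag) encourages each unit to use few features.
bottlenecks = Dense(B, use_bias=False,
                    kernel_regularizer=regularizers.l1(0.01))(log_x)

# A linear head for, e.g., binary classification. The full DeepCoDA model
# instead weights the bottleneck scores with patient-specific weights
# produced by self-attention, which this sketch omits.
output = Dense(1, activation='sigmoid')(bottlenecks)

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
```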
- Python 3.6
- scikit-learn 0.23.1
- keras 2.2.4
- tensorflow 1.10.0
- seaborn 0.11
- To run the model without attention: `python DeepCoDA_without_attention.py --dataset data_id --level B --l1 lambda`
- To run the model with attention: `python DeepCoDA_with_attention.py --dataset data_id --level B --l1 lambda`
- `data_id` is the dataset ID (default: `5a`). If `data_id` is `all`, the model runs on all datasets.
- `B` is the number of log-bottlenecks (default: `5`).
- `lambda` is the weight of the L1 penalty (default: `0.01`).
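For example, to train the attention variant with all defaults made explicit:

```
python DeepCoDA_with_attention.py --dataset 5a --level 5 --l1 0.01
```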
Thomas P. Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, and Svetha Venkatesh (2020). DeepCoDA: personalized interpretability for compositional health data. In International Conference on Machine Learning (ICML), 2020.