NOTICE: This package is under heavy development and is subject to change until release 0.1.
Single Cell Ancestral Node Taxonomy Inference by Partitioning Of Differential Expression (ANTIPODE). The model extends the SCVI paradigm: it is a structured generative, variational inference model developed for simultaneous differential expression (DE) analysis and categorization (taxonomy generation) of cell types across evolution (or, now, any covariate) using single-cell RNA-seq data. It began as a hack of a simplified scANVI model and is built on Pyro, the PyTorch-based probabilistic programming language (PPL). The model acts as an integration method that learns interpretable differential expression in the process. Note that this means ANTIPODE will fail to integrate datasets that are too dissimilar, or datasets with large disparities in quality or gene mean dispersions.
The complete procedure runs in 3 phases (but can also run fully supervised using only phase 2):
- The Fuzzy Phase: Cells may belong to multiple types, with memberships sampled from a Bernoulli distribution. This phase learns an integrated latent space with covariate effects, but is less straightforward to interpret.
- The Supervised Phase: Discrete clustering is initialized from a supervised initialization (or defaults to a de novo k-means clustering in the latent space). This phase can take a supervised clustering and/or latent space for cells.
- The Free Phase: All parameters are released for unconstrained learning.
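To make the default initialization of the Supervised Phase more concrete, here is a minimal, dependency-free sketch of k-means clustering on latent coordinates. It is not ANTIPODE's own code: the function name `kmeans`, its parameters, and the toy `latent` points are all hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.

```python
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length coordinate tuples.

    Returns (labels, centroids). Illustrative only; names are hypothetical.
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D "latent space" with two well-separated groups of cells.
latent = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(latent, k=2)
print(labels)
```

The resulting hard labels play the role of the discrete clustering that the Supervised Phase starts from when no supervised clustering is supplied.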
You can read about the generative model here. You can look at example runs here.
First, create a conda environment with Python >= 3.10:
git clone git@github.com:mtvector/scANTIPODE.git
#cuda 11.7 should work too
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install jax jaxlib -c conda-forge
cd scANTIPODE
pip install -e .
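After installation, a quick check along the following lines can confirm that the main dependencies are visible to the interpreter. This is only a sketch: `check_modules` is a hypothetical helper, and the module names `torch`, `pyro`, and `jax` are the usual import names of the libraries mentioned above, not names taken from this package's own documentation.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool} saying whether each module can be found.

    Uses find_spec, so nothing is actually imported.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(["torch", "pyro", "jax"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `torch` is found, calling `torch.cuda.is_available()` in the same environment reports whether PyTorch can see a GPU.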
Please reach out to let me know if you try ANTIPODE on a dataset and it works (or doesn't work)... The model is (forever) a work in progress!
Note that the model can be VRAM hungry, with parameters scaling by #covariates x #genes x #clusters|#modules... If you run out of VRAM, you might need to (1) fix a GPU memory leak, (2) use fewer genes, latent dimensions, or clusters, or (3) get a bigger GPU.
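As a rough illustration of that scaling, the sketch below estimates the memory taken by a single covariates x genes x components tensor. Everything here is an assumption for illustration: the function name `estimate_param_bytes` is hypothetical, `#clusters|#modules` is read as "clusters or modules", parameters are assumed to be float32, and training is assumed to keep a gradient plus two Adam-style moment buffers per parameter. Activations and all other model parameters are ignored, so real usage will be higher.

```python
def estimate_param_bytes(n_covariates, n_genes, n_components,
                         bytes_per_param=4, optimizer_buffers=2):
    """Estimate parameter count and training memory for one 3-D tensor.

    n_components stands for the number of clusters or modules.
    Memory = weights + gradients + optimizer buffers (assumed Adam-like).
    """
    n_params = n_covariates * n_genes * n_components
    copies = 1 + 1 + optimizer_buffers  # weights, gradients, optimizer state
    return n_params, n_params * bytes_per_param * copies


# Hypothetical sizes: 5 covariate levels, 5000 genes, 100 clusters.
n_params, n_bytes = estimate_param_bytes(5, 5000, 100)
print(f"{n_params:,} parameters, about {n_bytes / 2**20:.1f} MiB")
```

Because the estimate is a product of the three sizes, halving the number of genes or clusters halves it, which is why reducing either is an effective way to fit a smaller GPU.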
- Improved plotting functionality
- Expanded tutorials
- PyPI release
- Gene expression histogram normalization
- Phylogeny regression
- Parameter variance estimation
- Improved clustering