SPIN is a simple, Scanpy-based implementation of the subsampling and smoothing approach described in the manuscript Mitigating autocorrelation during spatially resolved transcriptomics data analysis. It enables the alignment and analysis of transcriptionally defined tissue regions across multiple SRT datasets, regardless of morphology or experimental technology, using conventional single-cell tools. Here we include information regarding:
- A conceptual overview of the approach
- Package requirements
- Installation instructions
- Basic usage principles
For examples of downstream analysis (e.g. differentially expressed gene analysis and trajectory inference), see the tutorial notebook. For further details on SPIN parameters, import SPIN into Python as shown below and run help(spin)
.
- Conventional single-cell analysis can identify molecular cell types by considering each cell individually.
- However, it does not incorporate spatial information.
- Arguably the simplest way to incorporate spatial information and identify molecular tissue regions is to spatially smooth gene expression features across neighboring cells in the tissue.
- This can be done by setting the features of each cell to the average of its spatial neighborhood.
- However, a problem arises when smoothed representations of each cell are compared to one another.
- Physically adjacent cells will have almost identical neighborhoods and thus almost identical smoothed representations.
- Thus, we end up with nearest neighbors in feature space that are just nearest neighbors in physical space.
- Because conventional methods for downstream anlaysis rely on the nearest neighbors graph in feature space, this leads to reconstruction of physical space in latent space rather than representing the true underlying large scale molecular patterns.
- Here, we implement an approach in which each cell's spatial neighborhood is randomly subsampled before averaging, allowing the exact neighborhood composition to vary while still maintaining the general molecular composition.
Ultimately, this approach enables the application of conventional single-cell tools to spatial molecular features in SRT data, yielding regional analogies for each tool. For more details and examples, please refer to the manuscript and tutorial.
- Tested on MacOS (Monterey, Ventura) and Linux (Red Hat Enterprise Linux 7).
- Command Line Tools is required for
pip
installing this package from GitHub. While it comes standard on most machines, those without it may encounter anxcrun: error
when following the installation instructions below. See here for simple instructions on how to install it. - Python >= 3.9
- The only dependency is Scanpy. For details, see
pyproject.toml
.
- One or more SRT datasets in
.h5ad
format - An expression matrix under
.X
(both sparse and dense representations supported) - Spatial coordinates under
.obsm
(key can be specified with argumentspatial_key
) - Batch information
- If multiple batches in single dataset, batch labels provided under column in
.obs
with column namebatch_key
. - If multiple batches in separate datasets, batch labels for each dataset provided as input.
- If multiple batches in single dataset, batch labels provided under column in
pip install git+https://github.com/wanglab-broad/spin@main
Takes ~5 mins.
Consider the marmoset and mouse data from the manuscript which we provide as a demo:
import scanpy as sc
adata_marmoset = sc.read(
'data/marmoset.h5ad',
backup_url='https://zenodo.org/record/8092024/files/marmoset.h5ad?download=1'
)
adata_mouse = sc.read(
'data/mouse.h5ad',
backup_url='https://zenodo.org/record/8092024/files/mouse.h5ad?download=1'
)
These datasets can be spatially integrated and clustered using spin
. The batch_key
argument corresponds to the name of a new column in adata.obs
that stores the batch labels for each dataset. The batch_labels
argument is a list of these batch labels in the same order as the input AnnDatas:
from spin import spin
adata = spin(
adatas=[adata_marmoset, adata_mouse],
batch_key='species',
batch_labels=['marmoset', 'mouse'],
resolution=0.7
)
This performs the following steps:
integrate
:- Subsampling and smoothing of each dataset individually (stored under
adata.layers['smooth']
) - Joint PCA across both smoothed datasets
- Integration of the resulting PCs using Harmony (stored under
adata.obsm['X_pca_spin']
)
- Subsampling and smoothing of each dataset individually (stored under
cluster
:- Latent nearest neighbor search
- Leiden clustering with a resolution of 0.7 (stored under
adata.obs['region']
) - UMAP (stored under
adata.obsm['X_umap_spin']
)
Note that spin
can equivalently take as input a single AnnData containing multiple labeled batches. It can also take a single AnnData containing one batch for finding regions in a single dataset. For examples, see the tutorial.
The resulting region clusters can then be visualized using standard Scanpy functions:
# In physical space
sc.set_figure_params(figsize=(7,5))
sc.pl.embedding(adata, basis='spatial', color='region')
# In UMAP space
sc.set_figure_params(figsize=(4,4))
sc.pl.embedding(adata, basis='X_umap_spin', color='region')
Downstream analysis (e.g. DEG analysis, trajectory inference) can then be performed using standard Scanpy functions as well.
For examples of downstream analysis, see the tutorial.
For further details on the parameters of spin
, import SPIN into Python as shown above and run help(spin)
.
SPIN can be executed from the shell using the spin
command as shown below (the path is identified automatically; see spin_cli
and pyproject.toml
)
Shell submission requires a read path to the relevant dataset(s) as well as a write path for the output dataset. Otherwise, provide the same parameters you would when running in Python as above:
spin \
--adata_paths data/marmoset.h5ad data/mouse.h5ad \
--write_path data/marmoset_mouse_spin.h5ad \
--batch_key species \
--batch_labels marmoset mouse \
--resolution "0.7"
Just as when running in Python, a single AnnData containing multiple batches can be passed in instead, as well as just a single dataset containing a single batch.