Patho-DBiT: Spatially exploring RNA biology in archival formalin-fixed paraffin-embedded (FFPE) tissues
Libraries were sequenced on an Illumina NovaSeq 6000 Sequencing System with a paired-end 150 bp read length.
Scripts are included in the "Sequence alignment" folder.
Read 1: Contains the cDNA sequence
Read 2: Contains the spatial Barcode A, Barcode B and UMIs
The Read 2 sequence needs to be reformatted before running ST Pipeline, as explained in the 'Reformat_Read2.md' file.
To reformat Read 2, run 'FASTQ_process.sh'.
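As a rough illustration of what reformatting Read 2 involves, the sketch below cuts Barcode A, Barcode B, and the UMI out of a raw read and concatenates them, together with the matching quality characters. It is not the logic of 'FASTQ_process.sh': the offsets in `EXAMPLE_LAYOUT`, the output order (Barcode A, Barcode B, UMI), and all function and class names are hypothetical placeholders. The real positions are documented in 'Reformat_Read2.md'.

```python
from typing import NamedTuple, Tuple


class Read2Layout(NamedTuple):
    """0-based, end-exclusive positions of each element within raw Read 2."""
    barcode_a: slice
    barcode_b: slice
    umi: slice


# HYPOTHETICAL offsets, for illustration only.
# Take the real positions from 'Reformat_Read2.md'.
EXAMPLE_LAYOUT = Read2Layout(
    barcode_a=slice(60, 68),
    barcode_b=slice(22, 30),
    umi=slice(0, 10),
)


def reformat_read2(seq: str, qual: str, layout: Read2Layout) -> Tuple[str, str]:
    """Return (sequence, quality) reduced to Barcode A + Barcode B + UMI.

    The output order follows the field order of Read2Layout, which is an
    assumption of this sketch.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    needed = max(part.stop for part in layout)
    if len(seq) < needed:
        raise ValueError(f"read is {len(seq)} bp, layout needs {needed} bp")
    new_seq = "".join(seq[part] for part in layout)
    new_qual = "".join(qual[part] for part in layout)
    return new_seq, new_qual
```

Slicing the quality string with the same positions keeps each base paired with its quality score, which any FASTQ-level reformatting has to preserve.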
All raw Read 1 and reformatted Read 2 files can be downloaded from the NCBI Gene Expression Omnibus (GEO) under the accession number GSE274641.
Run 'ST_Pipeline.sh' with Processed_R2.fastq.gz and Raw_R1.fastq.gz as inputs.
The pipeline requires a spatial barcode index file to decode spatial locations. Two files are provided in the folder: one for Patho-DBiT 50x50 datasets and another for the expanded 100x100 dataset.
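To make the role of the barcode index concrete, here is a minimal sketch of how such a file can map a combined barcode to a pixel coordinate. It assumes a three-column, whitespace-delimited layout (barcode, x, y), that Barcode A indexes x and Barcode B indexes y, that the combined barcode is A followed by B, and that coordinates start at 1. None of this is confirmed by this README, so check the provided index files before relying on it. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Sequence, Tuple


def build_barcode_index(barcodes_a: Sequence[str],
                        barcodes_b: Sequence[str]) -> List[str]:
    """Create one 'barcode<TAB>x<TAB>y' line per A/B combination.

    Assumption: Barcode A gives x, Barcode B gives y, both 1-based.
    """
    lines = []
    for (x, a), (y, b) in product(enumerate(barcodes_a, start=1),
                                  enumerate(barcodes_b, start=1)):
        lines.append(f"{a}{b}\t{x}\t{y}")
    return lines


def parse_barcode_index(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Read index lines into a {barcode: (x, y)} lookup table."""
    index: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        barcode, x, y = line.split()
        if barcode in index:
            raise ValueError(f"duplicate barcode in index: {barcode}")
        index[barcode] = (int(x), int(y))
    return index
```

With 50 A barcodes and 50 B barcodes this produces 2,500 entries, and with 100 of each it produces 10,000, which is why the 50x50 and 100x100 datasets need separate index files.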
The mouse GRCm38 (mm10) or human GRCh38 reference genome was used with STAR v2.7.7a.
Run 'Convert_ENSEMBL_to_gene_name.sh' to annotate the matrix output from ST Pipeline.
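The annotation step amounts to replacing Ensembl gene IDs with gene names. The sketch below shows one common way to do that from a GTF file, using its standard `gene_id` and `gene_name` attributes. It is an illustration under that assumption, not the contents of 'Convert_ENSEMBL_to_gene_name.sh', and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

_GENE_ID = re.compile(r'gene_id "([^"]+)"')
_GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def ensembl_to_name(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Build {Ensembl gene ID: gene name} from the 'gene' records of a GTF."""
    mapping: Dict[str, str] = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        gene_id = _GENE_ID.search(fields[8])
        gene_name = _GENE_NAME.search(fields[8])
        if gene_id and gene_name:
            mapping[gene_id.group(1)] = gene_name.group(1)
    return mapping


def rename_genes(gene_ids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Translate matrix gene labels; IDs without a known name are kept as-is."""
    return [mapping.get(gene_id, gene_id) for gene_id in gene_ids]
```

Keeping unmapped IDs unchanged avoids silently dropping genes from the matrix. Note that several Ensembl IDs can share one gene name, so renamed labels are not guaranteed to be unique.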
Scripts are included in the "Mapping of non-coding RNAs" folder.
Run 'Build_GTF_Pipeline.sh' in the "GRCh38" or "mm39" folder to build the genomic reference for human or mouse.
Run 'count_ncRNAs.sh' in the "ncRNA types" folder.
Scripts are included in the "Spatial alternative splicing" folder. Run 'rMATS_pipeline.sh' with the BAM file generated by 'ST_Pipeline.sh'.
The pipeline also requires the following input files: a GTF annotation file, a spatial barcode index file, a TSV file linking cluster IDs to spatial pixel positions, and a TSV file linking cluster IDs to region names.
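The two TSV files together let pixels be grouped by tissue region. The sketch below illustrates that relationship only. The column names (`x`, `y`, `cluster`, `region`) and the function name are hypothetical, so adapt them to the headers of the actual input files.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pixels_by_region(cluster_pixel_lines: Iterable[str],
                     cluster_region_lines: Iterable[str]
                     ) -> Dict[str, List[Tuple[int, int]]]:
    """Group pixel coordinates by region name via their cluster ID.

    Assumed (hypothetical) headers:
      cluster/pixel table : x, y, cluster
      cluster/region table: cluster, region
    """
    region_of = {
        row["cluster"]: row["region"]
        for row in csv.DictReader(cluster_region_lines, delimiter="\t")
    }
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for row in csv.DictReader(cluster_pixel_lines, delimiter="\t"):
        cluster = row["cluster"]
        if cluster not in region_of:
            raise ValueError(f"cluster {cluster!r} has no region name")
        grouped[region_of[cluster]].append((int(row["x"]), int(row["y"])))
    return dict(grouped)
```

Both arguments accept any iterable of lines, so an open file handle can be passed directly.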
Scripts are included in the "Spatial A-to-I RNA editing" folder. Run 'RNA_editing_pipeline.sh' with the BAM file generated by 'ST_Pipeline.sh'.
The pipeline also requires the following input files: a reference genome FASTA file, a GTF annotation file, a reference RNA editing sites file, a TSV file linking cluster IDs to spatial pixel positions, and a TSV file linking cluster IDs to region names.
Scripts are included in the "Spatial variant analysis" folder.
Run 'BWAmap.sh' in the "pipeline" folder to perform whole-genome sequencing (WGS) data alignment and analysis.
Run 'analyzeVariant.sh' in the "pipeline" folder to generate a mutation-by-pixel expression matrix.
The pipeline also requires a CSV file containing cluster IDs linked to spatial pixel positions.
Scripts are included in the "Spatial RNA dynamics" folder. Run 'RNAdynamics.sh' with spliced and unspliced count matrices as input.
The pipeline also requires two input CSV files: one containing UMAP embeddings and another with cluster IDs, both linked to spatial pixel positions.
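Because both CSV files are keyed by pixel position, it can help to confirm that they cover the same pixels before running the pipeline. The following is a minimal sketch of such a check and merge. The column names (`x`, `y`, `UMAP_1`, `UMAP_2`, `cluster`) and function names are hypothetical and may not match the headers 'RNAdynamics.sh' expects.

```python
import csv
from typing import Dict, Iterable, Sequence, Tuple

Pixel = Tuple[int, int]


def read_by_pixel(lines: Iterable[str],
                  value_columns: Sequence[str]) -> Dict[Pixel, Tuple[str, ...]]:
    """Index a CSV by its (x, y) pixel position (assumed column names)."""
    table: Dict[Pixel, Tuple[str, ...]] = {}
    for row in csv.DictReader(lines):
        pixel = (int(row["x"]), int(row["y"]))
        table[pixel] = tuple(row[col] for col in value_columns)
    return table


def merge_on_pixel(umap: Dict[Pixel, Tuple[str, ...]],
                   clusters: Dict[Pixel, Tuple[str, ...]]
                   ) -> Dict[Pixel, Tuple[float, float, str]]:
    """Join UMAP coordinates and cluster IDs, refusing mismatched pixel sets."""
    only_umap = sorted(set(umap) - set(clusters))
    only_clusters = sorted(set(clusters) - set(umap))
    if only_umap or only_clusters:
        raise ValueError(
            f"pixel mismatch: {len(only_umap)} only in UMAP file, "
            f"{len(only_clusters)} only in cluster file"
        )
    return {
        pixel: (float(umap[pixel][0]), float(umap[pixel][1]), clusters[pixel][0])
        for pixel in umap
    }
```

A possible use, assuming those headers: `merge_on_pixel(read_by_pixel(open("umap.csv"), ["UMAP_1", "UMAP_2"]), read_by_pixel(open("clusters.csv"), ["cluster"]))`, where both file names are placeholders.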
Super-resolved tissue architecture was generated by integrating the Patho-DBiT gene expression matrix with high-resolution histology using iStar.
Scripts are included in the "Spatial data visualization" folder.
Follow the steps listed in 'Image processing.md' for pixel identification, then run 'Pixel_identification.m' to generate a "position.txt" file containing the identified useful pixels from the image.
Perform spatial unsupervised clustering analysis by executing 'Patho-DBiT_Clustering.Rmd'.