This repository contains the code and results of our project for the 2024 Bioinformatics course (AI3608).
Our project reanalysed the RNA-seq data and reproduced part of the results from the paper:
- Lin, X., Swedlund, B., Ton, ML.N. et al. Mesp1 controls the chromatin and enhancer landscapes essential for spatiotemporal patterning of early cardiovascular progenitors. Nat Cell Biol 24, 1114–1128 (2022). https://doi.org/10.1038/s41556-022-00947-3
Zikun Yang(杨子坤), Zhou Peng(彭周)
We implemented a transcriptome data processing pipeline based on Snakemake. The pipeline automatically processes raw fastq data from SRA and performs quality control, read mapping, read counting and normalization. The detailed procedure is shown below:
The pipeline requires:
- a csv file : links each SRR ID to a human-readable name of your choice
- a json file : records the software paths, the data type (single-end/paired-end) and the normalization method
- raw fastq files : SRRxxxxxx_1/2.fastq.gz for paired-end data and SRRxxxxxx.fastq.gz for single-end data
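As a rough illustration of how these three inputs fit together, the Python sketch below reads a sample sheet and a config and derives the raw fastq file names expected for each sample. The column names (`srr_id`, `name`), the JSON keys (`data_type`, `normalization`) and their values are hypothetical placeholders, not the actual layout used in this repository; only the fastq naming pattern comes from the list above.

```python
import csv
import io
import json

# Hypothetical file contents; the real column names and JSON keys may differ.
SAMPLE_CSV = """srr_id,name
SRR0000001,WT_rep1
SRR0000002,KO_rep1
"""
CONFIG_JSON = '{"data_type": "paired", "normalization": "TPM"}'


def expected_fastqs(srr_id, data_type):
    """Return the raw fastq file names expected for one SRA run."""
    if data_type == "paired":
        return [f"{srr_id}_1.fastq.gz", f"{srr_id}_2.fastq.gz"]
    if data_type == "single":
        return [f"{srr_id}.fastq.gz"]
    raise ValueError(f"unknown data type: {data_type!r}")


def load_samples(csv_text):
    """Map each SRR ID to its human-readable sample name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["srr_id"]: row["name"] for row in reader}


config = json.loads(CONFIG_JSON)
samples = load_samples(SAMPLE_CSV)

# Sample name -> list of raw fastq files the pipeline would look for.
inputs = {
    name: expected_fastqs(srr, config["data_type"])
    for srr, name in samples.items()
}
```

In a Snakemake workflow, a mapping like `inputs` would typically be consulted from an input function so that single-end and paired-end data can share the same rules.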
After running the Snakemake pipeline, we obtain the expression matrix for downstream analysis.
We then use the R package DESeq2 to identify differentially expressed genes and classify the upregulated and downregulated genes separately into three categories: Early, Constant and Late. Please refer to the paper for definitions and methods.
We also analysed the correlation between samples. The results are shown below:
We plot the expression patterns of genes across stages and conditions. Our results closely mirror the paper's, although they appear slightly noisier. This is probably due to our more lenient quality control: we did not apply the filter on "genes with a s.d. higher than 50% of the mean expression", as we did not understand the meaning of this requirement.
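One possible reading of the quoted criterion, which we have not confirmed against the paper's methods, is a coefficient-of-variation cutoff: a gene is dropped when the standard deviation of its expression exceeds half of its mean. The Python sketch below illustrates that reading only. Taking the s.d. and mean across replicates of one condition is an assumption, and the gene names and values are made up.

```python
from statistics import mean, stdev


def passes_sd_filter(values, max_ratio=0.5):
    """Keep a gene if its sample s.d. is at most max_ratio times its mean."""
    m = mean(values)
    if m == 0:
        # No expression at all: treated as failing the filter in this sketch.
        return False
    return stdev(values) <= max_ratio * m


# Hypothetical replicate expression values for two genes.
expression = {
    "geneA": [100.0, 110.0, 95.0],  # consistent replicates
    "geneB": [5.0, 60.0, 200.0],    # highly variable replicates
}

kept = [gene for gene, vals in expression.items() if passes_sd_filter(vals)]
```

Under this interpretation the filter would remove genes whose replicates disagree strongly, which would be consistent with our unfiltered patterns looking noisier.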
The number of genes we identified is shown below. We broadly replicate the results shown in the paper but obtain more genes in almost all categories.
Thanks to Prof. Ya Guo(郭亚) and T.A. Zhiyu Zhang(张之宇) for their guidance.
- ./RNApipeline : contains the code and configuration files for the pipeline
- ./analysis : contains R scripts and data for our results