This repository contains the code and results of the 2024 Bioinformatics course (AI3608) project.

Eric-Y-S/Bioinformatic-AI3608-Program
Bioinformatic-AI3608-Project

Description

This repository contains the code and results of the 2024 Bioinformatics course (AI3608) project.

Our project reanalysed the RNA-seq data and reproduced part of the results from the following paper:

  • Lin, X., Swedlund, B., Ton, ML.N. et al. Mesp1 controls the chromatin and enhancer landscapes essential for spatiotemporal patterning of early cardiovascular progenitors. Nat Cell Biol 24, 1114–1128 (2022). https://doi.org/10.1038/s41556-022-00947-3

Contributors

Zikun Yang(杨子坤), Zhou Peng(彭周)

Results

1. Implemented a transcriptome data processing pipeline

We implemented a transcriptome data processing pipeline based on Snakemake. The pipeline automatically processes raw fastq data from SRA and performs quality control, mapping, read counting and normalization. The detailed procedure is shown below:

image

The pipeline requires:

  • a csv file: maps each SRR ID to a readable name of your choice
  • a json file: records the software paths, the data type (single-end/paired-end) and the normalization method
  • raw fastq files: SRRxxxxxx_1/2.fastq.gz for paired-end data and SRRxxxxxx.fastq.gz for single-end data
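
The file naming convention above can be checked before launching the pipeline. The following is a minimal sketch, not part of the repository: the function names `expected_fastq_files` and `missing_inputs` and the `fastq_dir` parameter are hypothetical, and only the SRRxxxxxx_1/2.fastq.gz and SRRxxxxxx.fastq.gz patterns come from the description above.

```python
from pathlib import Path


def expected_fastq_files(srr_id, paired, fastq_dir="."):
    """Return the fastq paths implied by the naming convention for one run.

    Hypothetical helper: paired-end runs map to <SRR>_1/_2.fastq.gz,
    single-end runs map to <SRR>.fastq.gz.
    """
    base = Path(fastq_dir)
    if paired:
        return [base / f"{srr_id}_1.fastq.gz", base / f"{srr_id}_2.fastq.gz"]
    return [base / f"{srr_id}.fastq.gz"]


def missing_inputs(srr_ids, paired, fastq_dir="."):
    """List every expected fastq file that does not exist on disk."""
    return [
        path
        for srr_id in srr_ids
        for path in expected_fastq_files(srr_id, paired, fastq_dir)
        if not path.exists()
    ]
```

Calling `missing_inputs` with the SRR IDs from the csv file gives a list of absent files, so naming problems surface before any alignment job starts.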

After running the Snakemake pipeline, we obtain the expression matrix for downstream analysis.
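
To illustrate what the normalization step does to a count table, here is a small sketch using counts per million (CPM). CPM is only one common choice and is an assumption here; the method actually used is whichever one is set in the json file. The function name and the dictionary layout are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw counts so that every sample (column) sums to one million.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Assumes every sample has a non-zero total count.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total number of counted reads in each sample.
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [row[j] / library_sizes[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }
```

Length-aware methods such as TPM or FPKM additionally divide by gene length, which this sketch does not do.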

2. Identified and classified differentially expressed genes

We then used the R package DESeq2 to identify differentially expressed genes and classified the upregulated and downregulated genes separately into 3 categories: Early, Constant, Late. Please refer to the paper for definitions and methods.
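
The classification logic can be sketched as follows. This is an illustration under stated assumptions, not the paper's definition: it assumes a gene is Early if it is significant only at the earlier time point, Late if only at the later one, and Constant if at both. The cutoffs `lfc_cutoff` and `padj_cutoff` are placeholder values, and the function names are hypothetical. The inputs correspond to the log2 fold change and adjusted p-value columns of a DESeq2 result.

```python
def is_regulated(log2fc, padj, direction, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Decide whether one comparison calls a gene up- or downregulated.

    The cutoffs are placeholders, not values taken from the paper.
    padj may be None, since DESeq2 reports NA for filtered genes.
    """
    if padj is None or padj >= padj_cutoff:
        return False
    if direction == "up":
        return log2fc >= lfc_cutoff
    return log2fc <= -lfc_cutoff


def classify(early, late, direction):
    """Assign Early, Constant or Late from two (log2fc, padj) tuples.

    Assumed reading: Early = significant at the early time point only,
    Late = at the late time point only, Constant = at both.
    Returns None when the gene is significant at neither.
    """
    sig_early = is_regulated(*early, direction)
    sig_late = is_regulated(*late, direction)
    if sig_early and sig_late:
        return "Constant"
    if sig_early:
        return "Early"
    if sig_late:
        return "Late"
    return None
```

Running `classify` once with `direction="up"` and once with `direction="down"` keeps the upregulated and downregulated sets separate, as described above.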

We also analysed the correlation between samples. The results are shown below:

image
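
A sample correlation analysis of this kind can be sketched as below. The choice of Pearson correlation on log2(x + 1) transformed values is an assumption made for illustration, and the function names and input layout are hypothetical; the R scripts in ./analysis may use a different measure or transform.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlation(expr):
    """Pairwise correlation between samples.

    expr: dict mapping sample name -> list of expression values,
    with genes in the same order for every sample.
    Values are log2(x + 1) transformed first (an assumed choice).
    """
    logged = {s: [math.log2(v + 1) for v in vals] for s, vals in expr.items()}
    names = list(logged)
    return {a: {b: pearson(logged[a], logged[b]) for b in names} for a in names}
```

Replicates of the same condition are expected to correlate more strongly with each other than with samples from other conditions, which is what a correlation heatmap makes visible.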

3. Analysed calling results and successfully reproduced key findings

We plotted the expression patterns of genes across stages and conditions. Our results closely mirror the paper's, although they appear slightly more disordered. This is probably due to our more lenient quality control: we did not apply the paper's filter on "genes with a s.d. higher than 50% of the mean expression", because we did not understand what this requirement meant.

image

The number of genes we identified is shown below. We broadly replicated the results shown in the paper but obtained more genes in almost all categories.

image
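
One plausible reading of the quoted criterion is a variability filter: for each gene, compare the standard deviation of its expression across replicates with the mean, and drop the gene when the s.d. exceeds 50% of the mean. The sketch below implements that reading only. It is an interpretation, not a confirmed description of the paper's method, and the function name and the choice to reject genes with zero mean are assumptions.

```python
import statistics


def passes_sd_filter(replicates, max_ratio=0.5):
    """Keep a gene only if s.d. <= max_ratio * mean across its replicates.

    replicates: expression values of one gene in the replicates of one
    condition (at least two values). Genes with zero mean are rejected,
    which is an assumed choice.
    """
    mean = statistics.mean(replicates)
    if mean == 0:
        return False
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return statistics.stdev(replicates) <= max_ratio * mean
```

Under this reading the filter removes genes whose replicates disagree strongly, which would be consistent with the unfiltered results looking more disordered.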

Acknowledgement

Thanks to Prof. Ya Guo(郭亚) and T.A. Zhiyu Zhang(张之宇) for their guidance.

File declaration

  • ./RNApipeline contains the code and configuration files for the pipeline.
  • ./analysis contains the R scripts and data for our results.
