Diginorm in context of reference based transcriptome analysis #69
To follow up on this idea, I decided to find a deeply sequenced RNA sample and measure the time and computational resources required to analyze it with the new HISAT2/StringTie software, which replaces Tophat2/Cufflinks. The aim was to document how diginorm can improve such a process.
Materials
I searched the ENCODE project for a deeply sequenced sample and found one prepared from the cytoplasmic fraction of independent growths of cell line SK-N-SH. It is a PE101 Illumina Hi-Seq RNA-Seq library from rRNA-depleted Poly-A+ RNA. The sample has 241,123,024 PE reads.
Analysis pipeline
Results
Conclusion
With whatever confidence we can draw from a single sample, we can conclude that the new HISAT2/StringTie pipeline is much faster and needs much less memory than its predecessors. I do not think diginorm can add much here (interleaving and compressing the two read files to fit the input of the khmer scripts took more than 18 hours!), and thus I decided to stop here.
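For anyone who wants to time the interleaving step on its own, here is a minimal sketch of what it involves, assuming two FASTQ files with the same number of records in the same order. It is not the khmer script used above (khmer ships its own `interleave-reads.py`), and the function names `read_fastq` and `interleave` are made up for illustration.

```python
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples from an open text handle."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write R1 and R2 records alternately; return the number of pairs."""
    pairs = 0
    # zip_longest pads the shorter stream with None, so a length
    # mismatch between the two files is detected instead of ignored.
    for rec1, rec2 in zip_longest(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 have different numbers of records")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)
        pairs += 1
    return pairs
```

With gzipped input, the handles would come from `gzip.open(path, "rt")` for the reads and `gzip.open(path, "wt")` for the output. The function streams one pair at a time, so memory stays flat; on a sample of this size the compression is likely to be a large share of the wall-clock time, but that is a guess I have not measured.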
I am summarizing my experience with digital normalization in reference-based RNAseq analysis and possible future directions. I went through 3 use cases: