Read Threading
The `thread` command error-corrects reads so they match the graph, then uses sequences that match the graph to generate links. Links connect kmers across collapsed repeats and allow us to traverse through them. If a read does not span any repeats, it will not generate any links.
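To illustrate why only reads that span a repeat produce links, here is a toy sketch in Python. It is not McCortex's data structure or algorithm: it ignores reverse complements and error correction, and the names `build_graph` and `make_link` are hypothetical. A "link" here is just the first kmer of a read plus the base chosen at each fork the read passes through.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the kmers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(seqs, k):
    """Map each kmer to the set of kmers that follow it in any sequence."""
    succ = defaultdict(set)
    for seq in seqs:
        ks = kmers(seq, k)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
    return dict(succ)


def make_link(read, graph, k):
    """Toy link: (first kmer, bases chosen at forks), or None if no fork is crossed."""
    ks = kmers(read, k)
    choices = [b[-1] for a, b in zip(ks, ks[1:]) if len(graph.get(a, ())) > 1]
    return (ks[0], "".join(choices)) if choices else None


# Two sequences that share the kmer CGT, which collapses into one node with a fork.
graph = build_graph(["AACGTCC", "TTCGTGG"], k=3)

spanning = make_link("ACGTC", graph, 3)  # read passes through the fork at CGT
flanking = make_link("AACG", graph, 3)   # read stops before the fork
print(spanning, flanking)
```

In this toy model the read `ACGTC` records which branch to take after `CGT`, while `AACG` never reaches a fork and so yields nothing.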
Before running the `thread` command, you should run the `inferedges` command on the graph.
Paired-end reads that overlap should also be merged in order to give the most information. The read threading step attempts to error-correct reads and fill in insert gaps, but it cannot merge overlapping reads. There are many good tools for this. We considered implementing this functionality ourselves, but want to avoid "re-inventing the wheel": any solution we came up with would be unlikely to outperform existing ones. We suggest using FLASH, COPE, PEAR or PANDASeq.
If you don't merge paired-end reads that overlap, very few read pairs will have their insert gaps filled, and you'll get the following warning:
[16 Apr 2016 12:18:33-LOx][generate_paths.c:422] Warn: Reads may overlap in fragment: 151 + 151 > frag len min: 0; max: 1000
This means you may lose much of the long-distance connectivity information that is in the reads. In some cases it may also increase the rate of errors in your graph links.
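The arithmetic behind the warning can be sketched as follows. This is an interpretation of the log message only (read lengths `151 + 151` compared against `frag len min`), not a copy of the check in `generate_paths.c`; the function names are hypothetical.

```python
def reads_may_overlap(read1_len, read2_len, frag_len_min):
    """True if the two reads together can be longer than the shortest allowed fragment."""
    return read1_len + read2_len > frag_len_min


def insert_gap(read1_len, read2_len, frag_len):
    """Unsequenced bases between the two reads; negative means the reads overlap."""
    return frag_len - read1_len - read2_len


# Values from the warning above: 151 bp reads, fragment length min 0, max 1000.
warn = reads_may_overlap(151, 151, 0)
gap_at_250 = insert_gap(151, 151, 250)    # a 250 bp fragment: reads overlap
gap_at_1000 = insert_gap(151, 151, 1000)  # a 1000 bp fragment: a real gap to fill
print(warn, gap_at_250, gap_at_1000)
```

A negative gap marks a pair that a read merger should have joined before threading.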
If you have a lot of overlapping read pairs and you can't merge them, we recommend only using single-ended reads in the threading stage (`--seq` instead of `--seq2`). This will reduce your contig N50 but you'll make fewer assembly mistakes.
If you only want to error-correct reads to match the graph, you can use the McCortex `correct` command.