Isaac Turner edited this page Apr 16, 2016 · 9 revisions

The thread command error-corrects reads so that they match the graph, then uses the sequences that match the graph to generate links. Links connect kmers across collapsed repeats and allow us to traverse through them. If a read does not span any repeats, it will not generate any links.

Before running the thread command, you should run the inferedges command on the graph.

Merging Overlapping Paired End Reads

Paired end reads that overlap should be merged before threading in order to give the most information. The read threading step attempts to error correct reads and fill in insert gaps, but it cannot merge overlapping reads. There are many good tools for this already. We considered implementing this functionality ourselves, but wanted to avoid "re-inventing the wheel", and any solution we came up with would be unlikely to outperform existing ones. We suggest using FLASH, COPE, PEAR or PANDASeq.
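To make the idea of merging concrete, here is a toy sketch in Python. It is not how FLASH, COPE, PEAR or PANDASeq work (those tools score mismatches and use base qualities); it only illustrates the principle, assuming error-free reads in the usual forward/reverse orientation. The function names and the `min_overlap` parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair if the end of read1 exactly matches the start of
    the reverse complement of read2. Returns the merged sequence, or None
    if no exact overlap of at least min_overlap bases is found.
    Toy assumption: no sequencing errors in the overlapping region."""
    r2 = reverse_complement(read2)
    longest = min(len(read1), len(r2))
    # Try the longest candidate overlap first, then shorter ones
    for olap in range(longest, min_overlap - 1, -1):
        if read1[-olap:] == r2[:olap]:
            return read1 + r2[olap:]
    return None


# A 25 bp fragment sequenced with two 18 bp reads: they overlap by 11 bp
fragment = "ACGTTGCAAGGCTTACGGATCCGTA"
read1 = fragment[:18]
read2 = reverse_complement(fragment[-18:])
merged = merge_pair(read1, read2)
```

In practice you would run one of the tools above on your FASTQ files and pass the merged reads to the thread command as single ended reads.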

If you don't merge paired end reads that overlap, very few read pairs will have their insert gaps filled, and you'll get the following warning:

[16 Apr 2016 12:18:33-LOx][generate_paths.c:422] Warn: Reads may overlap in fragment: 151 + 151 > frag len min: 0; max: 1000

This means you may lose a lot of long distance connectivity information that is in the reads. In some cases it may increase the rate of errors in your graph links.
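The arithmetic behind the warning can be sketched as follows. This assumes, as the message text suggests, that the check compares the sum of the two read lengths with the minimum fragment length; it is an illustration in Python, not McCortex's actual code, and the function names are hypothetical.

```python
def pair_may_overlap(read1_len, read2_len, frag_len_min):
    """True if two reads of these lengths could overlap in a fragment
    as short as frag_len_min (assumed form of the warning's check)."""
    return read1_len + read2_len > frag_len_min


def overlap_length(read1_len, read2_len, frag_len):
    """Number of bases covered by both reads for a given fragment length.
    Capped at the shorter read length and never negative."""
    raw = read1_len + read2_len - frag_len
    return min(read1_len, read2_len, max(0, raw))


# Values from the warning above: 151 bp reads, fragment length 0..1000
warn = pair_may_overlap(151, 151, 0)
```

With 151 bp reads, any fragment shorter than 302 bp gives an overlapping pair, so a minimum fragment length of 0 always triggers the condition.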

If you have a lot of overlapping read pairs and you can't merge them, we recommend only using single ended reads in the threading stage (--seq instead of --seq2). This will reduce your contig N50 but you'll make fewer assembly mistakes.

Error correction

If you only want to error correct reads to match the graph, you can use the McCortex correct command.
