Confirmed non-determinism in the DRAGMAP aligner output #31
Comments
I'm having trouble finding out which dragmap command line was used from the above.
@rpetrovski, as far as I understand, the Picard tool CompareSAMs does not depend on output order and compares the actual alignments, so the problem may be deeper than that.
@rpetrovski The Dragmap command is on line 64 of https://github.com/broadinstitute/warp/blob/jw_test_Dragmap_alignment/tasks/broad/DragmapAlignment.wdl:
The differences are more than trivial differences in ordering: we first noticed this because the downstream variant calls changed.
(I'll add that, among other things, the total number of aligned reads was found to differ across runs.)
Hi, we have been able to reproduce the issue with your command line. The insert size statistics computed at the beginning do vary a little from one run to the next (e.g. from 337.38 to 337.41).
@rizkg I uploaded a pair of example differing output bams today to the link you provided over email. Can you confirm that the differences present in those output bams could be caused by the insert size statistics? Thanks for your help!
Yes thanks, I got the files. I will check whether there is some other issue.
I am wondering the same thing. For building pipelines for non-model species, is it "safe" to use dragmap yet, or should we default to bwa-mem?
Hi all, apologies for the delay on this, we've been short-staffed. We have a PR with the suggested change from James and will prioritize testing it to confirm that it fixes the issue as soon as we can.
The pipelines team here at the Broad Institute has confirmed non-determinism in the DRAGMAP output -- here is their report on the issue:
Description
When the same unmapped bam is run through the Dragmap aligner twice, the resulting aligned bams do not always match.
This was discovered when running the DRAGEN-GATK whole genome germline single sample pipeline. To confirm that the Dragmap aligner was producing different results, it was isolated and run twice on the same input, and the two outputs were compared. The single sample used (NA12878) has 24 unmapped bams, each of which is run through the Dragmap aligner individually (later referred to as shards). The whole experiment was repeated 20 times (later referred to as runs), for a total of 480 comparisons. Of the 480 comparisons, 47 resulted in differences. These differences were not consistent across runs: of the 24 shards in a single run, sometimes 1 would fail, sometimes 2, 3, or 4, and which specific shards failed also varied. What was consistent was that in every run at least 1 of the 24 shards failed.
Steps to reproduce
Run Dragmap aligner on an input bam twice and compare the outputs. This may need to be repeated several times since the differences are not consistent across runs (most of the time the alignment produces identical results, but sometimes it produces different results).
The WDL used for the experiment described above can be found here: https://github.com/broadinstitute/warp/blob/jw_test_Dragmap_alignment/scratch/DragmapAlign.wdl
And the actual Dragmap command line used is in this WDL:
https://github.com/broadinstitute/warp/blob/jw_test_Dragmap_alignment/tasks/broad/DragmapAlignment.wdl
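For anyone who wants to check a pair of outputs without Picard, the comparison step above can be sketched roughly as follows. This is only an illustration, not the method used in the experiment: it assumes both bams have already been converted to SAM text (for example with `samtools view -h`), and the helper names `record_multiset` and `diff_runs` are made up for this example. It ignores header lines and record order, so only real differences in the alignment records are reported.

```python
from collections import Counter
from typing import Iterable, Tuple


def record_multiset(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records, ignoring header lines and record order."""
    records = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        # SAM header lines start with '@'; alignment lines never do.
        if not line or line.startswith("@"):
            continue
        records[line] += 1
    return records


def diff_runs(run_a: Iterable[str], run_b: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return the records present only in run A and only in run B."""
    a = record_multiset(run_a)
    b = record_multiset(run_b)
    # Counter subtraction keeps only positive counts.
    return a - b, b - a


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python compare_runs.py run1.sam run2.sam
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        only_a, only_b = diff_runs(fa, fb)
    print(f"records only in first run:  {sum(only_a.values())}")
    print(f"records only in second run: {sum(only_b.values())}")
```

If both counts are zero, the two runs produced the same set of alignment records, even if the order in the files differs.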
Actual behavior
Approximately 10% of the time that Dragmap is run on the same input twice, the outputs are not identical.
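For reference, the "approximately 10%" figure follows directly from the counts in the Description (24 shards, 20 runs, 47 differing comparisons). A quick check of the arithmetic, with variable names chosen only for this example:

```python
shards_per_run = 24       # unmapped bams for NA12878
runs = 20                 # times the whole experiment was repeated
failing_comparisons = 47  # comparisons whose outputs differed

total_comparisons = shards_per_run * runs
failure_rate = failing_comparisons / total_comparisons
mean_failures_per_run = failing_comparisons / runs

print(f"total comparisons: {total_comparisons}")               # 480
print(f"failure rate: {failure_rate:.1%}")                     # 9.8%
print(f"mean failing shards per run: {mean_failures_per_run}") # 2.35
```

So on average a little over 2 of the 24 shards differed per run, which is consistent with the observed range of 1 to 4.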
Expected behavior
Each time you run the same input through Dragmap, it produces the same output.
Supporting files and details
Of the 47 failing comparisons mentioned above, we looked closer at one. When these output bams are compared using the Picard tool CompareSAMs, the result is as follows:
We can arrange to send you the actual differing output bams from this test if it would be helpful in diagnosing the problem.