You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I encountered a situation where the output from STAR was not readable by samtools, causing 'samtools view' to give an exception due to "corrupt" aux data:
[E::sam_format1_append] Corrupted aux data for read LH00192:82:22GMCNLT3:1:1128:12373:5198_TCGATGTCGGCTCT
It took a while to track down and to get it to occur in SAM output to be able to see why, but what appears to be happening is that there are some null's embedded in the MD tag causing premature ending of the tag data. This results in parsers assuming that the remaining part of the MD tag is another tag, which turns out to have an invalid type code.
Looking at the SAM line that gives the problem doesn't show it immediately:
But looking at a hex dump of that line shows that it ends with:
... 0 \t N M : i : 3 \t M D : Z : 1 1 \0 0 \0 0 \0 0 \n
This only occurs in two-pass mode and I think it has something to do with the alignment occurring at the very end of the chr NT_187594.1 -- it seems to be reading past the end of that template and grabbing some nulls to use in the MD tag.
Note that this is pretty hard to reproduce -- I only saw it on one file that had ~10M reads before hitting this -- shortening the input removed the error.
The text was updated successfully, but these errors were encountered:
I encountered a situation where the output from STAR was not readable by samtools, causing 'samtools view' to give an exception due to "corrupt" aux data:
[E::sam_format1_append] Corrupted aux data for read LH00192:82:22GMCNLT3:1:1128:12373:5198_TCGATGTCGGCTCT
It took a while to track down and to get it to occur in SAM output to be able to see why, but what appears to be happening is that there are some null's embedded in the MD tag causing premature ending of the tag data. This results in parsers assuming that the remaining part of the MD tag is another tag, which turns out to have an invalid type code.
Looking at the SAM line that gives the problem doesn't show it immediately:
2421241:LH00192:82:22GMCNLT3:1:1128:12373:5198_TCGATGTCGGCTCT 83 NT_187594.1 191674 255 104S14M19S = 191674 -14 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTCTCAAAAAAAAAAACGACACCCCAGATGTCTGAGTGTGACTGGAGTTCCTTGGCACCCGAGAATTCCACCGACAACAACCTTTCTGCTTCACAATGATAGGA IIIIII9III9999I99I9999999999999-------9I9--9-I------9-9-I-I9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:1 HI:i:1 AS:i:21 nM:i:0 NM:i:3 MD:Z:11000
But looking at a hex dump of that line shows that it ends with:
... 0 \t N M : i : 3 \t M D : Z : 1 1 \0 0 \0 0 \0 0 \n
This only occurs in two-pass mode and I think it has something to do with the alignment occurring at the very end of the chr NT_187594.1 -- it seems to be reading past the end of that template and grabbing some nulls to use in the MD tag.
Note that this is pretty hard to reproduce -- I only saw it on one file that had ~10M reads before hitting this -- shortening the input removed the error.
The text was updated successfully, but these errors were encountered: