You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi! Francisco
While running the -t abundance I am running into a problem with the sam to bam conversation due to a duplicate header. When I checked the grep "NODE_1753_length_683_cov_1.602310" FDSGM6.sam -c I found more than one occurrence.
Converting SAM to BAM with samtools view ...
[W::sam_hdr_create] Duplicated sequence "NODE_1753_length_683_cov_1.602310" in file "FDSGM6.sam"
[E::sam_hrecs_update_hashes] Duplicate entry "NODE_1753_length_683_cov_1.602310" in sam header
samtools view: failed to add PG line to the header
I am also surprised to see that 2 of 35 samples ran fine. and if there was some issue it would have created problems for all the samples. So I am not able to figure out the issue. Could you please suggest a solution?
Thank you
Kunal
The text was updated successfully, but these errors were encountered:
Hi Kunal, thanks again for reporting this issue and apologies for my delay in responding.
I believe that this problem occurs due to the fact that contig headers are not guaranteed to be unique across MAGs. It is unlikely, but evidently it is possible that two different contigs are assembled with the same node_length_coverage values across different genomes from the same sample, in which case you will see those errors you have reported.
I will have to think about the best way to deal with this in the future, but for now you can simply add a unique MAG ID at the start of each contig header when concatenating MAGs to avoid this issue.
echo -e "\nConcatenating all bins into one FASTA file ... "
cat *.fa > $(basename {output}).fa
Instead of using cat, use a loop to append MAG ID at the start of each contig e.g. assuming your mags for a given sample are in subfolder mags with .fa extension:
ls mags|grep ".fa"|while read genome;do paste mags/$genome| sed "s/>/>${genome%.*}_/g";done > sample.fa
Hi! Francisco
While running the
-t abundance
I am running into a problem with the sam to bam conversation due to a duplicate header. When I checked the grep "NODE_1753_length_683_cov_1.602310" FDSGM6.sam -c I found more than one occurrence.I am also surprised to see that 2 of 35 samples ran fine. and if there was some issue it would have created problems for all the samples. So I am not able to figure out the issue. Could you please suggest a solution?
Thank you
Kunal
The text was updated successfully, but these errors were encountered: