We used 16 BAM files for training: 10 HG001 PCR-free, 2 HG005 PCR-free, and 4 HG001 PCR+.
Six of these 16 BAM files are from public sources:

BAM file (`--reads`) | PCR-free? | FASTA file (`--ref`) | Truth VCF (`--truth_variants`) | BED file (`--confident_regions`) |
---|---|---|---|---|
HG001-NA12878-pFDA.merged.sorted.bam(1) | Yes | GRCh38_Verily_v1.genome.fa | NISTv3.3.2/GRCh38 | NISTv3.3.2/GRCh38 |
NA12878D_HiSeqX_R1.deduplicated.bam(2) | No | hs37d5.fa | NISTv3.3.2/GRCh37 | NISTv3.3.2/GRCh37 |
NA12878J_HiSeqX_R1.deduplicated.bam(2) | No | hs37d5.fa | NISTv3.3.2/GRCh37 | NISTv3.3.2/GRCh37 |
NA12878-Rep01_S1_L001_001_markdup.bam(2) | No | hs37d5.fa | NISTv3.3.2/GRCh37 | NISTv3.3.2/GRCh37 |
N3C9-2plex1-L1-171212B-NA12878-1_S1_L001_001_markdup.bam(3) | Yes | hs37d5.fa | NISTv3.3.2/GRCh37 | NISTv3.3.2/GRCh37 |
NexteraFlex-2plex1-L1-NA12878-1_S1_L001_001_markdup.bam(4) | No | hs37d5.fa | NISTv3.3.2/GRCh37 | NISTv3.3.2/GRCh37 |

(1): FASTQ files from the precisionFDA Truth Challenge.

(2): BAM files provided by DNAnexus.

(3): FASTQ files from BaseSpace public data: NovaSeq S1 Xp: TruSeq Nano 350 (Replicates of NA12878)/Samples/N3C9_2plex1_L1_171212B_NA12878-1/Files/N3C9-2plex1-L1-171212B-NA12878-1_S1_L001_R1_001.fastq.gz and N3C9-2plex1-L1-171212B-NA12878-1_S1_L001_R2_001.fastq.gz.

(4): FASTQ files from BaseSpace public data: NovaSeq S1 Xp: Nextera DNA Flex (Replicates of NA12878)/Samples/NexteraFlex_2plex1_L1_NA12878-1/Files/NexteraFlex-2plex1-L1-NA12878-1_S1_L001_R1_001.fastq.gz and NexteraFlex-2plex1-L1-NA12878-1_S1_L001_R2_001.fastq.gz.
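
Each table row supplies the four inputs named in its column headers. As a rough illustration, the sketch below turns one row into a DeepVariant `make_examples` command line for training-mode example generation. The `--mode training` and `--examples` flags are assumed from DeepVariant's `make_examples` (check them against the version you run), the bare `make_examples` binary name is an assumption, and the truth VCF and BED file names under `NISTv3.3.2/GRCh37` are hypothetical placeholders, since the table only names the release and build.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingInput:
    """One table row: the inputs make_examples needs in training mode."""
    reads: str              # --reads: aligned BAM
    ref: str                # --ref: reference FASTA the BAM was aligned to
    truth_variants: str     # --truth_variants: truth VCF (same build as ref)
    confident_regions: str  # --confident_regions: high-confidence BED (same build)
    pcr_free: bool          # informational only; not passed as a flag here


def make_examples_args(row: TrainingInput, examples_out: str) -> List[str]:
    """Build a training-mode make_examples argument list (binary name assumed)."""
    return [
        "make_examples",
        "--mode", "training",
        "--ref", row.ref,
        "--reads", row.reads,
        "--truth_variants", row.truth_variants,
        "--confident_regions", row.confident_regions,
        "--examples", examples_out,
    ]


# Hypothetical truth/BED paths: only the release (NISTv3.3.2) and build (GRCh37)
# come from the table; the file names are placeholders.
row = TrainingInput(
    reads="NA12878D_HiSeqX_R1.deduplicated.bam",
    ref="hs37d5.fa",
    truth_variants="NISTv3.3.2/GRCh37/truth.vcf.gz",
    confident_regions="NISTv3.3.2/GRCh37/confident.bed",
    pcr_free=False,
)
print(" ".join(make_examples_args(row, "training_examples.tfrecord.gz")))
```

Returning an argument list rather than one string lets it be passed straight to `subprocess.run` without shell quoting issues.
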
For the sources provided as FASTQ files, we generated our own BAM files: we mapped the reads to the reference with BWA-MEM, sorted the output, and marked duplicate reads.
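
The map, sort, and mark-duplicates steps can be sketched as shell commands, built here in Python. This is an illustration under assumptions, not the exact pipeline used: the text does not say which tool marked duplicates, so `samtools markdup` is a swapped-in choice (Picard `MarkDuplicates` would be another), and the thread count and the `_sorted.bam`/`_markdup.bam` naming are hypothetical, chosen to mirror the file names in the table. It assumes `bwa` and `samtools` are on `PATH` and that `bwa index` has already been run on the reference.

```python
import shlex
from typing import List


def alignment_commands(ref: str, r1: str, r2: str, prefix: str,
                       threads: int = 8) -> List[str]:
    """Return shell commands: BWA-MEM mapping, coordinate sort, duplicate marking."""
    sorted_bam = f"{prefix}_sorted.bam"
    markdup_bam = f"{prefix}_markdup.bam"
    map_and_sort = " | ".join([
        # bwa mem writes SAM to stdout with all records of a read pair adjacent.
        shlex.join(["bwa", "mem", "-t", str(threads), ref, r1, r2]),
        # fixmate -m adds the mate tags samtools markdup needs; it requires
        # name-grouped input, which bwa mem output already is.
        shlex.join(["samtools", "fixmate", "-m", "-", "-"]),
        # Coordinate-sort the reads into a BAM file.
        shlex.join(["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"]),
    ])
    # Mark (not remove) duplicates on the coordinate-sorted BAM, then index it.
    mark_dups = shlex.join(["samtools", "markdup", sorted_bam, markdup_bam])
    index = shlex.join(["samtools", "index", markdup_bam])
    return [map_and_sort, mark_dups, index]


for cmd in alignment_commands(
    "hs37d5.fa",
    "NexteraFlex-2plex1-L1-NA12878-1_S1_L001_R1_001.fastq.gz",
    "NexteraFlex-2plex1-L1-NA12878-1_S1_L001_R2_001.fastq.gz",
    "NexteraFlex-2plex1-L1-NA12878-1_S1_L001_001",
):
    print(cmd)
```

Piping `bwa mem` through `fixmate` and `sort` avoids writing an intermediate unsorted SAM file to disk.
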