Benchmarking experiments.

The code base is adapted from the benchmarking code of PanGenie, published in Nature Genetics (doi: 10.1038/s41588-022-01043-w).

Prerequisites

Tools to be installed:

PanGenie (https://github.com/eblerjana/pangenie)
GraphTyper (https://github.com/DecodeGenetics/graphtyper)
Paragraph (https://github.com/Illumina/paragraph)
GATK (https://gatk.broadinstitute.org/hc/en-us)

Input data

Reference genome (GRCh38/hg38): https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/
Download the haplotype-resolved assemblies:

# Download HG002
wget ftp://ftp.dfci.harvard.edu/pub/hli/whdenovo/asm/NA24385-denovo-H1.fa.gz
wget ftp://ftp.dfci.harvard.edu/pub/hli/whdenovo/asm/NA24385-denovo-H2.fa.gz

# Download HG00731
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/working/20200417_Marschall-Eichler_NBT_hap-assm/HG00731_hgsvc_pbsq2-ccs_1000-pereg.h1-un.racon-p2.fasta
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/working/20200417_Marschall-Eichler_NBT_hap-assm/HG00731_hgsvc_pbsq2-ccs_1000-pereg.h2-un.racon-p2.fasta

Call phased variants from the haplotype-resolved assemblies using this Snakemake workflow: https://bitbucket.org/jana_ebler/vcf-merging/src/master/
Download the short-read sample for HG00731:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR324/004/ERR3241754/ERR3241754_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR324/004/ERR3241754/ERR3241754_2.fastq.gz

Subsample the short reads to 5x, 10x, 20x, and 30x coverage, and map each subsample to the reference genome using bwa.
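The subsampling and mapping step can be sketched as follows. This is only an illustration under several assumptions that are not part of this repository: seqtk is used for subsampling and samtools for sorting, `bwa mem` is the bwa algorithm, the reference is stored as hg38.fa and already indexed with `bwa index`, the output file names are made up, and FULL_COV (the coverage of the full read set) is a placeholder that should be measured for your data. The script only prints the commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Assumption: coverage of the full read set. 35 is a placeholder, not a measured value.
FULL_COV=${FULL_COV:-35}

# Fraction of reads to keep to reach a target coverage (capped at 1).
subsample_fraction() {
    awk -v t="$1" -v f="$2" 'BEGIN { r = t / f; if (r > 1) r = 1; printf "%.4f\n", r }'
}

# Print (do not run) the subsampling and mapping commands for one coverage level.
# The same seed (-s100) is used for both mates so that read pairs stay in sync.
print_commands() {
    local cov=$1 frac
    frac=$(subsample_fraction "$cov" "$FULL_COV")
    echo "seqtk sample -s100 ERR3241754_1.fastq.gz $frac | gzip > HG00731_${cov}x_1.fastq.gz"
    echo "seqtk sample -s100 ERR3241754_2.fastq.gz $frac | gzip > HG00731_${cov}x_2.fastq.gz"
    echo "bwa mem -t 8 hg38.fa HG00731_${cov}x_1.fastq.gz HG00731_${cov}x_2.fastq.gz | samtools sort -o HG00731_${cov}x.bam -"
}

for cov in 5 10 20 30; do
    print_commands "$cov"
done
```

Once the printed commands look right, they can be piped to `bash` or pasted into a job script.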
Build the CCDG from all the subsamples using the Snakemake script in build_test_CCDG/:
1. Edit config.json to specify the output folder and the k-mer size (we used 31).
2. Enter the paths of the subsampled FASTQ files in subsample_table.csv.
3. Run 'snakemake -j8 --use-conda'.
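Before starting the build, it may help to confirm that every FASTQ path entered in subsample_table.csv actually exists. The helper below is a hypothetical sketch, not part of this repository. It does not assume a particular column layout: it simply scans every comma-separated field that looks like a FASTQ path. It does assume that paths contain no spaces or commas.

```shell
#!/usr/bin/env bash
# Print every FASTQ-looking field of a CSV file whose path does not exist.
# Hypothetical helper; quotes, spaces, and carriage returns are stripped from fields.
missing_fastqs() {
    tr ',' '\n' < "$1" | tr -d '\r" ' | grep -E '\.(fastq|fq)(\.gz)?$' | while read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}

# Example use from inside build_test_CCDG/:
#   missing_fastqs subsample_table.csv
#   snakemake -n -j8 --use-conda    # dry run: list the planned jobs without executing them
```

An empty result from `missing_fastqs` means all listed files were found. The `-n` dry run is a standard Snakemake option and is a cheap way to catch configuration mistakes before the real run.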

Run the benchmarking workflow

The benchmarking Snakemake workflow is in the benchmark subfolder. First edit config.json to set the paths of the input and output data, then run the workflow using 'snakemake -j8 --use-conda'.
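Hand-edited JSON files break easily (a trailing comma is enough), so a quick syntax check of config.json before launching the workflow can save a failed run. The sketch below assumes python3 is available on the PATH. The function name is hypothetical, and the check only verifies that the file parses, not that its paths are correct.

```shell
#!/usr/bin/env bash
# Succeed if the given file parses as JSON (uses Python's built-in json.tool module).
is_valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Report on config.json if one exists in the current directory.
if [ -f config.json ]; then
    if is_valid_json config.json; then
        echo "config.json parses as JSON"
    else
        echo "config.json is not valid JSON" >&2
    fi
fi
```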


dib-lab/TheGreatGenotyper_benchmark
