This repository contains up-to-date evaluation and polishing workflows that can be adapted to general genome assembly projects. Most of the ideas were developed and described in this paper.
For the exact command lines and workflows used to generate the T2T-CHM13v1.0 and T2T-CHM13v1.1 assemblies, please refer to the Methods section in the CHM13-Issues repo. Note that some of the tools have been updated since then; the updated versions are tracked in this repo.
- QV estimate with hybrid k-mer db
- Homopolymer and 2-mer microsatellite run length comparison
- K* metric
- Repeat-aware Winnowmap2 alignments and Marker assisted alignment filtering
- Coverage analysis
- Polishing SV-like and SNV-like errors, shown through a case study on Chr. 20
- Legacy automated polishing with Racon + Merfin. We recommend Patrick Grady's latest version of Automated polishing.
- Variant calling, refinement, and formatting (also see Error Detection)
Please cite the following paper if you use any of the code shared in this repo:
Mc Cartney AM, Shafin K, Alonge M et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods (2022) doi: https://doi.org/10.1038/s41592-022-01440-3