Releases: ncbi/pgap
2024-07-18.build7555
This release, as well as the previous one, is based on PGAP-6.7.
- Incorporation of GeneOntology 2024-06-10
- BUG FIX: restore genus-only functionality: #311, #305, #304, #303
- structural annotation algorithm improvements and bug fixes
- TaxCheck fixes:
  - Synchronize code in TaxCheck buildruns with code used in internal processes
  - Introduce handling of plasmids in contamination (i.e., prevent promiscuous plasmids from counting as contaminants)
  - Introduce lineage match options for MAGs
- REQUEST: #273. PGAP now writes only to the output directory, including transient files.
- DATA FIX: #308. Added HMM NF046015 to correct the name of the protein reported in the issue.
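The plasmid change in TaxCheck can be pictured with a toy calculation. This is a minimal sketch of the idea only, not TaxCheck's actual computation: the `Contig` record, its fields, and `contamination_fraction` are hypothetical, and the assumption is simply that plasmid sequences are left out of the foreign-sequence total.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical per-sequence summary; not a TaxCheck data structure.
    name: str
    length: int
    is_plasmid: bool
    matches_declared_taxon: bool


def contamination_fraction(contigs: list[Contig]) -> float:
    """Share of total assembly length assigned to a foreign taxon.

    Plasmid sequences are never counted as contamination, so a promiscuous
    plasmid resembling another taxon does not inflate the estimate.
    """
    total = sum(c.length for c in contigs)
    if total == 0:
        return 0.0
    foreign = sum(
        c.length
        for c in contigs
        if not c.matches_declared_taxon and not c.is_plasmid
    )
    return foreign / total


assembly = [
    Contig("chromosome", 900, is_plasmid=False, matches_declared_taxon=True),
    Contig("plasmid_1", 100, is_plasmid=True, matches_declared_taxon=False),
]
print(contamination_fraction(assembly))  # prints 0.0
```

Without the `is_plasmid` exclusion the same assembly would report 10% contamination.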
Third Party software versions
- tRNAscan-SE 2.0.12
- hmmer v.3.4
- infernal 1.1.5
- CRISPR v.1.03
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
2024-04-27.build7426
This release is based on PGAP-6.7.
- Updated software:
  - hmmer v.3.4
  - infernal 1.1.5
- Protein Family Models now include Pfam 36
- Incorporation of GeneOntology 2024-01-17
- Support for Apple Silicon using Docker Desktop / Rosetta
- Revised report generation for ANI Taxonomy checks, including significant overhaul of how ANI contamination is computed and reported
Third Party software versions
- tRNAscan-SE 2.0.12
- hmmer v.3.4
- infernal 1.1.5
- CRISPR v.1.03
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
2023-10-03.build7061
This release is based on PGAP-6.6. It includes the following features and bug fixes:
- Lowered pseudogene false positive rate by improving protein alignment handling during structural annotation
- Designed new hidden Markov models (HMMs) for validated small proteins, for improving structural annotation
- Adopted PFAM release 35 to help in structural and functional annotation
- Bug fixes:
  - Podman container name is now respected, #270
  - Empty undocumented files are now deleted from the output
  - Removed extra # in annot_with_genomic_fasta.gff
- pgap.py:
  - The -g option now supports absolute paths
  - If the output directory exists, exit rather than add a suffix
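The two pgap.py behaviors above amount to a pre-flight check. The sketch below is a hypothetical illustration, not pgap.py's code: `prepare_run` is an invented name, and it assumes the genome path is normalized to an absolute path and that an existing output directory aborts the run.

```python
from pathlib import Path


def prepare_run(genome_fasta: str, output_dir: str) -> tuple[Path, Path]:
    """Hypothetical pre-flight check mirroring the two pgap.py changes."""
    # Accept relative or absolute FASTA paths; normalize to an absolute path.
    genome = Path(genome_fasta).expanduser().resolve()

    # Stop if the output directory already exists instead of deriving
    # a new, suffixed directory name.
    out = Path(output_dir).expanduser()
    if out.exists():
        raise SystemExit(f"Output directory {out} already exists; choose a new one")
    return genome, out
```

Exiting early trades convenience for predictability: results never land in a directory whose name the user did not choose.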
Third Party Software and Data versions used:
- GeneOntology 2023-07-27
- tRNAscan-SE 2.0.12
- hmmer v.3.1b2
- CRISPR v.1.02
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
- infernal v.1.1.4
2023-05-17.build6771
This release is based on PGAP-6.5. It includes the following features and bug fixes:
- Addition of attributes (Gene Ontology terms, EC numbers, gene symbols) to more protein-coding features, by propagation from curated conserved domain architectures (CDD architectures)
- Incremental improvements in the structural annotation algorithm
- Addition of a simple option for passing the FASTA file and organism to pgap.py via parameters instead of a YAML file. This option is not sufficient and should NOT be used if the annotated assembly is intended for submission to GenBank.
- Improved help text for pgap.py
- Bug fixes:
  - for small circular plasmids, eliminated crashes when an alignment covers almost the entire plasmid sequence
  - consistency of partiality and pseudo-status of features between cdregions and genes
  - fixed CPU handling in SLURM environment
  - use of the FASTA file name instead of 'gc_assm_name' as the assembly name in ani-tax-report files
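As a rough illustration of the last fix, an assembly name can be derived from the FASTA file name. This is a minimal sketch under assumptions: `assembly_name_from_fasta` is hypothetical, and the list of stripped extensions is a guess, not the rule used in ani-tax-report generation.

```python
from pathlib import Path

# Assumed list of extensions to strip; PGAP's actual rule may differ.
FASTA_SUFFIXES = (".fasta", ".fna", ".fsa", ".fa")


def assembly_name_from_fasta(fasta_path: str) -> str:
    """Derive a report-friendly assembly name from the FASTA file name."""
    name = Path(fasta_path).name          # drop any directory part
    if name.endswith(".gz"):
        name = name[: -len(".gz")]        # tolerate compressed input
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name                           # no known extension: keep as-is
```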
Third Party Software and Data versions used (no changes since last release):
- GeneOntology 2023-01-01
- tRNAscan-SE 2.0.12
- hmmer v.3.1b2
- CRISPR v.1.02
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
- infernal v.1.1.1
2022-12-13.build6494
This release is based on PGAP-6.4. It includes the following features and bug fixes:
New features:
- More stringent filtering of alignments of trusted proteins, resulting in improvements in the structural annotation of long proteins
- New outputs: nucleotide and protein sequences of CDS features and enhanced Roary-ready GFF output
- Upgrade to tRNAscan-SE 2.0.12
- Changes in the reference data:
  - Incorporation of GeneOntology 2022-11-03 changes
  - Switch to CDD 3.20 architectures
Bug fixes:
- Fixed a serious publication retrieval bug that was introduced by changes in a third-party service during the lifetime of the previous build
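Roary takes GFF3 files that carry the annotated features followed by the genomic sequences in a trailing "##FASTA" section, the layout Prokka produces. The sketch below is a rough, hypothetical sanity check for that layout (`looks_roary_ready` is not a PGAP or Roary function); it does not validate GFF3 semantics.

```python
def looks_roary_ready(gff_text: str) -> bool:
    """Rough check that a GFF3 file carries its sequences in a ##FASTA section."""
    lines = gff_text.splitlines()
    if not lines or not lines[0].startswith("##gff-version 3"):
        return False
    if "##FASTA" not in lines:
        return False
    fasta_start = lines.index("##FASTA")
    # At least one feature line (non-empty, non-comment) before the FASTA block.
    has_features = any(line and not line.startswith("#") for line in lines[:fasta_start])
    # At least one sequence header after it.
    has_sequences = any(line.startswith(">") for line in lines[fasta_start + 1:])
    return has_features and has_sequences
```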
Third Party Software Versions Used
- tRNAscan-SE 2.0.12
- hmmer v.3.1b2
- CRISPR v.1.02
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
2022-10-03.build6384
This release is based on PGAP-6.3. It includes the following features and bug fixes:
- Added more stringent filtering of low-coverage and low-complexity protein alignments, resulting in better annotation of long protein models
- Incorporated CheckM (Parks, Donovan H et al. Genome research vol. 25,7 (2015): 1043-55) for calculating the completeness and contamination of the assembly based on the presence/absence of lineage-specific markers in the set of PGAP-predicted models
- Added import traceback #224
- Bug fix: switched to the newer Docker Hub repository, version v2
- Bug fix: better handling of the "isolate" FASTA modifier.
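The marker-based idea behind the CheckM entry can be shown with a deliberately simplified calculation: completeness is the share of expected single-copy markers found, and contamination counts surplus copies. This is not CheckM's algorithm, which works on collocated marker sets per lineage; `completeness_contamination` and its inputs are hypothetical.

```python
from collections import Counter


def completeness_contamination(marker_hits: list[str], marker_set: set[str]) -> tuple[float, float]:
    """Simplified marker-gene estimate, in percent (not CheckM's collocated sets)."""
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    # How many times each expected marker was found among predicted proteins.
    counts = Counter(m for m in marker_hits if m in marker_set)
    n = len(marker_set)
    present = sum(1 for m in marker_set if counts[m] > 0)
    # Every copy beyond the first suggests sequence from another genome.
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return 100.0 * present / n, 100.0 * extra / n


print(completeness_contamination(["A", "A", "B", "C"], {"A", "B", "C", "D"}))  # prints (75.0, 25.0)
```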
Third Party Software Versions Used
No changes since the previous release.
- tRNAscan-SE v.2.0.9
- hmmer v.3.1b2
- CRISPR v.1.02
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
- CheckM v.1.2.1
2022-08-11.build6275
This release is based on PGAP-6.2. It includes the following features and bug fixes:
- Update to the structural annotation algorithm: increased trust in HMM alignments resulting in better choice of start sites
- Lowered the minimum length for accepting ab initio hypothetical ORFs from 45 aa to 40 aa
- Updated tRNAscan-SE from v.2.0.7 to v.2.0.9
- Fixed handling of some FASTA modifiers (GitHub issue #210)
- Added support for Apptainer (the new name of Singularity)
- Fixed support for home directory installation for Windows users
- Updated align_filter usage in CWL
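The lowered ORF threshold is a simple length cutoff. The sketch below assumes the CDS length includes the stop codon and that the cutoff is inclusive; both are assumptions, and `accept_ab_initio_hypothetical` is a hypothetical name, not a PGAP function.

```python
# Assumed inclusive cutoff; the release notes only state 45 aa -> 40 aa.
MIN_HYPOTHETICAL_AA = 40


def accept_ab_initio_hypothetical(cds_nt_length: int, min_aa: int = MIN_HYPOTHETICAL_AA) -> bool:
    """Keep an ab initio hypothetical ORF only if its protein is long enough."""
    # Codons minus the stop codon gives the protein length in amino acids.
    aa_length = cds_nt_length // 3 - 1
    return aa_length >= min_aa
```

Under these assumptions a 123-nt ORF (40 aa) is now accepted, while the old 45 aa cutoff would have rejected it.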
Third Party Software Versions Used
- tRNAscan-SE v.2.0.9
- hmmer v.3.1b2
- CRISPR v.1.02
- AntiFam v.3.0
- Rfam v.14.4
- GeneMarkS-2 v.1.14_1.25
2022-04-14.build6021
This release is based on PGAP-6.1. It includes the following improvements and bug fixes:
- Improvement: faster installation achieved with parallel download and decompression of PGAP and taxcheck data packages
- Improvement: PGAP can now be installed at a configurable location, different from the home directory. By default it will install in $HOME/.pgap, but this location can be changed by setting the environment variable PGAP_INPUT_DIR.
- Bug fix: assemblies for organisms without a genus in their lineage can now be annotated.
- Bug fix: incomplete installation caused by race condition in directory creation fixed
- Bug fix: mapping of gene symbols by orthology to genes in reference genomes is now correct. Fixes the assignments of gene symbols (e.g. recA) to features in the annotations of: Acinetobacter pittii, Bacillus subtilis, Campylobacter jejuni, Escherichia coli and Mycobacterium tuberculosis genomes. Annotation of other species is unaffected.
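The install-location rule described above ($HOME/.pgap unless PGAP_INPUT_DIR is set) can be expressed as a small lookup. This is a sketch of the documented behavior, not pgap.py's implementation; `pgap_install_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pgap_install_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return where PGAP data would be installed under the documented rule."""
    env = os.environ if env is None else env
    custom = env.get("PGAP_INPUT_DIR")
    if custom:
        # An explicit location set by the user wins over the default.
        return Path(custom).expanduser()
    # Default location: $HOME/.pgap
    return Path.home() / ".pgap"
```

In practice, exporting PGAP_INPUT_DIR in the shell before running pgap.py is what selects the non-default location.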
2022-02-10.build5872
This release is based on PGAP-6.0. It includes the following features and bug fixes:
- Gene Ontology terms are now added to CDSs and proteins, when known. Like EC numbers, these are propagated from HMMs and BlastRules used to name the proteins.
- Incorporated 17 RFAM models for the annotation of more riboswitches
- Introduced the --auto-correct-tax flag in pgap.py to override the organism provided in the input YAML file if taxcheck predicts a different organism with high confidence. Use in combination with the --taxcheck flag.
- Introduced a minimum coverage threshold of 20% to taxcheck: if the query assembly does not match any type assembly with more than 20% coverage, taxcheck returns an inconclusive result (it does not predict an organism)
- Added support for Debian 10
- Bug fix: assemblies for organisms without a genus in their lineage can now be annotated.
- Bug fix: running PGAP with Singularity without internet access (--no-internet) is now possible. Users need to point pgap.py to a local SIF image (converted from Docker) using the --container-path argument.
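The interaction between the 20% coverage threshold and --auto-correct-tax can be sketched as a decision function. Everything here is hypothetical: `TaxPrediction`, `choose_organism`, the status strings, and treating exactly 20% as sufficient are assumptions for illustration, not taxcheck's internals.

```python
from dataclasses import dataclass
from typing import Optional

MIN_COVERAGE = 0.20  # from the release notes; boundary handling is assumed


@dataclass
class TaxPrediction:
    organism: str
    coverage: float        # fraction of the query covered by the best type assembly
    high_confidence: bool


def choose_organism(declared: str, best: Optional[TaxPrediction], auto_correct: bool) -> tuple[str, str]:
    """Return (organism to annotate with, hypothetical status label)."""
    if best is None or best.coverage < MIN_COVERAGE:
        # Too little of the assembly matches any type assembly: no prediction.
        return declared, "inconclusive"
    if best.organism == declared:
        return declared, "confirmed"
    if auto_correct and best.high_confidence:
        # --auto-correct-tax: replace the organism from the input YAML.
        return best.organism, "corrected"
    # Mismatch is reported, but the declared organism is kept.
    return declared, "mismatch"
```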
2021-11-29.build5742
This release is based on PGAP-5.3. It includes the following features and bug fixes:
- Updated the structural annotation algorithm to facilitate future extensibility. This change improves structural annotation by giving higher weight to GeneMarkS2+ ab initio models at loci where only weak evidence is found (such as low-identity and low-coverage protein alignments or partial HMM hits).
- Switched to a GeneMarkS2+ build compiled for Linux kernel 3
- Upgraded PFAM models to PFAM 34
- Adjusted the minimum percent identity thresholds used by the Average Nucleotide Identity tool for several species, including Listeria monocytogenes, Campylobacter lari, and Vibrio vulnificus.
- Improved reporting of errors in input YAML files