Releases: MelbourneGenomics/cpipe
Bugfix 2.5.1
Fixes a bug whereby certain Python dependencies caused an installation deadlock. This was solved by pinning the version of each Python package manually.
Release 2.5
Changes in Cpipe 2.5.0
Command Line Interface Rework
The cpipe commands have been completely rewritten in Python to have much more functionality and to be much more user friendly. Features have been added to existing commands, like the ability to create a batch using cpipe batch create --data *.fastq, and the ability to run the pipeline with bpipe options, using cpipe run MyBatch -n 3
New commands include:
- cpipe stop, which stops a currently running batch
- cpipe batch check, which validates a batch metadata file against a comprehensive schema
- cpipe batch edit and cpipe batch view, which start an interactive editor for editing/viewing the sometimes complicated batch metadata files.
For more information, refer to the command documentation.
Environment Shell
It is also now possible to start a bash shell with its environment set in such a way that it is able to use Cpipe libraries and binaries on its $PATH, including tools such as samtools, bedtools, vep, bpipe etc. This can be useful for debugging Cpipe runs, for running extra Cpipe utility functions, or even for using Cpipe tools for other purposes. For more information, refer to the documentation.
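As a quick sanity check inside the environment shell, something like the following could confirm that the bundled tools are visible. This is only a sketch: the tool list is taken from the paragraph above, and the helper name `missing_tools` is hypothetical and not part of Cpipe.

```python
import shutil

# Tool names taken from the release notes; extend the tuple as needed.
EXPECTED_TOOLS = ("samtools", "bedtools", "vep", "bpipe")


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All expected tools are on PATH")
```

Run from a normal shell, this would most likely report the tools as missing; run from the Cpipe environment shell, it should find them.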
New VEP Script
Ensembl VEP, which is used by Cpipe for variant annotation, has been updated to the new reworked version, which should be much faster and more accurate. The VEP cache has also been pre-indexed, which should also add to speed improvements.
Public Install
It is now possible to run a full automated install, without needing the MGHA credentials. This means that using Cpipe is again very easy for any member of the public. All you need to do manually is find a copy of GATK.
Docker Registry
MGHA users can now install Cpipe using Docker, by pulling from our Docker registry. Refer to the Docker registry documentation for more information.
Interactive Bpipe Config Generation
The installation script will now prompt you for your queueing system, and give you some configuration options, which should make installation much simpler and less error prone.
Miscellaneous Changes
- Pipeline IDs are now automatically generated
- Large documentation changes
- Fixed the pipeline reporting one fewer sample than was actually found
- Support for nested batches (a batch inside a directory inside batches)
- Fixed a potential infinite loop if the user manually sourced the environment.sh file
- Fixed issues with the Torque executor
- Python code was restructured into a distutils package, meaning that code is now properly shared between scripts, allowing more maintainable code
- Use updated metadata format, which has two optional extra columns
- Updated tools:
  - Python to Python 3.6
  - Bpipe to 0.9.9.4
  - VEP plugins updated
  - Pip packages updated
- Various other bug fixes
Additional annotation from ExAC and 1000G
Due to issues with VEP's annotation of ExAC and 1000G datasets, Cpipe now adds these annotations separately, using a normalised version of the ExAC (0.3.1) and 1000G (phase 3) VCFs.
- These VCFs have been added to the set of Cpipe assets (in data/annotation)
- vcfanno has been added as a stage to the pipeline.
- The vcfanno stage by default adds annotation for ExAC and 1000G, but can be customised by modifying pipeline/vcfanno.config.
- Redundant fields from VEP and dbNSFP have been removed.
Release 2.4
New in Cpipe 2.4
Installer
- Created new installer which manages a full cpipe install from scratch in as little as 30 minutes by downloading assets stored in a Swift object store (NeCTAR)
- The installer bundles most runtimes like Python, Perl, Groovy etc., ensuring Cpipe will run identically on different machines
- This also means that Cpipe requires less software to be installed on the host machine
- This process is managed by an install script in the root directory (./install.sh)
- This process is documented in the install documentation
Run Script
- Cpipe now has a central run script (./cpipe) which can be used to run most common cpipe commands, including run, test, and commands that manipulate batches, designs and metadata files.
- The script is documented in the commands documentation
Documentation
- Comprehensive documentation is now included in Cpipe, which is visible either in your local copies or on GitHub
- The documentation covers installing, running, configuring and various other aspects to operating Cpipe
- The documentation index is a good place to start
Docker support
- Cpipe is now able to run in a Docker container
- Just clone the repo, copy in the swift_credentials.sh file (refer to the documentation for an explanation), and run docker build . and you can generate a Cpipe Docker image
ALL Design
- In line with new recommendations, we now encourage you to use the built-in ALL design instead of creating a gene list for your own analysis
- This design will cover all official UCSC genes instead of limiting the analysis. This will save having to re-analyse the data in case your gene list expands
Configuration Changes
- The bpipe.config file has moved to bpipe.config.template, which is copied to bpipe.config when you run the install. This prevents your bpipe.config settings from being overwritten when you pull newer versions of Cpipe.
- The bpipe.config has been rewritten to include a template for use with an HPC queue system (e.g. Torque). This means you only need to remove one comment and specify your queue system and Cpipe should work with your system.
- Command names have been given to each command in the pipeline, meaning that their settings (wall time, RAM etc.) can be configured in bpipe.config using the relevant name. For example, "vep" for the VEP command.
- The samples.txt file previously located in each batch directory has been renamed to config.batch.groovy to reflect that it can be used to change any Cpipe configuration settings for the batch
- New JAVA_OPTS variable in config.groovy/config.batch.groovy for setting flags that should always be used for Java, for instance -noverify for certain versions of Java that dislike Groovy bytecode.
Miscellaneous Changes
- The Cpipe directory has been restructured to place all reference data in data, all tools and runtimes in tools, and all installation tasks in tasks
- Some un-needed files have been removed from Cpipe
Support
2.4 will receive continued support on the 2.4 branch of Cpipe. Unlike the dev branch, which will often contain broken development versions of Cpipe, the 2.4 branch is kept stable: if you remain on it and pull periodically you will stay up to date without any breaking changes.
Filter superfluous XM tags
When annotating all transcripts with VEP, many additional XM tags are annotated. This release filters the final output so that XM tags are only retained if there is not an NM tag present for that variant.
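The filtering rule above can be sketched as follows. This is an illustration only, not the code Cpipe uses: it assumes each variant carries a list of RefSeq-style transcript accessions, and both the function name `filter_transcripts` and the accession values in the usage note are hypothetical.

```python
def filter_transcripts(transcript_ids):
    """Drop XM_ transcripts for a variant when an NM_ transcript is present.

    If the variant has no NM_ transcript at all, every transcript is kept,
    so XM_ annotations remain only where nothing else is available.
    """
    has_nm = any(t.startswith("NM_") for t in transcript_ids)
    if not has_nm:
        return list(transcript_ids)
    return [t for t in transcript_ids if not t.startswith("XM_")]
```

For a variant annotated with a made-up pair such as NM_000001.1 and XM_000002.1, only the NM entry would remain; a variant annotated with XM_000002.1 alone would keep it.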
Annotate all transcripts
- Annotate all transcripts with VEP
- Analyze all samples by default unless "exclude" is specified.
Padding and gap annotation updates
Changes to this release:
- Padding is now added by Cpipe, and is a customisable parameter. Input bed files do not need to be padded.
- Gap annotator has been updated to have a more convenient column ordering and more intuitive column names.
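To illustrate what adding padding to an unpadded BED file involves, here is a minimal sketch. It is not Cpipe's implementation: the function names are hypothetical, the padding value is supplied by the caller rather than read from Cpipe's configuration, and the end coordinate is not clamped to the chromosome length because no chromosome sizes are available here.

```python
def pad_bed_interval(chrom, start, end, padding):
    """Extend a 0-based, half-open BED interval by `padding` bases per side."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    # The start is clamped at 0 so padding never produces a negative coordinate.
    return chrom, max(0, start - padding), end + padding


def pad_bed_lines(lines, padding):
    """Pad every interval line, passing header and comment lines through."""
    padded = []
    for line in lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        if len(fields) < 3 or text.startswith(("#", "track", "browser")):
            padded.append(text)
            continue
        chrom, start, end = pad_bed_interval(
            fields[0], int(fields[1]), int(fields[2]), padding
        )
        # Any extra columns (name, score, ...) are preserved unchanged.
        padded.append("\t".join([chrom, str(start), str(end)] + fields[3:]))
    return padded
```

Because padding is applied at run time, the same unpadded design file can be reused with different padding values.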
CentOS bgfz workaround
This change is specifically to deal with the CentOS issue related to bgfz (PIF-5).
Trio analysis
- Trio Analysis: trio analysis for the case of a proband child and unaffected parents has been implemented.
- Modularisation: the pipeline has been restructured to be modular so that other types of analyses can be added to the system.
- Gap file annotation: more complete and extensible gap annotation functionality has been implemented.
- Tool Updates: all 3rd party tools have been updated to the latest versions.
- Migration to VEP and removal of Annovar dependency: Annovar is no longer part of the analysis. All annotation is performed by VEP, along with VEP plugins.
- LOVD+ Compatibility: The final output is a tab separated file that complies with LOVD+ requirements.
- Support for both BED file and gene list inputs: Analysis profiles can be specified with either regions (BED) or gene lists.
- Improved test framework: The testability and robustness of the code has been improved.
- Additional configuration options:
- ANNOTATE_CUSTOM_REGIONS="": bed file to add additional annotation to the final output.
- FILTERED_ON_EXONS="skip": what kind of final bam file to generate (skip, design, or exons)
- GAP_ANNOTATOR_CUSTOM_BEDS="": space separated list of additional bed files to consult when generating the gap annotation file.
- GATK_VARIANT_ONLY=false: do not do genotyping, directly call the variants.
- POST_ANALYSIS_READ_ONLY=true: mark the batch read only on completion.
- POST_ANALYSIS_MOVE=true: move the batch directory on completion.
- HARD_FILTER_AD=2: minimum allele depth
- HARD_FILTER_AF=0.15: minimum allele frequency
- HARD_FILTER_DP=5: minimum depth
- HARD_FILTER_QUAL=5: minimum quality
- QC_THRESHOLD=20: what depth is required to contribute to satisfactory coverage
- QC_GOOD=95: what percentage of QC_THRESHOLD must be achieved across the gene to get a good rating
- QC_PASS=80: what percentage of QC_THRESHOLD must be achieved across the gene to get a pass rating
- QC_FAIL=0: what percentage of QC_THRESHOLD must be achieved across the gene to get a fail rating
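The following sketch shows one way the hard filter and QC settings listed above might be interpreted. The default values are copied from the list, but the semantics are assumptions: that a variant must meet every hard filter minimum, that comparisons are inclusive, and that a gene's rating comes from the percentage of its bases at or above QC_THRESHOLD. The function names are hypothetical.

```python
# Defaults copied from the configuration options in the release notes.
HARD_FILTER_AD = 2
HARD_FILTER_AF = 0.15
HARD_FILTER_DP = 5
HARD_FILTER_QUAL = 5
QC_THRESHOLD = 20
QC_GOOD = 95
QC_PASS = 80
QC_FAIL = 0


def passes_hard_filters(ad, af, dp, qual):
    """Assumed semantics: keep a variant only if every value meets its minimum."""
    return (
        ad >= HARD_FILTER_AD
        and af >= HARD_FILTER_AF
        and dp >= HARD_FILTER_DP
        and qual >= HARD_FILTER_QUAL
    )


def qc_rating(depths):
    """Rate a gene from its per-base depths (assumed interpretation)."""
    if not depths:
        raise ValueError("at least one depth value is required")
    covered = sum(1 for depth in depths if depth >= QC_THRESHOLD)
    percent = 100.0 * covered / len(depths)
    if percent >= QC_GOOD:
        return "good"
    if percent >= QC_PASS:
        return "pass"
    # With QC_FAIL=0, every remaining percentage falls into the fail rating.
    return "fail"
```

Under these assumptions, a gene with 95 of 100 bases at depth 20 or more would be rated good, and one with 80 of 100 would be rated pass.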
2.2.0 Public Version
This is the MGHA 2.2.0 release ported to the public Cpipe repository.
Please see the Cpipe 2.2.0 release notes for full details of changes in that release.