FastqWiper

FastqWiper is a Python application that wipes out badly formatted reads from readable FASTQ files.
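
To give an idea of what "badly formatted" can mean, here is a minimal sketch, not FastqWiper's actual implementation. It assumes the usual four-line FASTQ record (header starting with `@`, sequence, separator starting with `+`, quality string as long as the sequence) and an IUPAC nucleotide alphabet; the function names `is_well_formed` and `wipe` are hypothetical.

```python
import re

# Assumption: sequences use IUPAC nucleotide codes; FastqWiper's own rules may differ.
SEQ_PATTERN = re.compile(r"^[ACGTUNRYKMSWBDHV]+$", re.IGNORECASE)


def is_well_formed(header, seq, plus, qual):
    """Return True if the four lines form a plausible FASTQ record."""
    return (
        header.startswith("@")
        and len(header) > 1
        and bool(SEQ_PATTERN.match(seq))
        and plus.startswith("+")
        and len(qual) == len(seq)
        # Quality characters must be printable ASCII ('!' to '~').
        and all(33 <= ord(c) <= 126 for c in qual)
    )


def wipe(lines):
    """Return the well-formed records found in a sequence of FASTQ lines."""
    stripped = [line.rstrip("\n") for line in lines]
    kept = []
    i = 0
    while i + 3 < len(stripped):
        record = stripped[i:i + 4]
        if is_well_formed(*record):
            kept.append(record)
            i += 4  # jump to the next record
        else:
            i += 1  # skip one line and try to resynchronize
    return kept
```

Skipping a single line after a failed check lets the scan recover when a stray or missing line shifts the four-line frame, at the cost of holding the whole file in memory.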

More complex workflows, such as recovering corrupted fastq.gz files, dropping or fixing pesky lines, removing unpaired reads, and fixing read interleaving, can be executed using Snakemake and the preconfigured pipeline files provided here.

Compatibility: Python <3.9
OS: Windows (excluding pipelines), Linux, Mac OS
Contributions: [email protected]
Pypi: https://pypi.org/project/fastqwiper
Conda: https://anaconda.org/bfxcss/fastqwiper
Docker Hub: available soon
Bug report: https://github.com/mazzalab/fastqwiper/issues

Installation

FastqWiper alone can be installed using either Conda or PyPI and runs on all the operating systems listed above.

Anaconda or Miniconda

Create and activate an empty Conda environment, if not already available.

$ conda create -n FastqWiper python=3.8
$ conda activate FastqWiper

then
$ conda install -y -c bfxcss -c conda-forge fastqwiper

Pypi

pip install fastqwiper

Usage

fastqwiper <options>

options:
  --fastq_in TEXT          The input FASTQ file to be cleaned  [required]
  --fastq_out TEXT         The wiped FASTQ file                [required]
  --log_frequency INTEGER  The number of processed reads that you want to print a status message after

It accepts readable *.fastq or *.fastq.gz files as input and writes its output in either format.
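
If you want to call the command from a script, the options above can be assembled as in the following sketch. It assumes only that the `fastqwiper` executable is on your PATH; the helper names `build_command` and `run_fastqwiper` and the file names in the usage note are hypothetical.

```python
import subprocess


def build_command(fastq_in, fastq_out, log_frequency=None):
    """Assemble the fastqwiper command line from the options listed above."""
    cmd = ["fastqwiper", "--fastq_in", str(fastq_in), "--fastq_out", str(fastq_out)]
    if log_frequency is not None:
        # Optional: how many processed reads between status messages.
        cmd += ["--log_frequency", str(log_frequency)]
    return cmd


def run_fastqwiper(fastq_in, fastq_out, log_frequency=None):
    """Run fastqwiper; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(fastq_in, fastq_out, log_frequency), check=True)
```

For example, `run_fastqwiper("data/in.fastq.gz", "data/out.fastq.gz", 500000)` would wipe one file and print a status message every 500000 reads.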

Snakemake

To enable the use of preconfigured pipelines, you need to install Snakemake. The recommended way to install Snakemake is via Conda, because it enables Snakemake to handle software dependencies of your workflow. However, the default conda solver is slow and often hangs. Therefore, we recommend installing Mamba as a drop-in replacement via

$ conda install -c conda-forge mamba

and then creating and activating a clean environment as above:

$ mamba create -c conda-forge -c bioconda -n FastqWiper snakemake
$ conda activate FastqWiper
$ conda install colorama click
$ conda install mamba -c conda-forge

Usage

Clone the FastqWiper repository:

git clone https://github.com/mazzalab/fastqwiper.git

It contains, in particular, three folders: data, which holds the fastq files to be processed; pipeline, which holds the released pipelines; and fastq_wiper, which holds the source files of FastqWiper.
Input files to be processed should be copied into the data folder. Software packages that the pipelines use but that are not fetched from Conda should be copied into the root directory of the cloned repository, although this is not strictly mandatory.

Currently, the following packages are required to run the FastqWiper pipelines but are not available from Conda:

required packages:

gzrt (install instructions)

BBTools (install instructions)

$ cd fastqwiper
$ git clone https://github.com/arenn/gzrt.git
$ cd gzrt
$ make
$ cd ..
$ tar -xvzf BBMap_(version).tar.gz

Commands:

Paired-end files

Personalize a pipeline. To use fix_wipe_pairs_reads.smk, edit line 3 of the file so that it lists the names of the fastq files stored in the data folder that you want to process. If the files were:

excerpt_S1_R1_001.fastq.gz
excerpt_S1_R2_001.fastq.gz
sample_S1_R1_001.fastq.gz
sample_S1_R2_001.fastq.gz

the SAMPLES list should be: SAMPLES = ["sample", "excerpt"]
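
With many files, the list can be derived from the file names rather than typed by hand. The sketch below assumes the `<sample>_S<n>_R<1|2>_001.fastq.gz` naming seen in the example above; the helper `sample_names` is hypothetical and not part of the pipeline.

```python
import re
from pathlib import Path

# Assumption: files follow the <sample>_S<n>_R<1|2>_001.fastq.gz pattern
# seen in the example above; adjust the expression for other naming schemes.
NAME_PATTERN = re.compile(r"^(?P<sample>.+)_S\d+_R[12]_001\.fastq\.gz$")


def sample_names(filenames):
    """Return the sorted, de-duplicated sample prefixes of the given file names."""
    names = set()
    for name in filenames:
        match = NAME_PATTERN.match(Path(name).name)
        if match:
            names.add(match.group("sample"))
    return sorted(names)


files = [
    "excerpt_S1_R1_001.fastq.gz",
    "excerpt_S1_R2_001.fastq.gz",
    "sample_S1_R1_001.fastq.gz",
    "sample_S1_R2_001.fastq.gz",
]
SAMPLES = sample_names(files)
print(SAMPLES)  # ['excerpt', 'sample']
```

To scan the data folder directly, pass `Path("data").glob("*.fastq.gz")` instead of the hard-coded list.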

Get a dry run of a pipeline (e.g., fix_wipe_pairs_reads.smk):
snakemake -s pipeline/fix_wipe_pairs_reads.smk --use-conda --cores 2 -np
Generate the planned DAG:
snakemake -s pipeline/fix_wipe_pairs_reads.smk --dag | dot -Tpdf > dag.pdf

Run the pipeline (n.b., during the first execution, Snakemake will download and install some required remote packages and may take longer). The number of computing cores can be tuned accordingly:
snakemake -s pipeline/fix_wipe_pairs_reads.smk --use-conda --cores 2

Fixed files will be written to the data folder with the suffix _fixed_wiped_paired_interleaving. As a reminder, the fix_wipe_pairs_reads.smk pipeline performs the following actions:

execute gzrt on corrupted fastq.gz files (i.e., files that cannot be unzipped because of errors) and recover readable reads;
execute fastqwiper on recovered reads to make them compliant with the FASTQ format (source: Wikipedia);
execute Trimmomatic on wiped reads to remove residual unpaired reads;
execute BBmap (repair.sh) on paired reads to restore the correct interleaving and sort fastq files.
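
To illustrate what "unpaired reads" means in the last two steps, here is a minimal in-memory sketch. It is not how Trimmomatic or repair.sh work internally; it assumes that mates share the same read ID up to the first whitespace or a trailing `/1` or `/2`, and the helper names `read_id` and `keep_paired` are hypothetical.

```python
def read_id(header):
    """Extract a mate-independent read ID from a FASTQ header line."""
    # Drop the leading '@' and anything after the first whitespace,
    # then strip an old-style '/1' or '/2' mate suffix if present.
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def keep_paired(records_r1, records_r2):
    """Return (R1, R2) record pairs whose IDs occur in both mates, in R1 order."""
    by_id_r2 = {read_id(rec[0]): rec for rec in records_r2}
    pairs = []
    for rec in records_r1:
        mate = by_id_r2.get(read_id(rec[0]))
        if mate is not None:
            pairs.append((rec, mate))
    return pairs
```

Each record is a four-item sequence (header, sequence, separator, quality). Reads present in only one of the two files are dropped, and the surviving pairs come out in the same order for both mates.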

Single-end files

Using fix_wipe_single_reads.smk requires you to make the same edits as above. This pipeline does not execute Trimmomatic or BBmap's repair.sh.

Get a dry run of a pipeline (e.g., fix_wipe_single_reads.smk):
snakemake -s pipeline/fix_wipe_single_reads.smk --use-conda --cores 2 -np
Generate the planned DAG:
snakemake -s pipeline/fix_wipe_single_reads.smk --dag | dot -Tpdf > dag.pdf

Run the pipeline (n.b., during the first execution, Snakemake will download and install some required remote packages and may take longer). The number of computing cores can be tuned accordingly:
snakemake -s pipeline/fix_wipe_single_reads.smk --use-conda --cores 2

Author

Tommaso Mazza

Laboratory of Bioinformatics
Fondazione IRCCS Casa Sollievo della Sofferenza
Viale Regina Margherita 261 - 00198 Roma IT
Tel: +39 06 44160526 - Fax: +39 06 44160548
E-mail: [email protected]
Web page: http://www.css-mendel.it
Web page: http://bioinformatics.css-mendel.it
