merge new wiki docs with qiime 2023.5 #107

Merged · 9 commits · Aug 10, 2023
73 changes: 73 additions & 0 deletions wiki/qiime2-2023.2/1-Background.md
@@ -0,0 +1,73 @@
## Methods

Tourmaline builds on the latest methods for analysis of microbial and eDNA amplicon sequence data. This section describes those methods and provides tutorials for some of them.

### Amplicon sequence variants

Amplicon sequencing (metabarcoding) is a method whereby a single DNA locus in a community of organisms is PCR-amplified and sequenced. Two methods of amplicon sequence processing are supported, both of which generate ASVs (amplicon sequence variants), which approximate the "true" or "exact" sequences in a sample, rather than OTUs (operational taxonomic units), which blur sequencing errors and microdiversity through clustering:

* [DADA2](https://github.com/benjjneb/dada2) implements a quality-aware model of Illumina amplicon errors to infer sample composition by dividing amplicon reads into partitions consistent with the error model ([Callahan et al., 2016](https://doi.org/10.1038/nmeth.3869)).
* [Deblur](https://github.com/biocore/deblur) is a greedy deconvolution algorithm based on known Illumina read error profiles ([Amir et al., 2017](https://doi.org/10.1128/mSystems.00191-16)).
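
Tourmaline runs these denoisers for you through QIIME 2 and Snakemake, so you normally will not call them directly. For orientation only, a stand-alone paired-end DADA2 call in QIIME 2 looks roughly like the sketch below; the file names and truncation lengths are placeholders, not values used by this workflow.

```
# Minimal sketch of paired-end DADA2 denoising in QIIME 2
# (placeholder file names and truncation lengths; Tourmaline sets these from config.yaml)
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-pe.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 200 \
  --o-table table.qza \
  --o-representative-sequences repseqs.qza \
  --o-denoising-stats denoising-stats.qza
```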

### QIIME 2

[QIIME 2](https://qiime2.org/) ([Bolyen et al., 2019](https://doi.org/10.1038/s41587-019-0209-9)) is one of the most popular amplicon sequence analysis software tools available. It supports both DADA2 and Deblur denoising algorithms as well as a variety of downstream diversity and statistical analyses and visualizations. [Click here for a tutorial on QIIME 2.](https://github.com/aomlomics/tutorials/tree/master/qiime2)
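
If you already have QIIME 2 installed and activated (see the Install section), two commands are handy for checking your setup and inspecting results; this is a quick orientation only, not part of the Tourmaline workflow itself.

```
# Show the installed QIIME 2 version and available plugins
qiime info

# Open a QIIME 2 visualization in your browser (placeholder file name)
qiime tools view some-visualization.qzv
```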

### Snakemake

[Snakemake](https://snakemake.readthedocs.io/en/stable/) is a workflow management software that allows for reproducible and scalable workflows for bioinformatics and other data analyses. It keeps track of input and output files, storing output files in a logical directory structure. It uses rules to define commands and only runs rules when they are required to produce the desired output. [Click here for a tutorial on Snakemake.](https://github.com/aomlomics/tutorials/tree/master/snakemake)
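
A useful consequence of this design is that you can ask Snakemake what it would do before running anything. The sketch below assumes you are inside the Tourmaline directory; the target name substitutes the `{method}` placeholder described later on this page, and `snakemake --list` prints the available rules if the name differs in your copy of the Snakefile.

```
# Dry run: list the rules (and their shell commands) Snakemake would execute
# to produce the paired-end DADA2 outputs, without actually running them
snakemake -n -p dada2-pe_denoise
```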

## Assessing your data

Before starting any bioinformatics workflow, it is important to assess your data and metadata to decide how they need to be formatted and to inform your parameter choices. The questions below can help you choose the best processing parameters and evaluate the success of your completed workflow.

### Amplicon locus

* What is the locus being amplified, and what are the primer sequences?
* How much sequence variation is expected for this locus (and primer sites) and dataset?
* Is the expected sequence variation enough to answer my question?
* What is the expected amplicon size for this locus and dataset?

### Sequence data

* What type and length of sequencing was used? (e.g., MiSeq 2x150bp)
* Were all my samples sequenced in the same sequencing run? (Rule `check_illumina_run` will check for this.)
* Do I have long enough sequencing to do paired-end analysis, or do I have to do single-end analysis only?
* Has the relevant sequence pre-processing been done already: Demultiplexing? Quality filtering and/or trimming? Primer removal? FastQC/MultiQC profiling before and/or after filtering/trimming (see the example after this list)? (Note on quality trimming: if you plan to use DADA2 for denoising, its developers recommend that no quality trimming be done beforehand, because it confuses DADA2's error-profiling algorithm. Quality filtering, including complete removal of erroneous sequences, is still advisable.)
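
If FastQC/MultiQC profiling has not been done yet, a minimal pass over your raw reads might look like the sketch below (assuming `fastqc` and `multiqc` are installed, e.g. from Bioconda; the output directory names are placeholders).

```
# Per-file quality profiles (placeholder output directory)
mkdir -p fastqc_out
fastqc *.fastq.gz -o fastqc_out

# Aggregate the FastQC reports into a single summary
multiqc fastqc_out -o multiqc_out
```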

### Sample set and metadata

* Is my metadata file complete? Are the relevant parameters of my dataset present as numeric or categorical variables?
* Do I have enough samples in each group of key metadata categories to determine an effect?

## Workflow overview

The table below describes the basic steps of the Tourmaline workflow. Further instructions are provided in the sections [Setup](https://github.com/aomlomics/tourmaline/wiki/3-Setup) and [Run](https://github.com/aomlomics/tourmaline/wiki/4-Run).

In the file paths below, `{method}` is one of:

* `dada2-pe` (paired-end DADA2)
* `dada2-se` (single-end DADA2)
* `deblur-se` (single-end Deblur)

and `{filter}` is one of:

* `unfiltered` (representative sequences and feature table *are not* filtered by taxonomy or feature ID)
* `filtered` (representative sequences and feature table *are* filtered by taxonomy or feature ID)

| Step | Command | Output |
| ----------------------------------- | ------------------------------ | ------------------------------------------------------------ |
| Format input and configuration file | (ad hoc) | `config.yaml`, `Snakefile`, `00-data/metadata.tsv`, `00-data/manifest_se.tsv`, `00-data/manifest_pe.tsv`, `00-data/refseqs.fasta` or `01-imported/refseqs.qza`, `00-data/reftax.tsv` or `01-imported/reftax.qza` |
| Import data | `snakemake {method}_denoise` | `01-imported/` (multiple files) |
| Denoising | `snakemake {method}_denoise` | `02-output-{method}/00-table-repseqs/` (multiple files) |
| Taxonomic assignment | `snakemake {method}_taxonomy_{filter}` | `02-output-{method}/01-taxonomy` (multiple files) |
| Representative sequence curation | `snakemake {method}_diversity_{filter}` | `02-output-{method}/02-alignment-tree` (multiple files) |
| Core diversity analyses | `snakemake {method}_diversity_{filter}` | `02-output-{method}-{filter}/03-alpha-diversity/` `02-output-{method}-{filter}/04-beta-diversity/` (multiple files) |
| Report | `snakemake {method}_report_{filter}` | `03-reports/report_{method}_{filter}.html` |
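
Substituting the placeholders, a complete paired-end DADA2 run on unfiltered outputs would be launched with a sequence of commands like the sketch below (run from the Tourmaline directory with your QIIME 2 environment active; if a target name differs in your copy of the Snakefile, `snakemake --list` shows the available rules).

```
# Placeholders substituted: {method} = dada2-pe, {filter} = unfiltered
snakemake dada2-pe_denoise --cores 4
snakemake dada2-pe_taxonomy_unfiltered --cores 4
snakemake dada2-pe_diversity_unfiltered --cores 4
snakemake dada2-pe_report_unfiltered --cores 1
```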

## Contact us

* Questions? Join the conversation on [gitter](https://gitter.im/aomlomics/tourmaline).
* Have a feature request? Raise an issue on [GitHub](https://github.com/aomlomics/tourmaline/issues).


143 changes: 143 additions & 0 deletions wiki/qiime2-2023.2/2-Install.md
@@ -0,0 +1,143 @@
## Dependencies

Tourmaline requires the following software:

* Conda
* QIIME 2 version 2023.2
* QIIME 2 plugins: deicode, empress
* Snakemake
* Python packages: biopython, tabulate
* R packages: msa, odseq
* Multiple sequence alignment tools: clustalo, muscle v5
* Other command-line tools: pandoc

## Installation options

### Option 1: Native installation

The native installation builds on the Conda installation of QIIME 2.

#### Conda

First, if you don't have Conda installed on your machine, install [Miniconda](https://conda.io/miniconda.html) for your operating system (Python 3.8+ version).
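
For example, on Linux the Miniconda installer can be downloaded and run as shown below (use the corresponding macOS installer on a Mac; installer file names may change over time).

```
# Download and run the Miniconda installer (Linux x86_64 shown; adjust for your OS)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```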

#### QIIME 2

Second, install QIIME 2 in a Conda environment, if you haven't already. See the instructions at [qiime2.org](https://docs.qiime2.org/2023.2/install/native/). For example, on macOS these commands will install QIIME 2 inside a Conda environment called `qiime2-2023.2` (for Linux, change "osx" to "linux"):

```
wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml
conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml
```

#### Snakemake and other dependencies

Third, activate your QIIME 2 environment and install Snakemake and other dependencies:

```
conda activate qiime2-2023.2
conda install -c conda-forge -c bioconda snakemake biopython muscle clustalo tabulate
conda install -c conda-forge deicode
pip install empress
qiime dev refresh-cache
conda install -c bioconda bioconductor-msa bioconductor-odseq
```
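
As an optional sanity check, you can confirm that the main tools are visible inside the activated environment:

```
# Quick checks that the environment is ready (versions will vary)
qiime info
snakemake --version
```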

### Option 2: Docker container

An alternative to the native installation is a Docker container. Here are the steps to follow:

#### Docker Desktop

Make sure Docker is installed on your system. If you are using a laptop or desktop machine, you can [install](https://docs.docker.com/get-docker/) Docker Desktop (Mac or Windows) or Docker (Linux). If you are on a compute cluster, you may need to contact your system administrator. This command lists your Docker images and confirms that the Docker daemon is running:

```
docker images
```

Make sure Docker has enough memory. On Docker for Mac, the default memory is 2 GB. Go to Preferences -> Resources -> Advanced -> Memory and increase the maximum memory to 8 GB or more if possible.
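
A rough way to check the memory available to Docker from the command line (the exact output format varies by Docker version):

```
# Shows the total memory available to the Docker engine
docker info | grep -i "total memory"
```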

#### Docker image

Download the [Tourmaline Docker image](https://hub.docker.com/repository/docker/aomlomics/tourmaline) from DockerHub:

```
docker pull aomlomics/tourmaline
```

List your Docker images again to make sure the `tourmaline` image is there:

```
docker images
```

#### Docker container

Now create and run a container:

```
docker run -v $HOME:/data -it aomlomics/tourmaline
```

If installing on a Mac with an Apple M1 chip, run the Docker image with the `--platform linux/amd64` option. It will take a few minutes for the image to load the first time it is run.

```
docker run --platform linux/amd64 -v $HOME:/data -it aomlomics/tourmaline
```

#### External files

The `-v` (volume) flag above allows you to mount a local file system volume (in this case your home directory) to read/write from your container. Note that symbolic links in a mounted volume will not work.

Use mounted volumes to:

* copy metadata and manifest files to your container;
* create symbolic links from your container to your FASTQ files and reference database;
* copy your whole Tourmaline directory out of the container when the run is completed (alternatively, you can clone the Tourmaline directory inside the mounted volume).

To copy files between the host file system and a container (running or stopped) without using a mounted volume, use these commands:

```
docker cp container:source_path destination_path # container to file system
docker cp source_path container:destination_path # file system to container
```
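
For example, to pull a finished report out of a container, a command might look like the one below; the container name and the path inside the container are hypothetical and will depend on where you cloned Tourmaline.

```
# Hypothetical container name and paths: adjust to your setup
docker cp my_tourmaline_container:/home/user/tourmaline/03-reports/report_dada2-pe_unfiltered.html ~/Desktop/
```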

#### Common errors

If you get an error "Plugin error from feature-classifier: Command ... died with <Signals.SIGKILL: 9>.", this means your Docker container ran out of memory. Go to Preferences -> Resources -> Advanced -> Memory and increase the maximum memory to 8 GB or more if possible.

#### Restart a container

You can exit a container by typing `exit`. This stops the container, but it is not deleted and can be restarted later.

See all your containers with this command:

```
docker ps -a
```

Start a container using its container ID or name with this command:

```
docker start -ia CONTAINER
```

#### Remove containers and images

Remove a Docker container:

```
docker container rm CONTAINER
```

Remove a Docker image:

```
docker rmi IMAGE
```

Remove all stopped containers, all networks not used by at least one container, all images without at least one container associated with them, and all build cache:

```
docker system prune -a
```