CoMW

The Comparative Metatranscriptomics Workflow (CoMW) is a standardized and validated workflow for functionally classifying quality-filtered mRNA reads from metatranscriptomic or total RNA studies generated using NGS short reads. CoMW classifies these reads by aligning assembled contigs to the reference databases provided and cited below.

If you use CoMW in your research, please cite:

Muhammad Zohaib Anwar, Anders Lanzen, Toke Bang-Andreasen, Carsten Suhr Jacobsen, To assemble or not to resemble—A validated Comparative Metatranscriptomics Workflow (CoMW), GigaScience, Volume 8, Issue 8, August 2019, giz096, https://doi.org/10.1093/gigascience/giz096

For queries or issues please contact : [email protected]

System Requirements, Structure and Installation

  1. A CPU supporting the SSE4.1 instruction set or higher
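
On Linux, one way to check this requirement is to look for the `sse4_1` flag in `/proc/cpuinfo`. The following is a minimal sketch and not part of CoMW; the helper name `has_sse41` is hypothetical, and the check assumes a Linux system whose `/proc/cpuinfo` exposes a `flags` line.

```python
from pathlib import Path


def has_sse41(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the cpuinfo text lists sse4_1."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "sse4_1" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only (assumption)
    if cpuinfo.exists():
        print("SSE4.1 supported:", has_sse41(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; check your CPU flags another way.")
```
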
.
|-- CoMW_Manual.pdf
|-- CoMW.yml
|-- databases
|-- install.sh
|-- Readme.md
|-- scripts
|   |-- align_contigs_to_database.py
|   |-- annotate_count_table.py
|   |-- assemble_reads.py
|   |-- filter_ncRNA.py
|   |-- filter_table_by_abundance.py
|   |-- map_orthologs_to_count_table.py
|   |-- map_reads_to_contigs.py
|   `-- parse_sword.py
`-- utils
    |-- AggregateTables.R
    |-- Filteration.R
    |-- MapReads_to_contigs.sh
    |-- parsecm.py
    `-- ParsingSword.R
  1. Download the latest distribution and expand it using unzip

    OR

  2. Download the development version:

git clone https://github.com/anwarMZ/CoMW.git

Create an environment using Anaconda. If you do not have Anaconda installed, install it first using the Anaconda installer.

cd CoMW
conda env create -f ./CoMW.yml
source activate CoMW
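
Before running any script, it can help to confirm that the environment is actually active. The sketch below is not part of CoMW: it relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation, and it assumes the environment is named `CoMW`, as the activation command above suggests. The helper name `active_conda_env` is hypothetical.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env == "CoMW":  # environment name assumed from the activation command
        print("CoMW environment is active.")
    else:
        print(f"Active conda environment: {env!r}; expected 'CoMW'.")
```
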

Run install.sh to download the databases used by CoMW into the databases directory

bash ./install.sh

You can now run the Python scripts, e.g.

python scripts/assemble_reads.py -h 

Scripts

These scripts are written in Python. Detailed parameters and dependencies, along with usage examples, are given in the CoMW user manual, CoMW_Manual.pdf

1. assemble_reads.py
2. filter_ncRNA.py
3. map_reads_to_contigs.py
4. filter_table_by_abundance.py
5. align_contigs_to_database.py
6. parse_sword.py
7. map_orthologs_to_count_table.py
8. annotate_count_table.py
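
As a quick orientation, the help text of each script can be requested in the order listed. This is a minimal sketch, not part of CoMW: the script names are taken from the directory tree shown earlier, the `-h` flag is the one used in the installation example, and it is an assumption that every script accepts it. The helper name `help_commands` is hypothetical.

```python
import sys

# Script names as they appear in the directory tree, in the listed order.
PIPELINE_SCRIPTS = [
    "assemble_reads.py",
    "filter_ncRNA.py",
    "map_reads_to_contigs.py",
    "filter_table_by_abundance.py",
    "align_contigs_to_database.py",
    "parse_sword.py",
    "map_orthologs_to_count_table.py",
    "annotate_count_table.py",
]


def help_commands(scripts_dir="scripts", python=sys.executable):
    """Build one '<python> <scripts_dir>/<script> -h' command per script."""
    return [[python, f"{scripts_dir}/{name}", "-h"] for name in PIPELINE_SCRIPTS]


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run to execute it.
    for cmd in help_commands():
        print(" ".join(cmd))
```

The actual parameters for each step are documented in CoMW_Manual.pdf; this sketch only builds the help invocations.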

Utils

Utils are small helper scripts written in R, bash, or Python that assist the main scripts. They must be present in the utils folder to be accessible
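
Since the main scripts depend on these files being in place, a simple pre-flight check can report any that are missing. The following is a sketch under stated assumptions and not part of CoMW: the file names are taken from the directory tree shown earlier, and the helper name `missing_utils` is hypothetical.

```python
from pathlib import Path

# Utility files as they appear in the directory tree.
EXPECTED_UTILS = [
    "AggregateTables.R",
    "Filteration.R",
    "MapReads_to_contigs.sh",
    "parsecm.py",
    "ParsingSword.R",
]


def missing_utils(repo_root="."):
    """Return the expected utils files not found under <repo_root>/utils."""
    utils_dir = Path(repo_root) / "utils"
    return [name for name in EXPECTED_UTILS if not (utils_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_utils()
    if missing:
        print("Missing utils:", ", ".join(missing))
    else:
        print("All expected utils are present.")
```

Run it from the top of the CoMW directory, or pass the path to the checkout as `repo_root`.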

Databases

Databases include FASTA files and annotations that are available from their respective developers' websites and are also collected here under the same licence. Depending on your usage, please cite the groups that developed these databases.

  1. M5nr: Wilke, A., Harrison, T., Wilkening, J., Field, D., Glass, E.M., Kyrpides, N., Mavrommatis, K. and Meyer, F., 2012. The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC bioinformatics, 13(1), p.141.

  2. CAZy (Carbohydrate-Active Enzymes database, http://www.cazy.org/): Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M. and Henrissat, B., 2013. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic acids research, 42(D1), pp.D490-D495.

  3. NCycDB: Tu, Q., Lin, L., Cheng, L., Deng, Y., He, Z. and Wren, J., 2018. NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes. Bioinformatics, 1, p.9.

Acknowledgment

This work was supported by a grant from the European Commission's Marie Skłodowska-Curie Actions program MicroArctic-ITN under project number 675546.