This repository has been archived by the owner on Sep 22, 2023. It is now read-only.

MGCplotter: Microbial Genome Circular plotter



Overview

MGCplotter is an easy-to-use command-line tool for plotting a microbial genome in a circular layout using Circos. It requires a GenBank format genome file and implements the following three main plotting functions.

  1. Plot Basic Features of Microbial Genome
    Basic features are forward/reverse CDS, rRNA, tRNA, GC content, and GC skew.
    The color, size, and visibility of each feature can be controlled with command options.

  2. Assign & Plot COG Functional Classification
    Assign COG functional classifications to the reference genome CDSs using COGclassifier. The COG functional classification colors are then used to plot the forward/reverse CDSs.

    List of COG Functional Classification Color

    COG_definition_fig

  3. Search & Plot Conserved CDS between reference and query species
    Conserved CDSs of each query genome relative to the reference genome are searched for with the MMseqs2 RBH (reciprocal best hit) method. Each conserved query CDS is plotted with a gradient color based on the identity of its RBH result.

MGCplotter_example_fig
Fig.1: Plot result of Mycoplasma gallisepticum genome
From outer to inner, the tracks are (1) forward CDS, (2) reverse CDS, (3) rRNA, (4) tRNA, (5) GC content, and (6) GC skew. COG functional classification colors are assigned to the forward/reverse CDSs.
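
The GC content and GC skew tracks can be illustrated with a minimal Python sketch. It uses the standard definitions (GC content = (G+C)/window length, GC skew = (G-C)/(G+C)); the function name, window size, and step size are hypothetical and are not taken from MGCplotter's code.

```python
def gc_stats(seq, window=5000, step=5000):
    """Yield (start, gc_content, gc_skew) for each window of seq.

    window and step are illustrative values, not MGCplotter defaults.
    """
    seq = seq.upper()
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # GC content: fraction of G or C bases in the window
        gc_content = (g + c) / len(chunk)
        # GC skew: (G - C) / (G + C); defined as 0 when the window has no G or C
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew


# Toy sequence split into two 6 bp windows
windows = list(gc_stats("GGGCATATGCCC", window=6, step=6))
```

In the toy sequence, the first window is G-rich (positive skew) and the second is C-rich (negative skew), which corresponds to the positive/negative coloring of the GC skew track.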

MGCplotter_example_fig
Fig.2: Add conserved CDS tracks of 3 query species to Fig.1
Conserved CDSs of the query genomes relative to the reference genome are shown.
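
The reciprocal best hit idea behind the conserved CDS tracks can be sketched in Python as follows. This is a conceptual illustration only, not MMseqs2's implementation; the tuple layout `(query_id, target_id, identity, bitscore)`, the function names, and the toy IDs are all assumptions made for this example.

```python
def best_hits(hits):
    """Map each query ID to the (target ID, identity) of its highest-scoring hit.

    hits: iterable of (query_id, target_id, identity, bitscore) tuples
    (a hypothetical layout chosen for this sketch).
    """
    best = {}
    for query, target, identity, score in hits:
        if query not in best or score > best[query][2]:
            best[query] = (target, identity, score)
    return {q: (t, i) for q, (t, i, _) in best.items()}


def reciprocal_best_hits(ref_vs_query, query_vs_ref):
    """Return {ref_id: (query_id, identity)} for pairs that are each other's best hit."""
    forward = best_hits(ref_vs_query)
    backward = best_hits(query_vs_ref)
    rbh = {}
    for ref_id, (query_id, identity) in forward.items():
        # Keep the pair only if the query's best hit points back to the same reference CDS
        if backward.get(query_id, (None, None))[0] == ref_id:
            rbh[ref_id] = (query_id, identity)
    return rbh


# Toy data: refA and qry1 are reciprocal best hits; refB also hits qry1,
# but qry1's best hit is refA, so refB is not reported as conserved.
ref_vs_query = [("refA", "qry1", 95.0, 200.0), ("refB", "qry1", 60.0, 80.0)]
query_vs_ref = [("qry1", "refA", 95.0, 200.0), ("qry1", "refB", 60.0, 80.0)]
pairs = reciprocal_best_hits(ref_vs_query, query_vs_ref)
```

The identity value kept for each pair is the kind of number that would drive the gradient color of a conserved CDS.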

Installation

MGCplotter is implemented in Python3.

Install bioconda package:

conda install -c conda-forge -c bioconda mgcplotter

Install PyPI package:

pip install mgcplotter

Use Docker (Docker Image):

docker pull moshi4/mgcplotter:latest
docker run moshi4/mgcplotter:latest MGCplotter -h

Dependencies

  • Circos
    Software package for visualizing data and information in circular layout
  • COGclassifier
    A tool for classifying prokaryote protein sequences into COG functional category
  • MMseqs2
    Ultra fast and sensitive sequence search and clustering suite

Usage

Basic Command

MGCplotter -r [genome genbank file] -o [output directory] --assign_cog_color

Options

General Options:
  -r R, --ref_file R      Reference genome genbank file (*.gb|*.gbk|*.gbff)
  -o O, --outdir O        Output directory
  --query_files  [ ...]   Query CDS fasta or genome genbank files (*.fa|*.faa|*.fasta|*.gb|*.gbk|*.gbff)
  --cog_evalue            COGclassifier e-value parameter (Default: 1e-02)
  --mmseqs_evalue         MMseqs RBH search e-value parameter (Default: 1e-03)
  -t , --thread_num       Threads number parameter (Default: MaxThread - 1)
  -f, --force             Forcibly overwrite previous calculation result (Default: OFF)
  -v, --version           Print version information
  -h, --help              Show this help message and exit

Graph Size Options:
  --ticks_labelsize       Ticks label size (Default: 35)
  --forward_cds_r         Forward CDS track radius size (Default: 0.07)
  --reverse_cds_r         Reverse CDS track radius size (Default: 0.07)
  --rrna_r                rRNA track radius size (Default: 0.07)
  --trna_r                tRNA track radius size (Default: 0.07)
  --conserved_cds_r       Conserved CDS track radius size (Default: 0.04)
  --gc_content_r          GC content track radius size (Default: 0.15)
  --gc_skew_r             GC skew track radius size (Default: 0.15)

Graph Color Options:
  --assign_cog_color      Assign COG classification color to reference CDSs (Default: OFF)
  --cog_color_json        User-defined COG classification color json file
  --forward_cds_color     Forward CDS color (Default: 'red')
  --reverse_cds_color     Reverse CDS color (Default: 'blue')
  --rrna_color            rRNA color (Default: 'green')
  --trna_color            tRNA color (Default: 'magenta')
  --conserved_cds_color   Conserved CDS color (Default: 'chocolate')
  --gc_content_p_color    GC content color for positive value from average (Default: 'black')
  --gc_content_n_color    GC content color for negative value from average (Default: 'grey')
  --gc_skew_p_color       GC skew color for positive value (Default: 'olive')
  --gc_skew_n_color       GC skew color for negative value (Default: 'purple')

For graph color options, you can use a Matplotlib named color (e.g. 'red') or a hex color code (e.g. '#ff0000').

Matplotlib named color list

Example Command

1. M. gallisepticum genome simple plot (= Fig.1)

Reference: Mgallisepticum.gbff (0.63 MB)

MGCplotter -r Mgallisepticum.gbff -o ./example_result01 --assign_cog_color

2. M. gallisepticum genome plot with 3 query conserved CDS (= Fig.2)

Reference: Mgallisepticum.gbff (0.63 MB), Query: example02 (2.0 MB)

MGCplotter -r Mgallisepticum.gbff -o ./example_result02 --assign_cog_color \
           --query_files ./example02/*.gbff

Output Contents

  • circos[.png|.svg]
    Plot result figure file

  • reference_cds.faa
    Reference genome CDS fasta file (Extract from genbank file)

  • circos_config/
    Circos config files directory

  • circos_legend/
    Circos legend files directory

  • cogclassifier/
    COGclassifier result files directory

  • rbh_search/
    MMseqs RBH result files directory
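
After a run, the output directory can be checked with a small Python sketch like the one below. The file and directory names come from the list above; the helper name `missing_outputs` is hypothetical, and which outputs exist for a given run is an assumption (for example, `rbh_search/` presumably appears only when `--query_files` is given).

```python
import tempfile
from pathlib import Path

# Names taken from the "Output Contents" list above
EXPECTED_FILES = ["circos.png", "circos.svg", "reference_cds.faa"]
EXPECTED_DIRS = ["circos_config", "circos_legend", "cogclassifier", "rbh_search"]


def missing_outputs(outdir):
    """Return the expected output names that are absent from outdir."""
    root = Path(outdir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


# Demo on a temporary directory that contains only two of the expected outputs
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "circos.png").touch()
    (Path(tmp) / "circos_config").mkdir()
    report = missing_outputs(tmp)
```

For a real run, call `missing_outputs("./example_result01")` with the directory passed to `-o`.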

Example Gallery

1. E. coli genome simple plot (No COG assignment)

Reference: ecoli.gbk (3.5 MB)

MGCplotter -r ./ecoli.gbk -o ./gallery_result01 --rrna_color blue --trna_color red \
           --gc_content_p_color orange --gc_content_n_color blue \
           --gc_skew_p_color pink --gc_skew_n_color green 

MGCplotter_gallery_fig

2. E. coli genome plot with 3 query conserved CDS

Reference: ecoli.gbk (3.5 MB), Query: gallery02 (10.7 MB)

MGCplotter -r ./ecoli.gbk -o ./gallery_result02 --assign_cog_color \
           --query_files ./gallery02/NC_011751.gbk ./gallery02/NC_017634.gbk ./gallery02/NC_018658.gbk \
           --ticks_labelsize 50

Conserved CDS tracks are arranged from outside to inside in the order of the --query_files arguments. In this case, NC_011751, NC_017634, and NC_018658 are arranged from outside to inside.
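
Because the argument order determines the track order, it can help to build the command with an explicit file order rather than relying on shell glob expansion. The following Python sketch only assembles the argument list using options shown in this README; the helper name `build_command` is hypothetical, and actually running the command (e.g. with `subprocess.run`) is left out.

```python
import shlex


def build_command(ref_file, outdir, query_files):
    """Build an MGCplotter argument list.

    The order of query_files is the outer-to-inner order of the conserved CDS tracks.
    """
    cmd = ["MGCplotter", "-r", ref_file, "-o", outdir, "--assign_cog_color"]
    if query_files:
        cmd += ["--query_files", *query_files]
    return cmd


# Query files in arbitrary order; sorting makes the track order explicit
queries = [
    "./gallery02/NC_018658.gbk",
    "./gallery02/NC_011751.gbk",
    "./gallery02/NC_017634.gbk",
]
cmd = build_command("./ecoli.gbk", "./gallery_result02", sorted(queries))
print(shlex.join(cmd))  # shell-ready command line for inspection
```

Replacing `sorted(queries)` with a hand-ordered list places any genome of interest on the outermost conserved CDS track.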

MGCplotter_gallery_fig

3. M. gallisepticum genome plot with 30 query conserved CDS

Reference: Mgallisepticum.gbff (0.63 MB), Query: gallery03 (19.6 MB)

MGCplotter -r ./Mgallisepticum.gbff -o ./gallery_result03 --assign_cog_color \
          --query_files ./gallery03/*.gbff --conserved_cds_color '#dc143c' \
          --rrna_r 0 --trna_r 0 --conserved_cds_r 0.01

MGCplotter_gallery_fig

4. M. alvi genome contigs plot with 6 query conserved CDS

Reference: Malvi.gbk (0.57 MB), Query: gallery04 (1.0 MB)

MGCplotter -r ./Malvi.gbk -o ./gallery_result04 --assign_cog_color \
           --query_files ./gallery04/*.faa --conserved_cds_r 0.05 \
           --gc_content_r 0 --gc_skew_r 0

Malvi.gbk is a multi-record (contig) GenBank format genome file. In MGCplotter, multiple contigs are simply concatenated, and each contig boundary is shown by the color of the outermost circle (lightgrey/darkgrey).
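
The concatenation of contigs can be pictured with a short Python sketch that maps each contig to its span in the combined coordinate system. The contig names and lengths are made up, the helper name `contig_spans` is hypothetical, and the strict alternation of lightgrey/darkgrey is an assumption about how the boundary colors are assigned.

```python
from itertools import accumulate


def contig_spans(contig_lengths):
    """Return (name, start, end) per contig in concatenated coordinates.

    Coordinates are 0-based and end-exclusive; contigs keep their input order.
    """
    names = list(contig_lengths)
    ends = list(accumulate(contig_lengths[name] for name in names))
    starts = [0] + ends[:-1]
    return list(zip(names, starts, ends))


# Hypothetical contig names and lengths, for illustration only
spans = contig_spans({"contig1": 1000, "contig2": 250, "contig3": 500})

# Assumed coloring: alternate the two greys so neighboring contigs differ
boundary_colors = ["lightgrey" if i % 2 == 0 else "darkgrey" for i in range(len(spans))]
```

Each contig starts where the previous one ends, so a feature at position p on a contig lands at `start + p` on the circular plot.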

MGCplotter_gallery_fig

5. M. gallisepticum genome plot (User-defined COG classification color)

Reference: Mgallisepticum.gbff (0.63 MB), COG Color Json: cog_color.json (0.5 KB)

MGCplotter -r ./Mgallisepticum.gbff -o ./gallery_result05 --assign_cog_color \
          --cog_color_json ./cog_color.json

You can change the COG functional classification colors with a user-defined color json file. A template json file can be obtained with the generate_cog_color_template command.

COG functional classification color template json
{
  "J": "#f43cf3",
  "A": "#f04ff0",
  "K": "#f04fa0",
  "L": "#f04f4f",
  "B": "#f4793c",
  "D": "#f0f04f",
  "Y": "#f3f43c",
  "V": "#f5f52a",
  "T": "#f7f718",
  "M": "#caf718",
  "N": "#9ef718",
  "Z": "#71f718",
  "W": "#45f718",
  "U": "#18f718",
  "O": "#07f830",
  "X": "#07f807",
  "C": "#2af5f5",
  "G": "#3cf3f4",
  "E": "#4ff0f0",
  "F": "#4f9ff0",
  "H": "#4f4ff0",
  "I": "#793cf4",
  "P": "#3c3cf4",
  "Q": "#2a5df5",
  "R": "#939393",
  "S": "#808080",
  "-": "#6c6c6c"
}
COG color json used in this gallery (cog_color.json)
{
  "J": "red",
  "A": "red",
  "K": "red",
  "L": "red",
  "B": "red",
  "D": "limegreen",
  "Y": "limegreen",
  "V": "limegreen",
  "T": "limegreen",
  "M": "limegreen",
  "N": "limegreen",
  "Z": "limegreen",
  "W": "limegreen",
  "U": "limegreen",
  "O": "limegreen",
  "X": "limegreen",
  "C": "deepskyblue",
  "G": "deepskyblue",
  "E": "deepskyblue",
  "F": "deepskyblue",
  "H": "deepskyblue",
  "I": "deepskyblue",
  "P": "deepskyblue",
  "Q": "deepskyblue",
  "R": "lightgrey",
  "S": "lightgrey",
  "-": "darkgrey"
}

MGCplotter_gallery_fig

In this gallery, the color classification is based on the following five major COG categories.

  1. Information Storage and Processing (J,A,K,L,B) => red
  2. Cellular Processes and Signaling (D,Y,V,T,M,N,Z,W,U,O,X) => limegreen
  3. Metabolism (C,G,E,F,H,I,P,Q) => deepskyblue
  4. Poorly Characterized (R,S) => lightgrey
  5. No COG Classified (-) => darkgrey
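
A color json like the one used in this gallery can also be generated from the five major categories instead of being written by hand. The following is a minimal Python sketch; the names `MAJOR_CATEGORIES` and `build_cog_colors` are hypothetical, and the letter groups and colors are copied from the list above.

```python
import json

# Color -> COG letters, following the five major categories listed above
MAJOR_CATEGORIES = {
    "red": "JAKLB",               # Information Storage and Processing
    "limegreen": "DYVTMNZWUOX",   # Cellular Processes and Signaling
    "deepskyblue": "CGEFHIPQ",    # Metabolism
    "lightgrey": "RS",            # Poorly Characterized
    "darkgrey": "-",              # No COG Classified
}


def build_cog_colors(categories):
    """Expand {color: letters} into the {letter: color} mapping used in the json file."""
    return {letter: color for color, letters in categories.items() for letter in letters}


cog_colors = build_cog_colors(MAJOR_CATEGORIES)
json_text = json.dumps(cog_colors, indent=2)  # save this text as cog_color.json
```

Saving `json_text` to a file and passing it with `--cog_color_json` should reproduce the coloring of this gallery, assuming the file needs exactly the 27 keys shown in the template.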