Overview

GOPhage is a learning-based model for annotating phage proteins with Gene Ontology (GO) terms. Its main improvement comes from exploiting the properties of phages together with a foundation model. A Transformer model learns the relationships among proteins that share a genomic context.

In addition, we integrate GOPhage with DIAMOND BLASTP to further improve performance. You can run GOPhage+, which comes in two versions based on ESM2-12 and ESM2-33.

Quick install

Note: we suggest installing all packages with Conda (both Miniconda and Anaconda work).

After cloning this repository, use Conda to create the environment from 'gophage.yaml'. This installs all the packages you need in GPU mode (make sure CUDA is installed on your system to use the GPU version).

Prepare the data and environment

Because of GitHub's file size limits, the data are provided as zip archives. You can download the database and models from Google Drive or Baidu NetDisk (百度网盘). Follow the steps below to use GOPhage.

1. Download the code.

  git clone https://github.com/jiaojiaoguan/GOPhage.git

2. Install the conda environment.

  cd GOPhage/
  conda env create -f gophage.yaml -n gophage
  conda activate gophage
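
Once the environment is active, it can be useful to confirm that a GPU is actually visible before running the model. The snippet below is a minimal sketch that assumes the environment provides PyTorch (this README does not list the packages in gophage.yaml); `describe_device` is a hypothetical helper name, not part of GOPhage.

```python
def describe_device():
    """Return a short description of what PyTorch can see on this machine."""
    try:
        # Assumption: PyTorch is installed by the gophage environment.
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return "CUDA device visible: " + torch.cuda.get_device_name(0)
    return "No CUDA device visible to PyTorch"


if __name__ == "__main__":
    print(describe_device())
```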

3. Download the database and model.

From Google Drive:

https://drive.google.com/drive/folders/14IQ75pMW9FK0H4mwleGEAo6_M7vOJeG5?usp=sharing

From Baidu NetDisk (百度网盘):

Link: https://pan.baidu.com/s/1UafDBBdNyGE4oIf8ZF0Ulg Extraction code: phag

Note: You need to put the "Database", "ESM_model", "GOPhage_model", "Protein_annotation" and "Term_label" folders in "GOPhage/".
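
A quick way to verify the layout described in the note is to check for the five folders from the repository root. This is a small sketch, not part of GOPhage itself; `missing_folders` is a hypothetical helper, and only the folder names come from the note above.

```python
from pathlib import Path

# Folder names taken from the note above.
REQUIRED_FOLDERS = [
    "Database",
    "ESM_model",
    "GOPhage_model",
    "Protein_annotation",
    "Term_label",
]


def missing_folders(repo_root="."):
    """Return the required folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_folders(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All required folders are in place.")
```

Run it from inside "GOPhage/" after unzipping the downloads.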

4. Run the GOPhage+ model.

  python GOPhage.py
      --contigs    [DNA FASTA file of contigs; if you input contigs, you can omit --proteins and --sentences]
      --proteins   [FASTA file of proteins; if you input proteins, you must also provide --sentences]
      --sentences  [contig sentence file listing the ordered proteins of each contig; separate columns with commas]
      --plm        [name of the PLM model: esm2-12 or esm2-33]
      --ont        [ontology: BP, CC, or MF]
      --batch_size [batch size for the input]
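
The --sentences file is only described briefly above, so the following is a guess at how one might produce it rather than a specification. It assumes each row holds a contig ID followed by that contig's protein IDs in genomic order, separated by commas; this layout, the helper name `write_contig_sentences`, and the example IDs are all assumptions, so compare the result against a file known to work with GOPhage.py before relying on it.

```python
import csv
import sys


def write_contig_sentences(contig_to_proteins, handle):
    """Write one comma-separated row per contig.

    ASSUMED layout (not confirmed by the README):
    contig ID first, then its protein IDs in genomic order.
    """
    writer = csv.writer(handle)
    for contig_id, protein_ids in contig_to_proteins.items():
        writer.writerow([contig_id, *protein_ids])


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    example = {
        "contig_1": ["contig_1_1", "contig_1_2", "contig_1_3"],
        "contig_2": ["contig_2_1", "contig_2_2"],
    }
    write_contig_sentences(example, sys.stdout)
```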

Example.

  python GOPhage.py --proteins test_proteins.fasta --sentences contig_sentence.csv --ont CC --plm esm2-33
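
When GOPhage is called from a pipeline, the same command can be assembled in Python. The sketch below only uses the flags and values listed in step 4; `build_command` and `run_gophage` are hypothetical wrappers, and the input checks reflect the usage notes above rather than the script's own argument parsing.

```python
import subprocess

VALID_PLMS = ("esm2-12", "esm2-33")
VALID_ONTOLOGIES = ("BP", "CC", "MF")


def build_command(ont, plm, contigs=None, proteins=None, sentences=None, batch_size=None):
    """Return the GOPhage.py argument list for the given options."""
    if ont not in VALID_ONTOLOGIES:
        raise ValueError(f"--ont must be one of {VALID_ONTOLOGIES}, got {ont!r}")
    if plm not in VALID_PLMS:
        raise ValueError(f"--plm must be one of {VALID_PLMS}, got {plm!r}")
    if (proteins is None) != (sentences is None):
        raise ValueError("--proteins and --sentences must be given together")
    if contigs is None and proteins is None:
        raise ValueError("provide --contigs, or --proteins with --sentences")

    command = ["python", "GOPhage.py"]
    inputs = (("--contigs", contigs), ("--proteins", proteins), ("--sentences", sentences))
    for flag, value in inputs:
        if value is not None:
            command += [flag, str(value)]
    command += ["--ont", ont, "--plm", plm]
    if batch_size is not None:
        command += ["--batch_size", str(batch_size)]
    return command


def run_gophage(**options):
    """Run GOPhage.py from the repository root; raises CalledProcessError on failure."""
    return subprocess.run(build_command(**options), check=True)


if __name__ == "__main__":
    # Mirrors the example above; this only prints the command, it does not run it.
    print(" ".join(build_command("CC", "esm2-33",
                                 proteins="test_proteins.fasta",
                                 sentences="contig_sentence.csv")))
```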

Output

If you use the esm2-12 model, the predictions are written to BP_GOPhage_base_plus_prediction_labels.csv. If you use the esm2-33 model, they are written to BP_GOPhage_large_plus_prediction_labels.csv. The CSV file has three columns: Proteins, GO term, and score.
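
The three-column output can be filtered with the standard csv module. The following is a minimal sketch: `load_predictions` and the `min_score` cutoff are hypothetical, and because this README does not say whether the file starts with a header row, the code simply skips any row whose third column is not a number.

```python
import csv


def load_predictions(path, min_score=0.0):
    """Read (protein, GO term, score) rows with score >= min_score."""
    predictions = []
    with open(path, newline="") as handle:
        for record in csv.reader(handle):
            if len(record) < 3:
                continue  # skip blank or malformed rows
            protein, go_term, raw_score = record[0], record[1], record[2]
            try:
                score = float(raw_score)
            except ValueError:
                continue  # header row or non-numeric score
            if score >= min_score:
                predictions.append((protein, go_term, score))
    return predictions
```

For example, `load_predictions("BP_GOPhage_large_plus_prediction_labels.csv", min_score=0.5)` would keep only the rows scoring at least 0.5; the threshold is an arbitrary illustration, not a recommended cutoff.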

Contact

If you have any questions, please email us: [email protected]
