Anu

This project is still under development.

Anu is a framework for testing and benchmarking machine learning (ML) models that predict protein-protein interactions. It automates data retrieval, feature engineering, and model evaluation.

Getting Started

Requirements

git
Python 3.7 or above
Python virtual environment

Developing

Clone this repository

git clone https://github.com/ankitskvmdam/anu.git

Create a Python virtual environment

python -m venv venv     # create the virtual environment
. ./venv/bin/activate    # activate the virtual environment

Install poetry

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

or

pipx install poetry

For more information about Poetry, see the Poetry docs.

Install nox

pip install nox

Nox

Run the tests, lint checks, type checks, doc tests, and coverage:

nox

For more information, see the nox tutorial.

Using

To use this tool, follow the steps below. The first few are the same as in the Developing section.

Clone this repository

git clone https://github.com/ankitskvmdam/anu.git

Install poetry

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

or

pipx install poetry

Install anu

Now run the following commands:

# First move to the directory
cd anu

# Installing anu
poetry install

Initial steps

Step 1: Download the databases.

Pickle - Interacting protein database
Negatome - Non-interacting protein database

Currently there is no way to tell anu to download only one of the databases. This feature will be implemented in a future release.

# Download both databases
anu data fetch databases

# For help/more information
anu data fetch databases --help

Step 2: Prepare dataframes

Pickle dataset dataframe (vaex dataframes)
Negatome dataset dataframe (vaex dataframes)

Currently there is no way to tell anu to prepare only one of the dataframes. This feature will be implemented in a future release.

# Prepare pickle and negatome dataframe
anu data prepare dataframes

# For help/more information
anu data prepare dataframes --help

Step 3: Fetch PDB files

Next, fetch the PDB files.

The Pickle database contains almost 30,000 proteins and the Negatome database around 10,000, so it is impractical to fetch them all at once. The fetching process is resumable, and for testing, 300 to 400 files per dataset are enough. Once you have downloaded enough files, press Ctrl+C to exit.

# For help/more information
anu data fetch pdb --help

# Fetch PDB files for the proteins in the Pickle dataset
anu data fetch pdb -p
# or
anu data fetch pdb --pickle

# Fetch PDB files for the proteins in the Negatome dataset
anu data fetch pdb -n
# or
anu data fetch pdb --negatome

# Fetch PDB files for both datasets
anu data fetch pdb

If a PDB file has already been downloaded, it will not be downloaded again. Downloaded PDB files are shared between both datasets.
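The skip-if-present behaviour described above is what makes the fetch resumable. The following is a minimal sketch of that idea, not anu's actual implementation: the function name `fetch_missing`, the `download` callable, and the `<id>.pdb` file naming are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_missing(
    pdb_ids: Iterable[str],
    out_dir: Path,
    download: Callable[[str], bytes],
) -> List[str]:
    """Download only the PDB files that are not already in out_dir.

    Hypothetical helper: `download` is any callable that returns the file
    contents for a PDB ID. Returns the IDs that were actually fetched.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for pdb_id in pdb_ids:
        target = out_dir / f"{pdb_id.lower()}.pdb"
        if target.exists():
            # Already on disk from an earlier (possibly interrupted) run.
            continue
        # Write to a temporary name first so that an interruption mid-write
        # never leaves a truncated file that a later run would treat as done.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(download(pdb_id))
        partial.replace(target)
        fetched.append(pdb_id)
    return fetched
```

Because the only state is the set of files on disk, running the function again after pressing Ctrl+C simply continues with the IDs that are still missing.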

Step 4: Prepare inputs for training

This step is also time-consuming.

# For help/more information
anu data prepare inputs --help

# Prepare interacting protein dataframe
anu data prepare inputs -i
# or
anu data prepare inputs --interacting

# Prepare non interacting protein dataframe
anu data prepare inputs -n
# or
anu data prepare inputs --non-interacting

# Prepare both input dataframes
anu data prepare inputs

Step 5: Train model

Currently, only the CNN model is available.

anu train cnn
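Steps 1 to 5 can be chained from a small script. The sketch below only wraps the commands listed in this README with Python's standard `subprocess` module; the names `PIPELINE` and `run_pipeline` are illustrative and not part of anu.

```python
import subprocess
from typing import List, Sequence

# Steps 1 to 5 exactly as listed in this README, in order.
PIPELINE: List[List[str]] = [
    ["anu", "data", "fetch", "databases"],
    ["anu", "data", "prepare", "dataframes"],
    ["anu", "data", "fetch", "pdb"],
    ["anu", "data", "prepare", "inputs"],
    ["anu", "train", "cnn"],
]


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner=subprocess.run,
) -> int:
    """Run each step in order and stop at the first failure.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without anu installed.
    Returns the number of steps that completed.
    """
    done = 0
    for argv in steps:
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(list(argv), check=True)
        done += 1
    return done
```

Calling `run_pipeline()` inside the Poetry environment runs every step to completion. Step 3 then fetches the PDB files for both datasets in full, and interrupting it with Ctrl+C also stops the script, so for a partial download it is simpler to run the steps by hand.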

Step 6: Predict

You must train the model before running predictions.

# For help/more information
anu predict protein --help

# Give PDB IDs as input
anu predict protein -p "1gzx" "4hh3"

# Give UniProt IDs as input
anu predict protein -u "F4JRB0" "Q8RX29"

# Give file paths as input
anu predict protein "path/to/protein/a.pdb" "path/to/protein/b.pdb"
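Since `-p` expects PDB IDs and `-u` expects UniProt IDs, it can help to check an identifier's shape before passing it to the command. The sketch below is an assumption-laden convenience, not something anu is known to do: it uses the classic four-character PDB ID format and the accession pattern published in the UniProt help pages, and the function names are hypothetical.

```python
import re

# Classic four-character PDB ID: a digit followed by three alphanumerics.
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
# UniProt accession pattern as published in the UniProt help pages.
UNIPROT_ID = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_pdb_id(value: str) -> bool:
    """Return True if the whole string has the shape of a PDB ID."""
    return PDB_ID.fullmatch(value) is not None


def looks_like_uniprot_id(value: str) -> bool:
    """Return True if the whole string has the shape of a UniProt accession."""
    return UNIPROT_ID.fullmatch(value) is not None
```

A matching shape does not guarantee that the entry exists in the PDB or in UniProt; the check only catches obvious mix-ups, such as a UniProt ID passed with `-p`.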
