TIPRE: Taxonomic Identity PREdiction

ML-taxonomic-identity-prediction

Homework for the Machine Learning Course 2021 (MSc Bioinformatics for Computational Genomics) held by Prof. Matteo Matteucci and Marco Cannici at Politecnico di Milano.

The notebook can be viewed on nbviewer.

Aim

The aim of this project is to investigate whether codon usage frequencies from different organisms can be used to classify those organisms into 11 kingdoms: archaea, bacteria, bacteriophage, plasmid, plant, invertebrate, vertebrate, mammal, rodent, primate and virus. The analysis is carried out using the clustering, classification and regression techniques learned during the course.
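To make the classification task concrete, here is a minimal sketch of a nearest-centroid classifier over codon-frequency vectors. It is only an illustration and not necessarily one of the methods used in the notebook. The function names `fit_centroids` and `predict` are hypothetical, and the three-feature vectors are made-up toy values (real inputs would have one feature per codon).

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Average the feature vectors of each class into one centroid per class."""
    groups = defaultdict(list)
    for vec, label in zip(X, y):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest (Euclidean distance) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy, made-up frequency vectors (assumption: 3 features instead of 64).
X = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]
y = ["bacteria", "bacteria", "virus", "virus"]

centroids = fit_centroids(X, y)
label = predict(centroids, [0.55, 0.35, 0.10])
print(label)
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library Python; any classifier that accepts a numeric feature vector per organism could take its place.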

Background

"The coding DNA of a genome describes the proteins of the organism in terms of 64 different codons that map to 21 different amino acids and a stop signal. Different organisms differ not only in the amino acid sequences of their proteins, but also in the extents in which they use the synonymous codons for different amino acids. The inherent redundancy of the genetic code allows the same amino acid to be specified by one to five different codons so that there are, in principle, many different nucleic acids to describe the primary structure of a given protein. Coding DNA sequences thus can carry information beyond that needed for encoding amino acid sequence. Thus, one may ask: is it possible to classify some properties of nucleic acids from the usages of different synonymous codons rather than, with much greater computational effort, from individual nucleotide sequences themselves?" — Khomtchouk, Bohdan B. "Codon usage bias levels predict taxonomic identity and genetic composition." bioRxiv (2020).

This data set enables a preliminary analysis on this topic.
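As an illustration of what a codon usage frequency is, the sketch below counts the codons of a single coding sequence and divides each count by the total number of codons. It assumes the input is an in-frame coding sequence; the function name `codon_frequencies` and the example sequence are hypothetical, and the data set's own preprocessing may differ.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order (the A, C, G, T ordering is an arbitrary choice).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(cds):
    """Map each of the 64 codons to its relative frequency in cds.

    Assumes cds is in frame. Trailing bases that do not fill a complete
    codon, and triplets containing non-ACGT characters, are ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA-style input as well
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return {codon: counts[codon] / total for codon in CODONS}


# Made-up example: Met, Ala, Ala, stop.
freqs = codon_frequencies("ATGGCTGCTTAA")
print(freqs["GCT"])
```

Each organism is then described by a 64-dimensional vector of such frequencies, which is the kind of feature vector the classification task operates on.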

Reference

Khomtchouk, Bohdan B. "Codon usage bias levels predict taxonomic identity and genetic composition." bioRxiv (2020)

