Bumblebee (platform for the representation and classification of peptides/proteins using word embedding models)

This framework aims to ease
the implementation of word embedding (WE) models for protein representation
and classification. The module facilitates the tokenization of sequences,
the training of WE models with different algorithms, the download of pre-trained state-of-the-art WE models,
the vectorization of protein sequences, and the visualization and interpretation of WE. Accordingly,
the module can process biological sequences to search for semantic meaning
in sequence "words". Although it can be used for both protein and DNA sequences, it has only been tested on proteins.
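
As an illustration of what sequence "words" can look like, the sketch below splits a protein sequence into k-mers, a common way of tokenizing biological sequences before training a WE model. It is a minimal example under stated assumptions: the function `tokenize_kmers`, its parameters, and the example sequence are hypothetical and are not Bumblebee's actual API.

```python
from typing import List


def tokenize_kmers(sequence: str, k: int = 3, overlapping: bool = True) -> List[str]:
    """Split a sequence into k-mer "words" (hypothetical helper, not Bumblebee's API)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.strip().upper()
    # Overlapping k-mers slide one residue at a time;
    # non-overlapping k-mers jump k residues and drop any incomplete tail.
    step = 1 if overlapping else k
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


# Made-up example sequence, used only to show the two tokenization modes.
overlapping_words = tokenize_kmers("MKTAYIAK", k=3)
non_overlapping_words = tokenize_kmers("MKTAYIAK", k=3, overlapping=False)
print(overlapping_words)      # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(non_overlapping_words)  # ['MKT', 'AYI']
```

The choice of `k` and of overlapping versus non-overlapping words changes the vocabulary the WE model sees, so it is one of the settings worth varying when comparing models.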
The module makes it easy, quick and intuitive to test several configurations of WE.
It was tested on the classification of plant ubiquitylation sites, lysine crotonylation sites and enzymes; these case studies are available in the case studies folder.
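
The vectorization of protein sequences mentioned above can be pictured with a small sketch: each k-mer word is looked up in an embedding table and the word vectors are averaged into one fixed-length vector per sequence. This is only one possible strategy, shown under assumptions: the `vectorize` function, the toy 2-dimensional embedding values, and the example sequence are invented for illustration and do not come from Bumblebee or from any trained model.

```python
from typing import Dict, List

# Toy 2-dimensional embeddings for a few 3-mers (made-up values for illustration only).
toy_embeddings: Dict[str, List[float]] = {
    "MKT": [1.0, 0.0],
    "KTA": [0.0, 1.0],
    "TAY": [1.0, 1.0],
}


def vectorize(sequence: str, embeddings: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Average the vectors of the overlapping k-mers that appear in the embedding table.

    Hypothetical helper, not Bumblebee's API. Assumes `embeddings` is non-empty.
    """
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    known = [embeddings[w] for w in words if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not known:
        # No word of the sequence is in the vocabulary: fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


sequence_vector = vectorize("MKTAY", toy_embeddings)   # mean of the MKT, KTA and TAY vectors
unknown_vector = vectorize("GGGGG", toy_embeddings)    # no known words, so a zero vector
```

The resulting fixed-length vectors are the kind of input a downstream classifier could consume, for example in tasks such as the site and enzyme classification case studies.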