[WIP] Python package to embed protein sequences using different models (contextualized and not)

hmms117/bio_embeddings

 
 

This repository is a WIP

  • Goal #1: a Python package where a sequence goes in --> embeddings come out
  • Goal #2: provide as many models as possible
  • Goal #3: include feature predictors
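
The first goal can be pictured as a single call that maps a sequence to one vector per residue. The sketch below is only an illustration under stated assumptions: `OneHotEmbedder`, `embed`, and `AMINO_ACIDS` are hypothetical names, not this package's API, and a one-hot encoding stands in for a real model such as SeqVec.

```python
from typing import List

# Assumption for this sketch: the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class OneHotEmbedder:
    """Toy stand-in for a real embedder (hypothetical, not part of this package)."""

    def embed(self, sequence: str) -> List[List[float]]:
        """Return one vector per residue; unknown residues map to all zeros."""
        vectors = []
        for residue in sequence.upper():
            vector = [0.0] * len(AMINO_ACIDS)
            index = AMINO_ACIDS.find(residue)
            if index >= 0:
                vector[index] = 1.0
            vectors.append(vector)
        return vectors


# Sequence goes in, embeddings come out.
embedding = OneHotEmbedder().embed("MKT")
print(len(embedding), len(embedding[0]))  # 3 20
```

A real embedder would return dense, learned vectors instead, but the shape of the result (sequence length by embedding dimension) would follow the same idea.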

Progress

Embedders:

  • SeqVec v1
  • SeqVec v2
  • TransformerXL
  • fastText
  • GloVe
  • Word2Vec

Feature extractors:

  • SeqVec v1
    • DSSP8
    • DSSP3
    • Disorder
    • Subcellular localization
    • Membrane-boundness
  • SeqVec v2
  • TransformerXL
  • fastText
  • GloVe
  • Word2Vec

Todo

  • Decouple embedders from feature extractors
  • Add more embedders
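
One possible way to decouple embedders from feature extractors is to let both agree only on the shape of an embedding, so that any extractor can consume the output of any embedder. The following is a minimal sketch under that assumption; `toy_embedder`, `mean_pool`, and `run` are hypothetical names for illustration and do not reflect the code in this repository.

```python
from typing import Callable, List

# Shared contract (assumption): one vector of floats per residue.
Embedding = List[List[float]]
Embedder = Callable[[str], Embedding]
FeatureExtractor = Callable[[Embedding], List[float]]

# Rough, illustrative set of hydrophobic residues used by the toy embedder.
HYDROPHOBIC = set("AILMFWV")


def toy_embedder(sequence: str) -> Embedding:
    """Toy embedder: a 1-dimensional flag per residue (1.0 if hydrophobic)."""
    return [[1.0 if residue in HYDROPHOBIC else 0.0] for residue in sequence.upper()]


def mean_pool(embedding: Embedding) -> List[float]:
    """Toy extractor: average per-residue vectors into one per-protein vector."""
    if not embedding:
        return []
    dim = len(embedding[0])
    return [sum(vec[d] for vec in embedding) / len(embedding) for d in range(dim)]


def run(sequence: str, embedder: Embedder, extractor: FeatureExtractor) -> List[float]:
    """Compose any embedder with any extractor; neither knows about the other."""
    return extractor(embedder(sequence))


print(run("MKTA", toy_embedder, mean_pool))  # [0.5]
```

With this kind of split, adding a new embedder would not require touching the extractors, as long as it returns the agreed embedding shape.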

Wanna use it now?

Use the notebooks folder, which will always include the latest version of the src. Note: although this package is in alpha, we will try to keep the API consistent.
