skrub is a wonderful new project related to scikit-learn. You can see Gaël Varoquaux present it here. They have a transformer called GapEncoder: it's a way to embed fuzzy strings. This could be really powerful online, say for classifying Tweets or Twitch messages, where typos are aplenty.
We already have a way to do online TF-IDF/count vectorization, but we don't have Gamma-Poisson matrix factorization. It is doable online though. Once we have it, we could assemble the two into a nice GapEncoder class. See paper here.
This is related to #1412. In fact, this might work well even without Gamma-Poisson matrix factorization. For instance, we could use decomposition.LDA, which we already have.
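To make the vectorization half concrete, here is a minimal sketch of the character n-gram counts that a GapEncoder-style model would factorize. It is plain Python and not the skrub or river implementation: the helper names `char_ngrams` and `cosine`, the trigram size, and the space padding are all assumptions made for illustration. The point is only that a typo leaves most n-grams intact, so the count vectors stay close.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a string (hypothetical helper).

    The string is lowercased and padded with one space on each side so
    that word boundaries produce their own n-grams.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (hypothetical helper)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


clean = char_ngrams("streaming")
typo = char_ngrams("streamng")   # one missing letter
other = char_ngrams("football")  # unrelated word

sim_typo = cosine(clean, typo)    # high: most trigrams are shared
sim_other = cosine(clean, other)  # zero: no trigram in common
```

Because each string is counted independently, this step needs no fitting and works one message at a time. The factorization on top of these counts (Gamma-Poisson, or LDA as suggested above) is the part that would need an online update rule.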
@MaxHalford I can take this up. I need some getting-started materials for doing this on streams. I will go through the paper and skrub, and I'm open to discussion.