
How to increase pre-trained model accuracy for custom data #9

Open

ankit-crossml opened this issue May 21, 2020 · 1 comment

@ankit-crossml
I used your demo and pre-trained model on one of my sample tables (attached), but the results are not very promising. It detected all columns as "address".

I face the same problem with the Sherlock model (which you are referring to).

Sato_Demo

So what is the best way to do transfer learning and train a more promising model on custom data?

As a Deep Learning engineer, I would be happy to work collaboratively to improve this model and repository.

A quick reply would be really appreciated.

@horseno
Collaborator

horseno commented Mar 24, 2021

Sorry about the late reply. This is a bug we inherited from Sherlock due to the use of dictionary-based word embeddings. Columns like "id" contain values that do not exist in the GloVe dictionary we use to extract the feature. This leads to undefined ("NaN") values in the word-embedding features, which are then propagated through the network, and the model falls back to predicting type 0, which is "address". We've pushed a simple fix to convert the undefined values, and the demo has been updated.
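To illustrate, here is a minimal sketch (not the actual Sato/Sherlock code; the tiny dictionary, `column_embedding_feature`, and the example values are made up for illustration) of how a column of out-of-vocabulary values can produce a NaN embedding feature, and how converting the undefined values to zeros avoids the fallback prediction:

```python
import numpy as np

# Hypothetical tiny GloVe-style dictionary: token -> embedding vector
glove = {
    "street": np.array([0.1, 0.3, -0.2]),
    "avenue": np.array([0.0, 0.5, 0.4]),
}
EMB_DIM = 3

def column_embedding_feature(values):
    """Average the embeddings of a column's tokens.
    A column made up entirely of OOV values (e.g. numeric ids)
    has no embeddings to average, so its feature is undefined."""
    vecs = [glove[v] for v in values if v in glove]
    if not vecs:
        return np.full(EMB_DIM, np.nan)  # all values OOV -> NaN feature
    return np.mean(vecs, axis=0)

id_column = ["10482", "10483", "10484"]  # none of these are in the dictionary
feat = column_embedding_feature(id_column)
print(feat)  # [nan nan nan]

# The fix: replace undefined values before feeding the features to the network,
# so NaNs no longer propagate and push the model into the type-0 ("address") fallback.
feat = np.nan_to_num(feat, nan=0.0)
print(feat)  # [0. 0. 0.]
```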
