Dealing with class imbalance in Deep Learning #193
Comments
I agree that class imbalance is a common issue in biology. How much of the discussion would be specific to deep learning as opposed to general ML? If the solutions are general, we may want to mention it only briefly instead of making a full tip. Do the solutions of rephrasing the problem and obtaining more data apply in biology? In settings like genome annotation or chemical bioactivity classification, the domain is inherently dominated by negatives regardless of how much data we acquire. This topic also fits with the brief sentence we have now about ROC having limited utility for class-imbalanced problems.
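To make the ROC point concrete, here is a minimal sketch on made-up scores (10 positives, 1000 negatives; the numbers are purely illustrative and not taken from any real dataset). The helper names `roc_auc` and `precision_recall` are hypothetical, written in plain Python so the example is self-contained.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) view of ROC AUC.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def precision_recall(pos_scores, neg_scores, threshold):
    """Precision and recall when everything scoring >= threshold is called positive."""
    tp = sum(s >= threshold for s in pos_scores)
    fp = sum(s >= threshold for s in neg_scores)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / len(pos_scores)
    return precision, recall


# Illustrative toy data: only 5% of the negatives are scored above the positives,
# but because negatives are 100x more numerous they still swamp the true positives.
pos = [0.9] * 10
neg = [0.95] * 50 + [0.1] * 950

auc = roc_auc(pos, neg)
precision, recall = precision_recall(pos, neg, threshold=0.5)
print(f"ROC AUC = {auc:.3f}, precision = {precision:.3f}, recall = {recall:.3f}")
```

On this toy data the ROC AUC looks strong while most predicted positives are false, which is why precision-recall style metrics are often reported alongside ROC for heavily imbalanced problems.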
Good point. I am not sure how successful this is in general, but I recently stumbled upon a paper where the researchers used GANs to generate synthetic samples to address the imbalance issue. In general, though, I think DL is neither more prone nor more immune to class imbalance than other ML approaches. One approach that is more DL-specific is the focal loss, which was first proposed for RetinaNet.
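For reference, a minimal sketch of the binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python for a single example. The function names are hypothetical, and the defaults gamma=2 and alpha=0.25 are the values commonly quoted from the RetinaNet paper; treat them as starting points rather than recommendations for any particular biological dataset.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Standard cross-entropy for one example; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example.

    p_t is the probability assigned to the true class. The (1 - p_t)**gamma
    factor shrinks the loss of well-classified examples, so the many easy
    negatives contribute less to the total. alpha weights the positive class
    and (1 - alpha) the negative class.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (predicted 0.05 for a true 0) versus a hard positive
# (predicted 0.1 for a true 1): the focal loss down-weights the easy one heavily.
easy_negative = binary_focal_loss(0.05, 0)
hard_positive = binary_focal_loss(0.10, 1)
print(easy_negative, hard_positive)
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which makes it easy to compare against plain class weighting.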
I believe obtaining more data points can help for certain problems, such as those in cancer genomics, where a lab could tap into privately generated data to help solve the problem.
In line with @rasbt's comment on GANs, I remember reading a paper that used RNNs to generate protein sequences with a certain type of activity. We could mention this as one way to get more data samples.
Have you checked the list of proposed tips to see if the tip has already been proposed?
Did you add yourself as a contributor by making a pull request if this is your first contribution?
Feel free to elaborate, rant, and/or ramble.
There might be an imbalance in the class distribution, which is quite common in bioinformatics problems. I believe most of the points about dealing with imbalance in ML should work in deep learning as well.
Any citations for the rule? (peer-reviewed literature preferred but not required)