Dealing with class imbalance in Deep Learning #193

souravsingh · 2019-07-21T22:48:46Z

Have you checked the list of proposed tips to see if the tip has already been proposed?

Yes

Did you add yourself as a contributor by making a pull request if this is your first contribution?

Yes, I added myself or am already a contributor

Feel free to elaborate, rant, and/or ramble.
The class distribution is often imbalanced, which is quite common in bioinformatics problems. I believe most of the advice for dealing with imbalance in ML should apply to deep learning as well:

Rephrase the problem
Obtain more data
Adjust class weights to account for the imbalance
Apply regularization techniques
Use oversampling or undersampling techniques (?)
Use K-fold CV correctly
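
To make the class-weighting and oversampling items concrete, here is a minimal sketch in plain Python. The function names `balanced_class_weights` and `random_oversample` are hypothetical, and the weighting formula (n_samples / (n_classes * class_count)) is one common heuristic chosen for illustration, not something proposed in this thread.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors count more
    when the weights are applied to a per-sample loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Toy data (made up for illustration): 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
samples = list(range(100))

weights = balanced_class_weights(labels)  # class 1 gets weight 10.0
new_samples, new_labels = random_oversample(samples, labels)
```

If oversampling is combined with K-fold CV, it would normally be applied to the training folds only, so that duplicated minority samples do not leak into the held-out fold.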

Any citations for the rule? (peer-reviewed literature preferred but not required)

agitter · 2019-07-22T12:25:08Z

I agree that class imbalance is a common issue in biology. How much of the discussion would be specific to deep learning as opposed to general ML? If the solutions are general, we may only mention it briefly instead of making a full tip.

Do the solutions of rephrasing the problem and obtaining more data apply in biology? In settings like genome annotation or chemical bioactivity classification, the domain is inherently dominated by negatives regardless of how much data we acquire.

This topic also fits with the brief sentence we have now about ROC having limited utility for class imbalanced problems.
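
As a rough illustration of why ROC can be misleading under heavy imbalance, the sketch below compares ROC AUC with average precision on a made-up ranking. The data and the helper names `roc_auc` and `average_precision` are assumptions for this example only; the implementations are simple reference versions, not library code.

```python
def roc_auc(scores, labels):
    """Probability that a random positive is scored above a random
    negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive.

    Note: ties between a positive and a negative are broken by input
    order here; the toy data below has no such ties.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up example: 10 positives, 1000 negatives.
# 50 negatives are scored above every positive; the rest are scored below.
scores = [0.9] * 10 + [0.95] * 50 + [0.1] * 950
labels = [1] * 10 + [0] * 50 + [0] * 950

auc = roc_auc(scores, labels)           # about 0.95
ap = average_precision(scores, labels)  # about 0.10
```

Only 5% of the negatives outrank the positives, so the ROC AUC looks strong, yet those 50 false positives swamp the 10 true positives at the top of the ranking, which the precision-based metric exposes.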

rasbt · 2019-07-22T14:27:26Z

> I agree that class imbalance is a common issue in biology. How much of the discussion would be specific to deep learning as opposed to general ML? If the solutions are general, we may only mention it briefly instead of making a full tip.

Good point. I am not sure how successful this is in general, but I recently came across a paper in which the researchers used GANs to generate synthetic samples to address the imbalance issue. In general, though, I don't think DL is any more prone or immune to class imbalance than other ML approaches.

One approach that is more DL-specific, though, is the focal loss, which was first proposed for RetinaNet:

Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988). (https://arxiv.org/abs/1708.02002)
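
For reference, the focal loss in Lin et al. is defined as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. Below is a minimal single-example sketch in plain Python. The function name `binary_focal_loss`, the `eps` clipping, and the option to pass `alpha=None` are assumptions made for this illustration; the defaults gamma=2 and alpha=0.25 are the values the paper reports as working well in its experiments.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one binary prediction.

    p     : predicted probability of the positive class
    y     : true label, 0 or 1
    gamma : focusing parameter; gamma=0 disables the down-weighting
    alpha : weight for the positive class (1 - alpha for the negative
            class); None disables class weighting (an assumption of
            this sketch, not part of the paper's notation)
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive versus a badly misclassified positive.
easy = binary_focal_loss(0.95, 1)
hard = binary_focal_loss(0.10, 1)

# With gamma=0 and no alpha, the loss reduces to plain cross-entropy.
ce_easy = binary_focal_loss(0.95, 1, gamma=0.0, alpha=None)
```

The (1 - p_t)^gamma factor shrinks the loss of well-classified examples, so the many easy negatives in an imbalanced problem contribute less to the total loss than the few hard examples.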

souravsingh · 2019-07-22T16:36:50Z

I believe obtaining more data points can help for certain problems, such as those in cancer genomics, where a lab could tap into privately generated data to help solve the problem.

souravsingh · 2019-07-22T16:42:48Z

In line with @rasbt's comment on GANs, I remember reading a paper that used RNNs to generate protein sequences with a certain type of activity. We could mention this as one way to get more data samples.
