Adding Differential Binarization model from PaddleOCR to Keras3 #1739

gowthamkpr · 2024-08-06T08:24:44Z

This adds the Differential Binarization model for scene text detection.

I implemented the architecture based on ResNet50_vd from PaddleOCR and ported the weights.

Demo colab: https://colab.research.google.com/gist/gowthamkpr/bd4a7f7742e92e66cfc57827052b8619/keras_paddleocr_v3.ipynb

mattdangerw · 2024-08-06T17:44:02Z

Let's split this up. Start with ResNetVD backbone?

Some notes...

Remove the aliases. One ResNetVDBackbone can handle all of these with different presets.
Conversion scripts as scripts not colabs.
Follow the local style for backbones as closely as possible. See some comments on "Add VGG16 and VGG19 backbone" (#1737).
Keep models a flat directory. No backbones/xx etc.
Add some tests.

divyashreepathihalli · 2024-09-25T19:42:33Z

@gowthamkpr is the PR ready for review?

divyashreepathihalli

Thanks for the PR! I have left a reorganization comment.

example for structuring the code - https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/sam

keras_nlp/src/models/diffbin/diffbin.py

keras_nlp/src/models/diffbin/losses.py

divyashreepathihalli · 2024-10-24T19:47:39Z

Hi @gowthamkpr! can you please refactor the code to KerasHub style?

Add a preprocessor flow
subclass image segmenter model for the task class
add preset class
add standard test routines

gowthamkpr · 2024-10-29T21:35:06Z

> Hi @gowthamkpr! can you please refactor the code to KerasHub style?

I've refactored using SAM as an example.

> Add a preprocessor flow

I've added DifferentialBinarizationPreprocessor and DifferentialBinarizationImageConverter.

> subclass image segmenter model for the task class

I've subclassed ImageSegmenter, but I left the custom compile() method in place, since we need a different loss than the one used in ImageSegmenter's compile().

> add preset class

Done. The model is not yet on Kaggle, so I've disabled the presets test for now.

> add standard test routines

Done. I'm not sure whether there are additional standard test routines, other than the ones used in SAM, that should be run.

divyashreepathihalli

Thanks Gowtham! left a few comments!

keras_hub/src/models/differential_binarization/differential_binarization_backbone.py

divyashreepathihalli · 2024-11-06T18:05:54Z

keras_hub/src/models/differential_binarization/differential_binarization_backbone_test.py

+                56,
+                256,
+            ),
+            run_mixed_precision_check=False,


does the mixed precision check pass?

No. I tried adding an explicit dtype argument, but the problem remains that the mixed precision check checks against each sublayer of the model. The ResNet backbone, which is instantiated separately, therefore has the wrong dtype.

keras_hub/src/models/differential_binarization/differential_binarization_test.py

keras_hub/src/models/differential_binarization/differential_binarization.py

divyashreepathihalli

Thanks for the PR Gowtham! Left a few comments. Can you please also add a demo colab in the PR description to verify the model is working before merging?

keras_hub/src/models/differential_binarization/differential_binarization_backbone.py

divyashreepathihalli · 2024-11-13T22:34:33Z

keras_hub/src/models/differential_binarization/differential_binarization_image_converter.py

+
+@keras_hub_export("keras_hub.layers.DifferentialBinarizationImageConverter")
+class DifferentialBinarizationImageConverter(ImageConverter):
+    backbone_cls = DifferentialBinarizationBackbone


there should be some resizing/rescaling ops here right?

Depends. Basically these image operations are implemented in the super class, ImageConverter, and can be used as depicted in the demo colab I've added in the PR description. Dedicated code in this class might make sense to resize to resolutions of multiples of 32, which the model requires. On the other hand, it might be confusing for the user if the masks that are predicted have different resolutions than the input.
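
As a rough illustration of the multiple-of-32 constraint mentioned above, a helper along these lines could compute a padded target size. This is only a sketch: the function names and the choice of padding (rather than rescaling) are assumptions, not code from this PR.

```python
def round_up_to_multiple(size, multiple=32):
    """Return the smallest multiple of `multiple` that is >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return ((size + multiple - 1) // multiple) * multiple


def padded_shape(height, width, multiple=32):
    """Target (height, width) after rounding both sides up to a multiple."""
    return (
        round_up_to_multiple(height, multiple),
        round_up_to_multiple(width, multiple),
    )
```

With padding, the original pixels stay in place, so a predicted mask could be cropped back to the input resolution afterwards.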

divyashreepathihalli · 2024-11-13T22:35:38Z

keras_hub/src/models/differential_binarization/differential_binarization_ocr.py

+
+
+@keras_hub_export("keras_hub.models.DifferentialBinarizationOCR")
+class DifferentialBinarizationOCR(ImageSegmenter):


we need to add a new base class for OCR; I don't think ImageSegmenter is a good one. Do you have a specific reason you chose to subclass ImageSegmenter?

Actually, you suggested subclassing ImageSegmenter (here), if I understood correctly. Technically, the task is somewhat similar to segmentation tasks. We can of course add a separate base class for it to capture the semantic differences, but I would rather name it "scene text detection".

Yeah I think it is better to add a new base class for OCR.

Sure. I suggest creating an ImageTextDetector base class that includes the code (from the notebook) for translating the segmentation mask output into polygons, which are often needed in such applications. I'll try to get rid of the OpenCV and shapely dependencies in this code, since I believe we don't have them in our requirements.
Does this work for you?

sounds good!

What's the output of the task? Seems like this is not quite OCR right? Not text output?

Is this just a piece of what we would need for a full OCR setup?

It performs scene text detection, not OCR in the narrow sense. It belongs to OCR in a wider sense, though.

With text detection, we find where there is text in the image. The model typically outputs polygons or bounding boxes (after postproc), indicating the positions of text fragments. These portions of the image are then fed into an OCR model to get the text in ASCII.

> Sure. I suggest creating an ImageTextDetector base class that includes the code (from the notebook) for translating the segmentation mask output into polygons, which are often needed in such applications. I'll try to get rid of the OpenCV and shapely dependencies in this code, since I believe we don't have them in our requirements. Does this work for you?

I've added the ImageTextDetector task with the logic for transformation to polygons. With our dependencies, this requires quite a bit of code, but it works well.
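
To give a feel for what such dependency-free postprocessing involves, here is a much-simplified sketch that thresholds a probability map and returns one axis-aligned bounding box per connected region. It is not the code added in this PR: the function name, the nested-list input, the 0.3 threshold and the use of boxes instead of polygons are all assumptions made for illustration.

```python
from collections import deque


def boxes_from_probability_map(prob_map, threshold=0.3):
    """Return (row_min, col_min, row_max, col_max) for each connected
    region of pixels whose probability exceeds `threshold`."""
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    visited = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if visited[r][c] or prob_map[r][c] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            visited[r][c] = True
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not visited[ny][nx]
                        and prob_map[ny][nx] > threshold
                    ):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r_min, c_min, r_max, c_max))
    return boxes
```

A real pipeline would additionally trace region contours into polygons and expand them, which is where most of the extra code comes from.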

mattdangerw

Mostly questions and some style stuff.

I am curious what to do with the task here. Does this output a segmentation mask? Maybe most importantly, does this fit into a bigger picture of an OCR system? If so, how do we expect the whole thing to work?

mattdangerw · 2024-11-21T02:53:26Z

keras_hub/src/models/differential_binarization/differential_binarization_backbone.py

+
+@keras_hub_export("keras_hub.models.DifferentialBinarizationBackbone")
+class DifferentialBinarizationBackbone(Backbone):
+    """


always start docstring with a one liner

I've improved/added the docstrings here and in losses.py. ptal

keras_hub/src/models/differential_binarization/differential_binarization_ocr.py

mattdangerw · 2024-11-21T05:03:58Z

keras_hub/src/models/differential_binarization/differential_binarization_ocr.py

+
+
+@keras_hub_export("keras_hub.models.DifferentialBinarizationOCR")
+class DifferentialBinarizationOCR(ImageSegmenter):


What's the output of the task? Seems like this is not quite OCR right? Not text output?

Is this just a piece of what we would need for a full OCR setup?

mattdangerw · 2024-11-21T05:05:14Z

keras_hub/src/models/differential_binarization/differential_binarization_ocr.py

+        backbone=backbone
+    )
+
+    detector(input_data)


What does the output of the task look like?

We get the probability map, threshold map and binary map output in the last dimension from the model. I've added some documentation here.
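
For readers unfamiliar with the three maps: in the paper, the approximate binary map is derived from the probability map P and the threshold map T as B = 1 / (1 + exp(-k (P - T))), with k set to 50. The sketch below shows that formula and a way to split the last dimension in plain Python. The channel order (probability, threshold, binary) is assumed from the order listed above, and the function names are hypothetical.

```python
import math


def approximate_binary(prob, thresh, k=50.0):
    """Differentiable step function: 1 / (1 + exp(-k * (prob - thresh)))."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def split_maps(output):
    """Split an H x W x 3 nested list into three H x W maps.

    Assumes the channel order (probability, threshold, binary).
    """
    prob_map = [[px[0] for px in row] for row in output]
    thresh_map = [[px[1] for px in row] for row in output]
    binary_map = [[px[2] for px in row] for row in output]
    return prob_map, thresh_map, binary_map
```

Pixels whose probability is well above the local threshold map to values near 1, and pixels well below it map to values near 0.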

mattdangerw · 2024-11-21T05:08:42Z

keras_hub/src/models/differential_binarization/differential_binarization_presets.py

+"""Differential Binarization preset configurations."""
+
+backbone_presets = {
+    "diffbin_r50vd_icdar2015": {


I think we should use consistent abbreviations. If the class is DifferentialBinarizationXX, then the preset name would be "differential_binarization". Or we could do DiffBinBackbone and diff_bin_.... for the preset name.

Part of me is actually tempted by the latter, since DifferentialBinarization is such a mouthful. But I am not sure what is common in other libraries? Is there a common pattern in other projects?

PaddleOCR mostly uses "DB", but I find this abbreviation somewhat indistinctive (in the sense that you have a hard time googling it). I've also seen "DBNet", but I don't know where this name comes from, as the paper never names it "DBNet", but just "Differential Binarization".

Agreed that DifferentialBinarization is quite long, though, and consistency is always good.

mattdangerw · 2024-11-21T05:09:11Z

keras_hub/src/models/differential_binarization/losses.py

+
+
+class DiceLoss:
+    def __init__(self, eps=1e-6, **kwargs):


A brief comment explaining these would be helpful.
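
For context, a Dice loss of this kind is commonly written as 1 - 2|X ∩ Y| / (|X| + |Y| + eps), where eps keeps the denominator non-zero when both maps are empty. A minimal plain-Python sketch follows; it is not the DiceLoss from this PR, and the function name and flat-list inputs are assumptions.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss over two flat sequences of values in [0, 1]."""
    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # eps avoids division by zero when both inputs are all zeros.
    union = sum(pred) + sum(target) + eps
    return 1.0 - 2.0 * intersection / union
```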

keras_hub/src/models/differential_binarization/losses.py

mattdangerw · 2024-11-21T05:14:50Z

keras_hub/src/models/differential_binarization/losses.py

+        return balance_loss
+
+
+class DBLoss(keras.losses.Loss):


Again, let's be consistent in our abbreviations. We have DiffBin, DB and DifferentialBinarization. Maybe DiffBin is the right middle ground everywhere? But as I mentioned above, I'm not sure what is common in other projects.

Please see my reply to the comment above. I'll rename as soon as we have agreed on a good name.

mattdangerw changed the base branch from master to keras-hub August 6, 2024 17:36

mattdangerw requested a review from divyashreepathihalli August 6, 2024 20:48

divyashreepathihalli mentioned this pull request Aug 8, 2024

Add OCR model to Keras-nlp/keras hub branch #1727

Open

gowthamkpr mentioned this pull request Aug 9, 2024

Add the ResNet_vd backbone #1766

Merged

mattdangerw force-pushed the keras-hub branch 2 times, most recently from 1826dce to 753047d Compare September 11, 2024 00:01

gowthamkpr force-pushed the diffbin branch from 4dc7f78 to 3d06308 Compare September 13, 2024 13:44

mattdangerw force-pushed the keras-hub branch from 753047d to a5e5d8f Compare September 13, 2024 20:00

gowthamkpr force-pushed the diffbin branch from b9e7a3c to beaf088 Compare September 17, 2024 16:12

divyashreepathihalli requested a review from fchollet September 25, 2024 19:42

divyashreepathihalli reviewed Sep 26, 2024

keras_nlp/src/models/diffbin/diffbin.py (outdated)

keras_nlp/src/models/diffbin/diffbin.py (outdated)

keras_nlp/src/models/diffbin/losses.py (outdated)

gowthamkpr added 7 commits October 22, 2024 21:28

Add DifferentialBinarization model

49f6bb1

Added tests for DifferentialBinarization losses

5b4e011

Moved DifferentialBinarization to keras_hub

12ab81c

Renamed to differential_binarization.py

e68512c

Refactorings for DifferentialBinarization

0c3235c

More refactorings

6797231

Fix tests

4845b6a

gowthamkpr force-pushed the diffbin branch from beaf088 to 4845b6a Compare October 22, 2024 20:15

gowthamkpr changed the base branch from keras-hub to master October 22, 2024 20:24

gowthamkpr added 7 commits October 29, 2024 20:02

Add preprocessor and image converter

83edf9a

Add presets

f15b7b9

Run formatting script

392dbff

Impl additional tests

db70eb5

Fixed formatting

18fcbfb

Removed copyright statements

898235d

Fix tests, run api_gen.sh

eaec868

Merge branch 'master' into diffbin

21b6312

divyashreepathihalli reviewed Nov 6, 2024

gowthamkpr added 3 commits November 11, 2024 20:38

Addressed comments

9fb6e65

Merge with local branch

83b66ed

Fixed torch and jax tests

e4a334d

divyashreepathihalli requested changes Nov 13, 2024

Improved code readability

49d6f6d

mattdangerw reviewed Nov 21, 2024

gowthamkpr added 3 commits November 22, 2024 20:58

Improved/added docstrings

d96b899

Added ImageTextDetector task

2f27981

Run api_gen.sh

66afeb9
