Why is recognition inaccurate with CnOCR, while using EasyOCR is fine? #130

Open
KTBsomen opened this issue Jul 12, 2024 · 3 comments

Comments

@KTBsomen

No description provided.

@breezedeus
Owner

Which kinds of images? You can change the code to use other OCR engines.

@KTBsomen
Author

One more problem I saw: the text output differs between raw EasyOCR and your pix2tex.recognize_text_formula.

Here is the pix2tex output:

73。 What is the compound interest (i Rs) on a sum Of Rs. 8192 for
$$
1 {\frac{1} {4}} \ \mathrm{y e a r s \; a t \; 1 5 \% \; p e r \; a n n u m, \; i f}
$$
"" coooounoed #~ nonths4p (2)  1740 1640
1634 (4)1735
Exassccotzoeolshier]
13.08.2020 (Shift-J)

Here is the EasyOCR output:

[([[107, 71], [173, 71], [173, 111], [107, 111]], '73.', 0.9917022049101489), ([[193, 69], [431, 69], [431, 130], [193, 130]], 'What is the', 0.9887154965614935), ([[427, 93], [797, 93], [797, 149], [427, 149]], 'compound interest', 0.9978926405302552), ([[192, 120], [799, 120], [799, 198], [192, 198]], '(in Rs) on a sum of Rs, 8192 for', 0.5171315037820216), ([[237, 203], [257, 203], [257, 231], [237, 231]], '1', 0.09841237729462193), ([[202, 226], [226, 226], [226, 258], [202, 258]], '1', 0.9996012846778548), ([[234, 260], [264, 260], [264, 294], [234, 294]], '4', 1.0), ([[277, 238], [795, 238], [795, 303], [277, 303]], 'years at 15% per annum; if', 0.7160528684004921), ([[184, 312], [369, 312], [369, 365], [184, 365]], 'interest', 0.9999773248890145), ([[380, 326], [722, 326], [722, 383], [380, 383]], 'is compounded', 0.9875069147908678), ([[737, 341], [775, 341], [775, 381], [737, 381]], '5-', 0.8094714373744847), ([[185, 357], [386, 357], [386, 423], [185, 423]], 'monthly ?', 0.8475277887695424), ([[205, 415], [259, 415], [259, 459], [205, 459]], '(1)', 0.9965351652399872), ([[282, 415], [391, 415], [391, 465], [282, 465]], '1640', 0.9999983310699463), ([[506, 428], [679, 428], [679, 477], [506, 477]], '(2) 1740', 0.7805645267010147), ([[203, 467], [257, 467], [257, 511], [203, 511]], '(3)', 0.9998854752209928), ([[283, 471], [385, 471], [385, 515], [283, 515]], '1634', 0.6752449204900562), ([[502, 480], [677, 480], [677, 530], [502, 530]], '(4) 1735', 0.8574826030860164), ([[401, 523], [786, 523], [786, 584], [401, 584]], 'SSC CGL (CBE) Tier-I', 0.8960124626829515), ([[448, 574], [782, 574], [782, 628], [448, 628]], '13.08.2020 (Shift-I)', 0.7733794959921725), ([[335.0939460213589, 569.1039678918432], [449.9501978735121, 585.3707649405025], [443.9060539786411, 619.8960321081568], [329.0498021264879, 604.6292350594975]], 'Exam,', 0.9631557823077296)]

The extracted text is perfect in this raw EasyOCR output.
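
To compare the two outputs side by side, the EasyOCR tuples can be flattened into plain text rows. The following is only a minimal sketch, assuming each entry has the (box, text, confidence) shape shown in the output above. The function name easyocr_to_lines and the row_tol and min_conf parameters are hypothetical and are not part of EasyOCR or Pix2Text.

```python
def easyocr_to_lines(results, row_tol=25, min_conf=0.0):
    """Group (box, text, confidence) entries into left-to-right text rows."""
    items = []
    for box, text, conf in results:
        if conf < min_conf:
            continue  # drop detections below the confidence threshold
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        # Represent each box by its vertical centre and its left edge.
        items.append((sum(ys) / len(ys), min(xs), text))
    items.sort()  # top-to-bottom, then left-to-right

    rows = []  # each row: [y of its first box, [(x, text), ...]]
    for y, x, text in items:
        if rows and abs(y - rows[-1][0]) <= row_tol:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])
    return [" ".join(text for _, text in sorted(cells)) for _, cells in rows]


# Six entries copied from the EasyOCR output above (the answer options).
sample = [
    ([[205, 415], [259, 415], [259, 459], [205, 459]], '(1)', 0.9965351652399872),
    ([[282, 415], [391, 415], [391, 465], [282, 465]], '1640', 0.9999983310699463),
    ([[506, 428], [679, 428], [679, 477], [506, 477]], '(2) 1740', 0.7805645267010147),
    ([[203, 467], [257, 467], [257, 511], [203, 511]], '(3)', 0.9998854752209928),
    ([[283, 471], [385, 471], [385, 515], [283, 515]], '1634', 0.6752449204900562),
    ([[502, 480], [677, 480], [677, 530], [502, 530]], '(4) 1735', 0.8574826030860164),
]

print(easyocr_to_lines(sample))
# ['(1) 1640 (2) 1740', '(3) 1634 (4) 1735']
```

The row tolerance of 25 pixels is a guess that happens to suit this image; text of a different size would likely need a different value.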

I know I am using EasyOCR, because I changed the prepare_ocr_engine function:

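import string
from copy import deepcopy
from typing import Sequence

# CnOCREngine and EasyOCREngine are expected to be in scope in the surrounding module.
def prepare_ocr_engine(languages: Sequence[str], ocr_engine_config):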
    ocr_engine_config = deepcopy(ocr_engine_config) if ocr_engine_config else {}
    if 1 == 2:  # always False, so the CnOCR branch is skipped and EasyOCR is used
        from cnocr import CnOcr

        if 'ch_sim' not in languages and 'cand_alphabet' not in ocr_engine_config:  # only recognize english characters
            ocr_engine_config['cand_alphabet'] = string.printable
        ocr_engine = CnOcr(**ocr_engine_config)
        engine_wrapper = CnOCREngine(languages, ocr_engine)
    else:
        print("using easyocr")
        try:
            from easyocr import Reader
        except ImportError:
            raise ImportError('Please install easyocr first: pip install easyocr')
        gpu = False
        if 'context' in ocr_engine_config:
            context = ocr_engine_config.pop('context').lower()
            gpu = 'gpu' in context or 'cuda' in context
        ocr_engine = Reader(lang_list=['en'], gpu=gpu, **ocr_engine_config)
        engine_wrapper = EasyOCREngine(languages, ocr_engine)
    return engine_wrapper
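
The handling of the 'context' key in the EasyOCR branch can be checked in isolation, without installing either OCR library. This is a minimal sketch of that one step only; the helper name split_gpu_flag is hypothetical and does not exist in Pix2Text.

```python
from copy import deepcopy


def split_gpu_flag(ocr_engine_config):
    """Return (gpu, remaining_config), mirroring the 'context' handling above."""
    # Copy first so the caller's dictionary is left untouched.
    config = deepcopy(ocr_engine_config) if ocr_engine_config else {}
    gpu = False
    if 'context' in config:
        # 'context' is removed so it is not forwarded to the engine constructor.
        context = config.pop('context').lower()
        gpu = 'gpu' in context or 'cuda' in context
    return gpu, config


print(split_gpu_flag({'context': 'cuda:0'}))  # (True, {})
print(split_gpu_flag({'context': 'cpu'}))     # (False, {})
print(split_gpu_flag(None))                   # (False, {})
```

If EasyOCR seems slower than expected, printing the gpu value this way is a quick check of whether the configuration actually requested a GPU.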

@KTBsomen
Author

[Image: sd]
This is my image.
