A Java GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract.
VietOCR is released and distributed under the Apache License, v2.0.
- Multi-platform
- PDF, TIFF, JPEG, GIF, PNG, BMP image formats
- Multi-page TIFF images
- Screenshots
- Selection box
- File drag-and-drop
- Paste image from clipboard
- Text search and replace
- Postprocessing for Vietnamese to boost accuracy rate
- Vietnamese input methods
- Localized user interface for many languages (Localization project)
- Integrated scanning support
- Watch folder monitor for support of batch processing
- Custom text replacement in postprocessing
- Spellcheck with Hunspell
- Support for downloading and installing language data packs and appropriate spell dictionaries
To launch the program from the command line:
java -jar VietOCR.jar
or for CLI option:
java -jar VietOCR.jar imagefile outputfile [-l lang] [--psm pagesegmode] [text|hocr|pdf|pdf_textonly|unlv|box|alto|page|tsv|lstmbox|wordstrbox] [postprocessing] [correctlettercases] [deskew] [removelines] [removelinebreaks]