Skip to content

nguyenq/VietOCR3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VietOCR

A Java GUI frontend for Tesseract OCR engine. Supports optical character recognition for Vietnamese and other languages supported by Tesseract.

VietOCR is released and distributed under the Apache License, v2.0.

Features

  • Multi-platform
  • PDF, TIFF, JPEG, GIF, PNG, BMP image formats
  • Multi-page TIFF images
  • Screenshots
  • Selection box
  • File drag-and-drop
  • Paste image from clipboard
  • Text search and replace
  • Postprocessing for Vietnamese to boost accuracy rate
  • Vietnamese input methods
  • Localized user interface for many languages (Localization project)
  • Integrated scanning support
  • Watch folder monitor for support of batch processing
  • Custom text replacement in postprocessing
  • Spellcheck with Hunspell
  • Support for downloading and installing language data packs and appropriate spell dictionaries

Instructions

To launch the program from the command line:

java -jar VietOCR.jar

or for CLI option:

java -jar VietOCR.jar imagefile outputfile [-l lang] [--psm pagesegmode] [text|hocr|pdf|pdf_textonly|unlv|box|alto|page|tsv|lstmbox|wordstrbox] [postprocessing] [correctlettercases] [deskew] [removelines] [removelinebreaks]

Dependencies