This software implements the Convolutional Recurrent Neural Network (CRNN), a combination of CNN, RNN and CTC loss for image-based sequence recognition tasks, such as scene text recognition and OCR. For details, please refer to our paper http://arxiv.org/abs/1507.05717.
The software has only been tested on Ubuntu 14.04 (x64). CUDA-enabled GPUs are required. Before building the project, install the latest versions of Torch7, fblualib and LMDB, following their respective installation instructions. On Ubuntu, LMDB can be installed with apt-get install liblmdb-dev.
To build the project, go to src/ and execute sh build_cpp.sh to build the C++ code. If the build succeeds, a file named libcrnn.so is produced in the src/ directory.
A demo program can be found in src/demo.lua. Before running the demo, download a pretrained model from here. Put the downloaded model file crnn_demo_model.t7 into the directory model/crnn_demo/. Then launch the demo from the src/ directory with:
th demo.lua
The demo reads an example image and recognizes its text content.
Expected output:
Loading model...
Model loaded from ../model/crnn_demo/model.t7
Recognized text: available (raw: a-----v--a-i-l-a-bb-l-e---)
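In the raw string, each character is the label predicted for one frame of the feature sequence, and - appears to be the CTC blank label. The final text comes from the standard CTC best-path rule: merge runs of identical consecutive labels, then drop blanks. The Python sketch below illustrates that rule on the demo output; it is not the project's Lua code, and the function name ctc_collapse is hypothetical.

```python
def ctc_collapse(raw: str, blank: str = "-") -> str:
    """Collapse a CTC best-path label string into the final text.

    Rule: merge runs of identical consecutive labels, then drop blanks.
    A blank between two identical labels keeps them separate
    ("l-l" -> "ll"), which is how CTC can output double letters.
    """
    out = []
    prev = None
    for ch in raw:
        # Emit a label only when it differs from the previous frame's
        # label and is not the blank symbol.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    print(ctc_collapse("a-----v--a-i-l-a-bb-l-e---"))  # available
```

Note how the "bb" run in the raw output collapses to a single b, while the blanks separate the remaining letters.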
The pretrained model can be used for both lexicon-free and lexicon-based recognition. Refer to the functions recognizeImageLexiconFree and recognizeImageWithLexicion in utilities.lua for details.
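In lexicon-based mode the output is restricted to a word from a given lexicon. The paper picks the lexicon word with the highest conditional probability under the model, using edit distance only to narrow the candidate set for large lexicons. The Python sketch below is a deliberately simpler stand-in, not necessarily what recognizeImageWithLexicion does: it returns the lexicon word closest to the lexicon-free prediction by Levenshtein distance. The names edit_distance and nearest_lexicon_word are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def nearest_lexicon_word(prediction: str, lexicon: List[str]) -> str:
    """Return the lexicon word with the smallest edit distance to prediction.

    Ties are broken by lexicon order, since min() keeps the first minimum.
    """
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    return min(lexicon, key=lambda word: edit_distance(prediction, word))
```

For example, nearest_lexicon_word("availabe", ["table", "available", "variable"]) returns "available", which is one insertion away from the prediction.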
Follow these steps to train a new model on your own dataset:
- Create a new LMDB dataset. A Python program is provided in tool/create_dataset.py; refer to the function createDataset for details (run pip install lmdb first).
- Create a model directory under model/, for example model/foo_model. Then create a configuration file config.lua in that directory. You can copy model/crnn_demo/config.lua and modify it.
- Go to src/ and execute th main_train.lua ../model/foo_model/. Model snapshots and the log file will be saved into the model directory.
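The dataset is stored as key/value pairs in LMDB. The Python sketch below assumes the layout written by createDataset in tool/create_dataset.py: keys of the form image-000000001 and label-000000001 (1-based, zero-padded to nine digits) plus a num-samples entry. Check this against the script before relying on it; unlike the provided tool, the sketch does not validate images. Record building is pure Python so it can be inspected without LMDB, and writing uses the lmdb package. The names build_records and write_lmdb are hypothetical.

```python
from typing import Dict, List


def build_records(image_paths: List[str], labels: List[str]) -> Dict[bytes, bytes]:
    """Build LMDB key/value pairs in the assumed createDataset layout."""
    if len(image_paths) != len(labels):
        raise ValueError("image_paths and labels must have the same length")
    records: Dict[bytes, bytes] = {}
    for idx, (path, label) in enumerate(zip(image_paths, labels), start=1):
        with open(path, "rb") as f:
            # Store the encoded image file bytes (e.g. JPEG) unchanged.
            records[b"image-%09d" % idx] = f.read()
        records[b"label-%09d" % idx] = label.encode("utf-8")
    records[b"num-samples"] = str(len(labels)).encode("ascii")
    return records


def write_lmdb(output_path: str, records: Dict[bytes, bytes]) -> None:
    """Write the records into an LMDB environment at output_path."""
    import lmdb  # third-party package: pip install lmdb

    # map_size is the maximum database size in bytes; 1 GiB is an
    # arbitrary choice here and must be raised for larger datasets.
    env = lmdb.open(output_path, map_size=1 << 30)
    with env.begin(write=True) as txn:  # commits when the block exits
        for key, value in records.items():
            txn.put(key, value)
    env.close()
```

A typical call would be write_lmdb("data/foo_lmdb", build_records(paths, labels)), where the output path and the two lists are placeholders for your own data.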
Please cite the following paper if you use the code or model in your research.
@article{ShiBY15,
author = {Baoguang Shi and
Xiang Bai and
Cong Yao},
title = {An End-to-End Trainable Neural Network for Image-based Sequence Recognition
and Its Application to Scene Text Recognition},
journal = {CoRR},
volume = {abs/1507.05717},
year = {2015}
}
The authors would like to thank the developers of Torch7, TH++, lmdb-lua-ffi and char-rnn.
Please let me know if you encounter any issues.