📙 HTR-VT (Pattern Recognition)

Introduction

This is the official implementation of our Pattern Recognition (PR) 2024 paper "HTR-VT: Handwritten Text Recognition with Vision Transformer". It is a new and effective baseline for handwritten text recognition that uses only a Vision Transformer and CTC loss.

[Project Page] [Paper] [arXiv] [Google Drive]
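To make the idea concrete, below is a minimal, self-contained sketch of a ViT-style encoder feeding a CTC loss. It is not the authors' architecture: the class name `HTRViTSketch` and every hyper-parameter (patch size, hidden dimension, number of layers, charset size) are illustrative placeholders; see the paper and the code in this repository for the actual model.

```python
# Conceptual sketch of a ViT encoder + CTC pipeline (NOT the authors' exact model;
# all hyper-parameters below are illustrative placeholders).
import torch
import torch.nn as nn


class HTRViTSketch(nn.Module):
    def __init__(self, patch_size=8, dim=256, num_layers=4, num_classes=80):
        super().__init__()
        # Split the grayscale line image into patches and embed them.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # +1 output class for the CTC blank symbol.
        self.head = nn.Linear(dim, num_classes + 1)

    def forward(self, x):                          # x: (N, 1, H, W) line image
        feats = self.patch_embed(x)                # (N, dim, H/ps, W/ps)
        feats = feats.flatten(2).transpose(1, 2)   # (N, T, dim) token sequence
        feats = self.encoder(feats)
        return self.head(feats).log_softmax(-1)    # (N, T, C) log-probabilities


model = HTRViTSketch()
images = torch.randn(2, 1, 64, 512)
log_probs = model(images).permute(1, 0, 2)          # CTC expects (T, N, C)
targets = torch.randint(1, 80, (2, 20))              # dummy label indices (0 = blank)
input_lengths = torch.full((2,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```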

Table of Contents

1. Overview

(teaser figure)

2. Visual Results

(method figure)

3. Installation

3.1. Environment

Our model can be trained on a single RTX 4090 GPU (24 GB):

conda env create -f environment.yml
conda activate htr

The code was tested on Python 3.9 and PyTorch 1.13.0.
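If you want to confirm that PyTorch and the GPU are visible before training, a quick sanity check (not part of the repository) is:

```python
# Quick environment sanity check (illustrative; not part of the repository).
import torch

print(torch.__version__)          # the code was tested with PyTorch 1.13.0
print(torch.cuda.is_available())  # should be True if the GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```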

3.2. Datasets

  • We use IAM, READ2016, and LAM for handwritten text recognition.
IAM
Register at the FKI webpage: https://fki.tic.heia-fr.ch/databases/iam-handwriting-database
Download the dataset from: https://fki.tic.heia-fr.ch/databases/download-the-iam-handwriting-database
READ2016
wget https://zenodo.org/record/1164045/files/{Test-ICFHR-2016.tgz,Train-And-Val-ICFHR-2016.tgz}
LAM
Download the dataset from: https://aimagelab.ing.unimore.it/imagelab/page.asp?IdPage=46
  • Download the datasets to ./data/. Take IAM as an example; the directory structure should be as follows (a minimal loading sketch is shown after the tree):
./data/iam/
├── train.ln
├── val.ln
├── test.ln
└── lines
      ├── a01-000u-00.png
      ├── a01-000u-00.txt
      ├── a01-000u-01.png
      ├── a01-000u-01.txt
      ...
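
This layout pairs naturally with a small PyTorch Dataset. The sketch below rests on assumptions about the files (one line ID per row in each *.ln split file, with matching .png/.txt pairs under lines/); the repository's own data loader is the authoritative reference.

```python
# Hedged sketch of reading one split from the layout above.  It ASSUMES each
# *.ln file lists one line ID (e.g. "a01-000u-00") per row and that each ID
# has a matching .png image and .txt transcription under lines/.
import os
from PIL import Image
from torch.utils.data import Dataset


class IAMLineDataset(Dataset):
    def __init__(self, root="./data/iam", split="train"):
        with open(os.path.join(root, f"{split}.ln")) as f:
            self.ids = [line.strip() for line in f if line.strip()]
        self.lines_dir = os.path.join(root, "lines")

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        line_id = self.ids[idx]
        image = Image.open(os.path.join(self.lines_dir, line_id + ".png")).convert("L")
        with open(os.path.join(self.lines_dir, line_id + ".txt")) as f:
            text = f.read().strip()
        return image, text
```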

4. Quick Start

  • We provide convenient and comprehensive commands in ./run/ to train and test on the different datasets, helping researchers reproduce the results of the paper.

5. Citation

If our project is helpful for your research, please consider citing:

@article{li2024htr,
  title={HTR-VT: Handwritten text recognition with vision transformer},
  author={Li, Yuting and Chen, Dexiong and Tang, Tinglong and Shen, Xi},
  journal={Pattern Recognition},
  pages={110967},
  year={2024},
  publisher={Elsevier}
}

6. Acknowledgement

We appreciate the help from the public code of VAN and OrigamiNet.