Geometric Augmentation for Text Image

Last update: Jan 05, 2023

Overview

Text Image Augmentation

A general geometric augmentation tool for text images, introduced in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". The tool helps text recognizers avoid overfitting and gain robustness.

Note that this is a general toolkit; please customize it for your specific task. If the repo benefits your work, please cite the papers listed in the Citation section.

News

2020-02 The paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition" was accepted to CVPR 2020. It is a preliminary attempt at smart augmentation.
2019-11 The paper "Decoupled Attention Network for Text Recognition" was accepted to AAAI 2020. This augmentation tool was used in its handwritten text recognition experiments.
2019-04 We applied this tool in the ReCTS competition of ICDAR 2019. Our ensemble model won the championship.
2019-01 The similarity transformation was specifically customized for geometric augmentation of text images.

Requirements

GCC 4.8.*
Python 2.7.*
Boost 1.67
OpenCV 2.4.*

We recommend Anaconda for managing the versions of your dependencies. For example:

     conda install boost=1.67.0

Installation

Build the library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder, then follow demo.py to use the tool:

    cp Augment.so ..
    cd ..
    python demo.py
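
If demo.py fails at import time, a quick check that the built module is visible to the interpreter can help. The following is a minimal sketch, not part of the repository; the helper name `module_available` is hypothetical, and the only name taken from this README is the module name `Augment`.

```python
def module_available(name):
    """Return True if a module with this name can be imported, else False."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "Augment" is the module produced by the build step (Augment.so).
    if module_available("Augment"):
        print("Augment is importable; demo.py should be able to load it.")
    else:
        print("Augment not found; check that Augment.so sits next to demo.py "
              "and was built for this Python interpreter.")
```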

Demo

Distortion

Stretch

Perspective

Speed

Transforming an image of size 64×200 (H×W) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by applying the augmentation on the fly inside multi-process batch samplers, for example by setting "num_workers" on a PyTorch DataLoader.
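
The on-the-fly pattern mentioned above can be sketched as a map-style dataset that augments each sample when it is fetched. This is an illustrative sketch under stated assumptions: the class `AugmentedTextDataset` and the placeholder `random_shift` are hypothetical names, images are plain nested lists, and the placeholder stands in for a real call into the augmentation tool.

```python
import random


class AugmentedTextDataset(object):
    """Map-style dataset that applies an augmentation each time a sample is read."""

    def __init__(self, samples, augment, seed=None):
        self.samples = samples            # list of (image, label) pairs
        self.augment = augment            # hypothetical callable: (image, rng) -> image
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        # Augmentation happens here, so every epoch sees a fresh variant.
        return self.augment(image, self.rng), label


def random_shift(image, rng, max_shift=2):
    """Placeholder augmentation: rotate every row by the same random offset."""
    shift = rng.randint(-max_shift, max_shift)
    if shift == 0:
        return [list(row) for row in image]
    return [row[-shift:] + row[:-shift] for row in image]


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7]]
    dataset = AugmentedTextDataset([(image, "text")], random_shift, seed=0)
    augmented, label = dataset[0]
    print(label, augmented)
    # With PyTorch installed, an object like this could be passed to
    # torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)
    # so that several worker processes run the augmentation in parallel.
```

Because each worker process receives its own copy of the dataset, per-worker seeding may be needed to keep the workers from producing identical augmentations.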

Improvement for Recognition

We compare the accuracy of CRNN trained with and without data augmentation, in each case using only the small training set of the corresponding dataset.

Dataset	IIIT5K	IC13	IC15
Without Data Augmentation	40.8%	6.8%	8.7%
With Data Augmentation	53.4%	9.6%	24.9%
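
The absolute gains implied by the table can be worked out directly. The snippet below is a small arithmetic sketch using only the numbers reported above; the function name `absolute_gains` is hypothetical.

```python
# Accuracies (%) copied from the table above.
baseline = {"IIIT5K": 40.8, "IC13": 6.8, "IC15": 8.7}
augmented = {"IIIT5K": 53.4, "IC13": 9.6, "IC15": 24.9}


def absolute_gains(before, after):
    """Return the gain in percentage points for each dataset."""
    return {name: round(after[name] - before[name], 1) for name in before}


if __name__ == "__main__":
    gains = absolute_gains(baseline, augmented)
    for name in sorted(gains):
        print("%s: +%.1f points" % (name, gains[name]))
```

This gives gains of 12.6 points on IIIT5K, 2.8 points on IC13, and 16.2 points on IC15.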

Citation

@inproceedings{luo2020learn,
  author = {Canjie Luo and Yuanzhi Zhu and Lianwen Jin and Yongpan Wang},
  title = {Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition},
  booktitle = {CVPR},
  year = {2020}
}

@inproceedings{wang2020decoupled,
  author = {Tianwei Wang and Yuanzhi Zhu and Lianwen Jin and Canjie Luo and Xiaoxue Chen and Yaqiang Wu and Qianying Wang and Mingxiang Cai},
  title = {Decoupled Attention Network for Text Recognition},
  booktitle = {AAAI},
  year = {2020}
}

@article{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  publisher={ACM New York, NY, USA}
}

Acknowledgment

Thanks to the following developers for their contributions.

@keeofkoo

@cxcxcxcx

@Yati Sagade

Attention

The tool is free for academic research purposes only.


Owner

Canjie Luo
