A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.

Overview

awesome-deep-text-detection-recognition

A curated list of awesome deep learning based papers on text detection and recognition.

Text Detection

  • Papers are sorted by publication date.
  • IC is short for ICDAR.
  • Score is the F1-score for the localization task.
    • (L) marks the score reported on the leaderboard.
    • If the leaderboard score differs from the score reported in the paper, the leaderboard score is given as well, marked (L).
  • *CODE means official code, and CODE(M) means that a trained model is provided.
Conf. Date Title IC13 IC15 Resources
'14-ECCV 14/10/07 Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees
'15-CVPR 15/06/01 Symmetry-based text line detection in natural scenes 0.8043 PRJ CODE
'16-TIP 15/10/12 Text-Attentional Convolutional Neural Networks for Scene Text Detection 0.8165
'15-ICCV 15/12/13 Text Flow: A Unified Text Detection System in Natural Scene Images 0.8025
'16-arXiv 16/03/31 Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network 0.86
'16-CVPR 16/04/14 Multi-Oriented Text Detection with Fully Convolutional Networks 0.83 0.54 *TORCH(M)
'16-CVPR 16/04/22 Synthetic Data for Text Localisation in Natural Images 0.847 (L)0.8359 CODE DB
'16-arXiv 16/06/29 Scene Text Detection Via Holistic, Multi-Channel Prediction 0.8433 0.6477
'16-ECCV 16/09/12 Detecting Text in Natural Image with Connectionist Text Proposal Network 0.8215 0.6085 *CAFFE(M) CAFFE TF(M) TF DEMO BLOG(CH)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.85 (L)0.8767 *CAFFE(M) TF BLOG(KR)
'18-TM 17/03/03 Arbitrary-Oriented Scene Text Detection via Rotation Proposals 0.9125 0.8020 *CAFFE
'17-CVPR 17/03/04 Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection 0.7064
'17-CVPR 17/03/19 Detecting Oriented Text in Natural Images by Linking Segments 0.853 0.75 (L)0.7636 *TF(M) TF(M) SLIDE VIDEO
'17-arXiv 17/03/24 Deep Direct Regression for Multi-Oriented Scene Text Detection 0.86 0.81
'17-arXiv 17/04/03 Cascaded Segmentation-Detection Networks for Word-Level Text Spotting 0.86 0.71
'17-CVPR 17/04/11 EAST: An Efficient and Accurate Scene Text Detector 0.8072 (L)0.8038 TF(M) TF PYTORCH(M) PYTORCH DEMO KERAS(M) VIDEO
'17-ICIP 17/05/15 WordFence: Text Detection in Natural Images with Border Awareness 0.86
'17-arXiv 17/06/30 R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection 0.8773 0.8254 TF(M) CAFFE(M)
'17-CVPR 17/07/21 Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild 0.85 0.63
'17-arXiv 17/08/17 Deep Scene Text Detection with Connected Component Proposals 0.919
'17-ICCV 17/08/22 WordSup: Exploiting Word Annotations for Character based Text Detection 0.9064 0.7816
'17-ICCV 17/09/01 Single Shot Text Detector with Regional Attention 0.8704 0.7691 *CAFFE(M) PYTORCH VIDEO
'17-arXiv 17/09/11 Fused Text Segmentation Networks for Multi-oriented Scene Text Detection 0.8414
'17-ICCV 17/10/13 WeText: Scene Text Detection under Weak Supervision 0.869 (L)0.8313
'17-ICCV 17/10/22 Self-organized Text Detection with Minimal Post-processing via Border Learning 0.84 *KERAS(M)
'17-ICDAR 17/11/11 Deep Residual Text Detection Network for Scene Text 0.9117 (L)0.8925
'18-AAAI 17/11/12 Feature Enhancement Network: A Refined Scene Text Detector 0.9161
'17-arXiv 17/11/30 ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene 0.759
'18-AAAI 18/01/04 PixelLink: Detecting Scene Text via Instance Segmentation 0.881 0.8519 *TF(M) TF
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.925 0.8984 PYTORCH PYTORCH VIDEO
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.88 0.829 (L)0.8475 *CAFFE(M)
'18-CVPR 18/02/27 Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation 0.88 0.843 *PYTORCH(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.9 0.87 *CAFFE(M)
'18-CVPR 18/03/14 Rotation-Sensitive Regression for Oriented Scene Text Detection 0.89 0.838 *CAFFE(M)
'18-arXiv 18/04/08 Detecting Multi-Oriented Text with Corner-based Region Proposals 0.876 0.845 *CAFFE(M)
'18-arXiv 18/04/24 An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches 0.92 0.86
'18-IJCAI 18/05/03 IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection 0.9047
'18-arXiv 18/06/07 Shape Robust Text Detection with Progressive Scale Expansion Network 0.8721 PRJ
'18-ECCV 18/07/04 TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes 0.826 PYTORCH
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.917 0.86
'18-ECCV 18/07/10 Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping 0.892
'19-AAAI 18/11/21 Scene Text Detection with Supervised Pyramid Context Network 0.921 0.872
'19-TIP 18/12/04 TextField: Learning A Deep Direction Field for Irregular Scene Text Detection 0.824 *CAFFE(M)
'19-CVPR 19/03/21 Towards Robust Curve Text Detection with Conditional Spatial Expansion
'19-CVPR 19/03/28 Shape Robust Text Detection with Progressive Scale Expansion Network 0.857 TF(M)
'19-CVPR 19/04/03 Character Region Awareness for Text Detection 0.952 0.869 *PYTORCH(M) VIDEO PYTORCH TF(M) KERAS BLOG_CH BLOG_KR BLOG_KR BLOG_KR
'19-CVPR 19/04/13 Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes 0.877
'19-CVPR 19/06/16 Learning Shape-Aware Embedding for Scene Text Detection 0.877
'19-CVPR 19/06/16 Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation 0.917 0.876
'19-ICCV 19/08/16 Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network 0.829
'19-ICCV 19/09/02 Geometry Normalization Networks for Accurate Scene Text Detection 0.8852
'20-AAAI 19/11/20 Real-time Scene Text Detection with Differentiable Binarization 0.847
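The IC13 and IC15 scores in the detection table above are F1-scores, the harmonic mean of precision and recall. The following is a minimal sketch of how such a score can be computed for one image under stated simplifications: boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples (ICDAR 2015 actually annotates quadrilaterals), predictions are matched greedily one-to-one, and a match requires IoU above 0.5. The names `iou` and `detection_f1` are hypothetical, and the official evaluation scripts differ in details such as ignored regions and the matching protocol.

```python
from typing import List, Tuple

# Assumed box format: axis-aligned (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Box], gts: List[Box], thresh: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using greedy one-to-one matching.

    Each prediction is matched to the unmatched ground-truth box with the
    highest IoU, provided that IoU is strictly greater than `thresh`.
    """
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]  # one good match, one false positive
print(detection_f1(preds, gts))  # (0.5, 0.5, 0.5)
```

For a benchmark-level score, the true-positive counts are typically accumulated over all test images before precision and recall are computed, rather than averaging per-image F1-scores.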

Text Recognition

  • Papers are sorted by publication date.
  • IC is short for ICDAR.
  • Score is word accuracy for the recognition task.
    • For the IC03, IC13, and IC15 datasets, papers evaluate on different numbers of test samples, but we do not distinguish between them.
  • *CODE means official code and CODE(M) means that trained model is provided.
Conf. Date Title SVT IIIT5k IC03 IC13 Resources
'15-ICLR 14/12/18 Deep structured output learning for unconstrained text recognition 0.717 0.896 0.818 TF SLIDE VIDEO
'16-IJCV 15/05/07 Reading text in the wild with convolutional neural networks 0.807 0.933 0.908 KERAS
'16-AAAI 15/06/14 Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI 15/07/21 An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition 0.808 0.782 0.894 0.867 TORCH(M) TF TF TF TF PYTORCH PYTORCH(M) BLOG(KR)
'16-CVPR 16/03/09 Recursive Recurrent Nets with Attention Modeling for OCR in the Wild 0.807 0.784 0.887 0.9
'16-CVPR 16/03/12 Robust scene text recognition with automatic rectification 0.819 0.819 0.901 0.886 PYTORCH PYTORCH
'16-CVPR 16/06/27 CNN-N-Gram for Handwriting Word Recognition 0.8362 VIDEO
'16-BMVC 16/09/19 STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition 0.836 0.833 0.899 0.891
'17-arXiv 17/07/27 STN-OCR: A single Neural Network for Text Detection and Text Recognition 0.798 0.86 0.903 *MXNET(M) PRJ BLOG
'17-IJCAI 17/08/19 Learning to Read Irregular Text with Attention Mechanisms
'17-arXiv 17/09/06 Scene Text Recognition with Sliding Convolutional Character Models 0.765 0.816 0.845 0.852
'17-ICCV 17/09/07 Focusing Attention: Towards Accurate Text Recognition in Natural Images 0.859 0.874 0.942 0.933
'18-CVPR 17/11/12 AON: Towards Arbitrarily-Oriented Text Recognition 0.828 0.87 0.915 TF
'17-NIPS 17/12/04 Gated Recurrent Convolution Neural Network for OCR 0.815 0.808 0.978 *TORCH(M)
'18-AAAI 18/01/04 Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition 0.844 0.836 0.915 0.908
'18-AAAI 18/01/04 SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network 0.87 0.931 0.929
'18-CVPR 18/05/09 Edit Probability for Scene Text Recognition 0.875 0.883 0.946 0.944
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.936 0.934 0.945 0.918 *TF(M) PYTORCH
'18-ECCV 18/09/08 Synthetically Supervised Feature Learning for Scene Text Recognition 0.871 0.894 0.947 0.94
'19-AAAI 18/09/18 Scene Text Recognition from Two-Dimensional Perspective 0.821 0.92 0.914
'19-AAAI 18/11/02 Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition 0.845 0.915 0.91 *TORCH(M)
'19-CVPR 18/12/14 ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification 0.902 0.933 0.913 PRJ
'19-PR 19/01/10 MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition 0.883 0.912 0.950 0.924 *PYTORCH(M)
'19-ICCV 19/04/03 What is wrong with scene text recognition model comparisons? dataset and model analysis 0.875 0.949 0.936 *PYTORCH(M) BLOG_KR
'19-CVPR 19/04/18 Aggregation Cross-Entropy for Sequence Recognition 0.826 0.823 0.921 0.897 *PYTORCH
'19-CVPR 19/06/16 Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition 0.845 0.838 0.921 0.918
'19-ICCV 19/08/06 Symmetry-constrained Rectification Network for Scene Text Recognition 0.889 0.944 0.95 0.939
'20-AAAI 19/12/21 Decoupled Attention Network for Text Recognition 0.892 0.943 0.95 0.939 *PYTORCH(M)
'20-AAAI 19/12/28 TextScanner: Reading Characters in Order for Robust Scene Text Recognition 0.895 0.926 0.925
'20-AAAI 20/02/04 GTC: Guided Training of CTC 0.929 0.955 0.952 0.943
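Word accuracy, the score used in the recognition table above, is the fraction of cropped word images whose predicted string matches the ground-truth label. The sketch below assumes a common scene-text convention of comparing case-insensitively and ignoring non-alphanumeric characters; that normalization is an assumption here, and individual papers may evaluate differently. The helper names `normalize` and `word_accuracy` are hypothetical.

```python
import re
from typing import List


def normalize(word: str) -> str:
    """Assumed normalization: lower-case, then keep only the characters 0-9 and a-z."""
    return re.sub(r"[^0-9a-z]", "", word.lower())


def word_accuracy(preds: List[str], labels: List[str]) -> float:
    """Fraction of samples whose normalized prediction equals the normalized label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(t) for p, t in zip(preds, labels))
    return correct / len(labels)


# "w0rld" vs "world" is a miss; the other two match after normalization.
print(word_accuracy(["Hello", "w0rld", "OCR!"], ["hello", "world", "ocr"]))  # 2 of 3 correct
```

Because the metric counts only exact matches, a single wrong character makes the whole word incorrect, which is why this score is stricter than character-level measures such as normalized edit distance.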

End-to-End Text Recognition

  • Papers are sorted by publication date.
  • IC is short for ICDAR.
  • Score is the F1-score under the generic-lexicon setting.
  • *CODE means official code and CODE(M) means that trained model is provided.
Conf. Date Title IC03 IC13 IC15 Resources
'12-ICPR 12/11/11 End-to-end text recognition with convolutional neural networks 0.67 *CODE
'14-ECCV 14/09/06 Deep Features for Text Spotting 0.75 PRJ MATLAB
'15-IJCV 15/05/07 Reading Text in the Wild with Convolutional Neural Networks 0.70 0.77 KERAS
'15-TPAMI 15/10/30 Real-time Lexicon-free Scene Text Localization and Recognition 0.542 0.156
'16-arXiv 16/04/10 TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild 0.6843 0.4718 (L)0.533 *CAFFE(M)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.84 TF *CAFFE(M) BLOG_KR
'17-ICCV 17/07/13 Towards End-to-end Text Spotting with Convolution Recurrent Neural Network 0.8459 VIDEO
'17-ICCV 17/10/22 Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework 0.77 0.47 VIDEO *CAFFE(M)
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.8477 0.6533 VIDEO TF(M)
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.8465 0.519 *CAFFE(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.86 0.63 *CAFFE(M)
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.64 *TF(M)
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.865 0.624
'19-ICCV 19/08/24 Towards Unconstrained End-to-End Text Spotting 0.6994 BLOG_KR
'19-ICCV 19/10/17 Convolutional Character Networks 0.7108 *PYTORCH(M)
'19-ICCV 19/10/27 TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting 0.6537
'20-AAAI 19/11/21 All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting 0.841 0.641
'20-AAAI 20/02/12 Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting 0.858 0.651
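End-to-end scores in the table above combine localization and recognition: a predicted word counts as correct only if its box matches a ground-truth box and its transcription matches the label. The following minimal sketch illustrates that idea under assumptions: axis-aligned boxes, an IoU threshold of 0.5, case-insensitive string comparison, and first-match (rather than best-match) assignment. It does not model the large vocabulary that the generic-lexicon setting supplies, and the names `iou` and `end_to_end_f1` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed axis-aligned (x_min, y_min, x_max, y_max)
Word = Tuple[Box, str]                    # a box together with its transcription


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def end_to_end_f1(preds: List[Word], gts: List[Word], thresh: float = 0.5) -> float:
    """F1 where a prediction is a true positive only if it overlaps an unmatched
    ground-truth word with IoU > thresh AND its text matches (case-insensitive)."""
    matched = set()
    tp = 0
    for box, text in preds:
        for j, (g_box, g_text) in enumerate(gts):
            if j in matched:
                continue
            if iou(box, g_box) > thresh and text.lower() == g_text.lower():
                matched.add(j)
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(preds)
    recall = tp / len(gts)
    return 2 * precision * recall / (precision + recall)


gts = [((0, 0, 10, 10), "STOP"), ((20, 0, 30, 10), "AHEAD")]
# The second box is perfectly localized but misread, so it does not count.
preds = [((1, 1, 10, 10), "stop"), ((20, 0, 30, 10), "AHEAO")]
print(end_to_end_f1(preds, gts))  # 0.5
```

This is why end-to-end scores are generally lower than the detection-only scores for the same method: a recognition error turns an otherwise correct detection into both a false positive and a missed word.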

Others

  • Papers are sorted by publication date.
  • *CODE means official code and CODE(M) means that trained model is provided.
Conf. Date Title Description Resources
'14-NIPS 14/06/09 Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition Dataset PRJ
'17-ECCV 17/02/13 End-to-End Interpretation of the French Street Name Signs Dataset Dataset (FSNS) *TF(M)
'17-arXiv 17/04/11 Attention-based Extraction of Structured Information from Street View Imagery FSNS *TF(M) TF TF LUA BLOG_KR
'17-CVPR 17/07/21 Unambiguous Text Localization and Retrieval for Cluttered Scenes Text Retrieval
'17-AAAI 17/10/22 Detection and Recognition of Text Embedded in Online Images via Neural Context Models Dataset PRJ
'18-CVPR 17/11/17 Separating Style and Content for Generalized Style Transfer Font Style
'17-arXiv 17/12/06 Detecting Curve Text in the Wild: New Dataset and New Solution Dataset (CTW 1500) PRJ
'18-AAAI 17/12/14 SEE: Towards Semi-Supervised End-to-End Scene Text Recognition FSNS PRJ *CHAINER(M)
'17-CVPR 18/06/07 Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks Document Layout PRJ
'18-CVPR 18/06/19 DocUNet: Document Image Unwarping via A Stacked U-Net Document Dewarping PRJ
'18-CVPR 18/06/19 Document Enhancement using Visibility Detection Document Enhancement PRJ
'18-IJCAI 18/06/22 Multi-Task Handwritten Document Layout Analysis Document Layout
'18-ECCV 18/07/09 Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Dataset PRJ
'19-AAAI 18/12/03 EnsNet: Ensconce Text in the Wild Text Removal DB
'19-CVPR 18/12/14 Spatial Fusion GAN for Image Synthesis Dataset DB
'19-AAAI 19/01/27 Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables TableToText
'19-AAAI 19/01/27 A Radical-aware Attention-based Model for Chinese Text Classification Chinese Character Classification
'19-CVPR 19/02/25 Handwriting Recognition in Low-resource Scripts using Adversarial Learning Handwriting Recognition TF
'19-CVPR 19/03/27 Tightness-aware Evaluation Protocol for Scene Text Detection Evaluation CODE
'19-ICCV 19/05/31 Scene Text Visual Question Answering Dataset ICDAR_DB
'19-CVPR 19/06/16 DynTypo: Example-based Dynamic Text Effects Transfer Text Effects PRJ VIDEO
'19-CVPR 19/06/16 Typography with Decor: Intelligent Text Style Transfer Text Effects *PYTORCH(M)
'19-CVPR 19/06/16 An Alternative Deep Feature Approach to Line Level Keyword Spotting Keyword Spotting
'19-ICCV 19/07/23 GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition Domain Adaptation
'19-ICCV 19/09/17 Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning Dataset ICDAR_DB
'19-ICCV 19/10/02 Large-scale Tag-based Font Retrieval with Generative Feature Learning Font Retrieval
'19-ICCV 19/10/27 TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts Place Recognition DB
'19-ICCV 19/10/27 DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks Document Dewarping *PYTORCH(M)

Other lists

Tutorial Materials

Acknowledgment

  • This work was done by the OCR team at Clova AI, powered by NAVER-LINE. NAVER-LINE is a leading Asian internet company that develops Clova, a cloud-based AI-assistant platform.
  • This repository is intended to be updated regularly, in line with the schedules of major AI conferences.