Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Overview

Box_Discretization_Network

This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method [link], which won the championship.

PPT link [Google Drive][Baidu Cloud]

Generate your own JSON: [Google Drive][Baidu Cloud]

Brief introduction (in Chinese): [Google Drive][Baidu Cloud]

Competition related

Competition model and config files (it needs a lot of video memory):

  • Paper [Link] (Exploring the Capacity of Sequential-free Box Discretization Networkfor Omnidirectional Scene Text Detection)

  • Config file [BaiduYun Link]. Models below all use this config file except directory. Results below are the multi-scale ensemble results. The very details are described in our updated paper.

  • MLT 2017 Model [BaiduYun Link].

MLT 2017 Recall Precision Hmean
new 76.44 82.75 79.47
ReCTS Detection Recall Precision Hmean
new 93.97 92.76 93.36
HRSC_2016 Recall Precision Hmean TIoU-Hmean AP
IJCAI version 94.8 46.0 61.96 51.1 93.7
new 94.1 83.8 88.65 73.3 89.22
  • Online demo is updating (the old demo version used a wrong configuration). This demo uses the MLT model provided above. It can detect multi-lingual text but can only recognize English, Chinese, and most of the symbols.

Description

Please see our paper at [link].

The advantages:

  • BDN can directly produce compact quadrilateral detection box. (segmentation-based methods need additional steps to group pixels & such steps usually sensitive to outliers)
  • BDN can avoid label confusion (non-segmentation-based methods are mostly sensitive to label sequence, which can significantly undermine the detection result). Comparison on ICDAR 2015 dataset showing different methods’ ability of resistant to the label confusion issue (by adding rotated pseudo samples). Textboxes++, East, and CTD are all Sesitive-to-Label-Sequence methods.
Textboxes++ [code] East [code] CTD [code] Ours
Variances (Hmean) ↓ 9.7% ↓ 13.7% ↓ 24.6% ↑ 0.3%

Getting Started

A basic example for training and testing. This mini example offers a pure baseline that takes less than 4 hours (with 4 1080 ti) to finalize training with only official training data.

Install anaconda

Link:https://pan.baidu.com/s/1TGy6O3LBHGQFzC20yJo8tg psw:vggx

Step-by-step install

conda create --name mb
conda activate mb
conda install ipython
pip install ninja yacs cython matplotlib tqdm scipy shapely
conda install pytorch=1.0 torchvision=0.2 cudatoolkit=9.0 -c pytorch
conda install -c menpo opencv
export INSTALL_DIR=$PWD
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
cd $INSTALL_DIR
git clone https://github.com/Yuliang-Liu/Box_Discretization_Network.git
cd Box_Discretization_Network
python setup.py build develop
  • MUST USE torchvision=0.2

Pretrained model:

[Link] unzip under project_root

(This is ONLY an ImageNet Model With a few iterations on ic15 training data for a stable initialization)

ic15 data

Prepare data follow COCO format. [Link] unzip under datasets/

Train

After downloading data and pretrained model, run

bash quick_train_guide.sh

Test with [TIoU]

Run

bash my_test.sh

Put kes.json to ic15_TIoU_metric/ inside ic15_TIoU_metric/

Run (conda deactivate; pip install Polygon2)

python2 to_eval.py

Example results:

  • mask branch 79.4 (test segm.json by changing to_eval.py (line 10: mode=0) );
  • kes branch 80.4;
  • in .yaml, set RESCORING=True -> 80.8;
  • Set RESCORING=True and RESCORING_GAMA=0.8 -> 81.0;
  • One can try many other tricks such as CROP_PROB_TRAIN, ROTATE_PROB_TRAIN, USE_DEFORMABLE, DEFORMABLE_PSROIPOOLING, PNMS, MSR, PAN in the project, whcih were all tested effective to improve the results. To achieve state-of-the-art performance, extra data (syntext, MLT, etc.) and proper training strategies are necessary.

Visualization

Run

bash single_image_demo.sh

Citation

If you find our method useful for your reserach, please cite

@article{liu2019omnidirectional,
  title={Omnidirectional Scene Text Detection with Sequential-free Box Discretization},
  author={Liu, Yuliang and Zhang, Sheng and Jin, Lianwen and Xie, Lele and Wu, Yaqiang and Wang, Zhepeng},
  journal={IJCAI},
  year={2019}
}
@article{liu2019exploring,
  title={Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection},
  author={Liu, Yuliang and He, Tong and Chen, Hao and Wang, Xinyu and Luo, Canjie and Zhang, Shuaitao and Shen, Chunhua and Jin, Lianwen},
  journal={arXiv preprint arXiv:1912.09629},
  year={2019}
}

Feedback

Suggestions and discussions are greatly welcome. Please contact the authors by sending email to [email protected] or [email protected]. For commercial usage, please contact Prof. Lianwen Jin via [email protected].

Owner
Yuliang Liu
MMLab; South China University of Technology; University of Adelaide
Yuliang Liu
A library to inspect itermediate layers of PyTorch models.

A library to inspect itermediate layers of PyTorch models. Why? It's often the case that we want to inspect intermediate layers of a model without mod

archinet.ai 380 Dec 28, 2022
NeurIPS 2021 paper 'Representation Learning on Spatial Networks' code

Representation Learning on Spatial Networks This repository is the official implementation of Representation Learning on Spatial Networks. Training Ex

13 Dec 29, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
Moment-DETR code and QVHighlights dataset

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham

61 Dec 26, 2022
Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation

Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation This repository contains the Pytorch implementation of the proposed

Devavrat Tomar 19 Nov 10, 2022
AutoVideo: An Automated Video Action Recognition System

AutoVideo is a system for automated video analysis. It is developed based on D3M infrastructure, which describes machine learning with generic pipeline languages. Currently, it focuses on video actio

Data Analytics Lab at Texas A&M University 267 Dec 17, 2022
AsymmetricGAN - Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

AsymmetricGAN for Image-to-Image Translation AsymmetricGAN Framework for Multi-Domain Image-to-Image Translation AsymmetricGAN Framework for Hand Gest

Hao Tang 42 Jan 15, 2022
PyTorch Implement of Context Encoders: Feature Learning by Inpainting

Context Encoders: Feature Learning by Inpainting This is the Pytorch implement of CVPR 2016 paper on Context Encoders 1) Semantic Inpainting Demo Inst

321 Dec 25, 2022
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing

Notice: Support for Python 3.6 will be dropped in v.0.2.1, please plan accordingly! Efficient and Scalable Physics-Informed Deep Learning Collocation-

tensordiffeq 74 Dec 09, 2022
A script helps the user to update Linux and Mac systems through the terminal

Description This script helps the user to update Linux and Mac systems through the terminal. All the user has to install some requirements and then ru

Roxcoder 2 Jan 23, 2022
The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021) Arash Vahdat*   ·   Karsten Kreis*   ·  

NVIDIA Research Projects 238 Jan 02, 2023
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
Model Zoo for MindSpore

Welcome to the Model Zoo for MindSpore In order to facilitate developers to enjoy the benefits of MindSpore framework, we will continue to add typical

MindSpore 226 Jan 07, 2023
This repository contains the code for the binaural-detection model used in the publication arXiv:2111.04637

This repository contains the code for the binaural-detection model used in the publication arXiv:2111.04637 Dependencies The model depends on the foll

Jörg Encke 2 Oct 14, 2022
Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods”

Uncertainty Estimation Methods Code for the paper “The Peril of Popular Deep Learning Uncertainty Estimation Methods” Reference If you use this code,

EPFL Machine Learning and Optimization Laboratory 4 Apr 05, 2022
Layered Neural Atlases for Consistent Video Editing

Layered Neural Atlases for Consistent Video Editing Project Page | Paper This repository contains an implementation for the SIGGRAPH Asia 2021 paper L

Yoni Kasten 353 Dec 27, 2022
Neural network for recognizing the gender of people in photos

Neural Network For Gender Recognition How to test it? Install requirements.txt file using pip install -r requirements.txt command Run nn.py using pyth

Valery Chapman 1 Sep 18, 2022
piSTAR Lab is a modular platform built to make AI experimentation accessible and fun. (pistar.ai)

piSTAR Lab WARNING: This is an early release. Overview piSTAR Lab is a modular deep reinforcement learning platform built to make AI experimentation a

piSTAR Lab 0 Aug 01, 2022
Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease

Heart_Disease_Classification Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease Dataset

Ashish 1 Jan 30, 2022