Omnidirectional Scene Text Detection with Sequential-free Box Discretization (IJCAI 2019). Including competition model, online demo, etc.

Overview

Box_Discretization_Network

This repository is built on the pytorch [maskrcnn_benchmark]. The method is the foundation of our ReCTs-competition method [link], which won the championship.

PPT link [Google Drive][Baidu Cloud]

Generate your own JSON: [Google Drive][Baidu Cloud]

Brief introduction (in Chinese): [Google Drive][Baidu Cloud]

Competition related

Competition model and config files (it needs a lot of video memory):

  • Paper [Link] (Exploring the Capacity of Sequential-free Box Discretization Networkfor Omnidirectional Scene Text Detection)

  • Config file [BaiduYun Link]. Models below all use this config file except directory. Results below are the multi-scale ensemble results. The very details are described in our updated paper.

  • MLT 2017 Model [BaiduYun Link].

MLT 2017 Recall Precision Hmean
new 76.44 82.75 79.47
ReCTS Detection Recall Precision Hmean
new 93.97 92.76 93.36
HRSC_2016 Recall Precision Hmean TIoU-Hmean AP
IJCAI version 94.8 46.0 61.96 51.1 93.7
new 94.1 83.8 88.65 73.3 89.22
  • Online demo is updating (the old demo version used a wrong configuration). This demo uses the MLT model provided above. It can detect multi-lingual text but can only recognize English, Chinese, and most of the symbols.

Description

Please see our paper at [link].

The advantages:

  • BDN can directly produce compact quadrilateral detection box. (segmentation-based methods need additional steps to group pixels & such steps usually sensitive to outliers)
  • BDN can avoid label confusion (non-segmentation-based methods are mostly sensitive to label sequence, which can significantly undermine the detection result). Comparison on ICDAR 2015 dataset showing different methods’ ability of resistant to the label confusion issue (by adding rotated pseudo samples). Textboxes++, East, and CTD are all Sesitive-to-Label-Sequence methods.
Textboxes++ [code] East [code] CTD [code] Ours
Variances (Hmean) ↓ 9.7% ↓ 13.7% ↓ 24.6% ↑ 0.3%

Getting Started

A basic example for training and testing. This mini example offers a pure baseline that takes less than 4 hours (with 4 1080 ti) to finalize training with only official training data.

Install anaconda

Link:https://pan.baidu.com/s/1TGy6O3LBHGQFzC20yJo8tg psw:vggx

Step-by-step install

conda create --name mb
conda activate mb
conda install ipython
pip install ninja yacs cython matplotlib tqdm scipy shapely
conda install pytorch=1.0 torchvision=0.2 cudatoolkit=9.0 -c pytorch
conda install -c menpo opencv
export INSTALL_DIR=$PWD
cd $INSTALL_DIR
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
python setup.py build_ext install
cd $INSTALL_DIR
git clone https://github.com/Yuliang-Liu/Box_Discretization_Network.git
cd Box_Discretization_Network
python setup.py build develop
  • MUST USE torchvision=0.2

Pretrained model:

[Link] unzip under project_root

(This is ONLY an ImageNet Model With a few iterations on ic15 training data for a stable initialization)

ic15 data

Prepare data follow COCO format. [Link] unzip under datasets/

Train

After downloading data and pretrained model, run

bash quick_train_guide.sh

Test with [TIoU]

Run

bash my_test.sh

Put kes.json to ic15_TIoU_metric/ inside ic15_TIoU_metric/

Run (conda deactivate; pip install Polygon2)

python2 to_eval.py

Example results:

  • mask branch 79.4 (test segm.json by changing to_eval.py (line 10: mode=0) );
  • kes branch 80.4;
  • in .yaml, set RESCORING=True -> 80.8;
  • Set RESCORING=True and RESCORING_GAMA=0.8 -> 81.0;
  • One can try many other tricks such as CROP_PROB_TRAIN, ROTATE_PROB_TRAIN, USE_DEFORMABLE, DEFORMABLE_PSROIPOOLING, PNMS, MSR, PAN in the project, whcih were all tested effective to improve the results. To achieve state-of-the-art performance, extra data (syntext, MLT, etc.) and proper training strategies are necessary.

Visualization

Run

bash single_image_demo.sh

Citation

If you find our method useful for your reserach, please cite

@article{liu2019omnidirectional,
  title={Omnidirectional Scene Text Detection with Sequential-free Box Discretization},
  author={Liu, Yuliang and Zhang, Sheng and Jin, Lianwen and Xie, Lele and Wu, Yaqiang and Wang, Zhepeng},
  journal={IJCAI},
  year={2019}
}
@article{liu2019exploring,
  title={Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection},
  author={Liu, Yuliang and He, Tong and Chen, Hao and Wang, Xinyu and Luo, Canjie and Zhang, Shuaitao and Shen, Chunhua and Jin, Lianwen},
  journal={arXiv preprint arXiv:1912.09629},
  year={2019}
}

Feedback

Suggestions and discussions are greatly welcome. Please contact the authors by sending email to [email protected] or [email protected]. For commercial usage, please contact Prof. Lianwen Jin via [email protected].

Owner
Yuliang Liu
MMLab; South China University of Technology; University of Adelaide
Yuliang Liu
Transfer Learning Shootout for PyTorch's model zoo (torchvision)

pytorch-retraining Transfer Learning shootout for PyTorch's model zoo (torchvision). Load any pretrained model with custom final layer (num_classes) f

Alexander Hirner 169 Jun 29, 2022
Distributional Sliced-Wasserstein distance code

Distributional Sliced Wasserstein distance This is a pytorch implementation of the paper "Distributional Sliced-Wasserstein and Applications to Genera

VinAI Research 39 Jan 01, 2023
A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking.

BeatNet A package for music online and offline rhythmic information analysis including music Beat, downbeat, tempo and meter tracking. This repository

Mojtaba Heydari 157 Dec 27, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
MaskTrackRCNN for video instance segmentation based on mmdetection

MaskTrackRCNN for video instance segmentation Introduction This repo serves as the official code release of the MaskTrackRCNN model for video instance

411 Jan 05, 2023
Source code for Transformer-based Multi-task Learning for Disaster Tweet Categorisation (UCD's participation in TREC-IS 2020A, 2020B and 2021A).

Source code for "UCD participation in TREC-IS 2020A, 2020B and 2021A". *** update at: 2021/05/25 This repo so far relates to the following work: Trans

Congcong Wang 4 Oct 19, 2021
neural image generation

pixray Pixray is an image generation system. It combines previous ideas including: Perception Engines which uses image augmentation and iteratively op

dribnet 398 Dec 17, 2022
Image Deblurring using Generative Adversarial Networks

DeblurGAN arXiv Paper Version Pytorch implementation of the paper DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. Our netwo

Orest Kupyn 2.2k Jan 01, 2023
A Python parser that takes the content of a text file and then reads it into variables.

Text-File-Parser A Python parser that takes the content of a text file and then reads into variables. Input.text File 1. What is your ***? 1. 18 -

Kelvin 0 Jul 26, 2021
A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion

A Conditional Point Diffusion-Refinement Paradigm for 3D Point Cloud Completion This repo intends to release code for our work: Zhaoyang Lyu*, Zhifeng

Zhaoyang Lyu 68 Jan 03, 2023
Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis

Readme File for "Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis" by Ham, Imai, and Janson. (2022) All scripts were written and

0 Jan 27, 2022
Cortex-compatible model server for Python and TensorFlow

Nucleus model server Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, a

Cortex Labs 14 Nov 27, 2022
Parris, the automated infrastructure setup tool for machine learning algorithms.

README Parris, the automated infrastructure setup tool for machine learning algorithms. What Is This Tool? Parris is a tool for automating the trainin

Joseph Greene 319 Aug 02, 2022
Video Frame Interpolation with Transformer (CVPR2022)

VFIformer Official PyTorch implementation of our CVPR2022 paper Video Frame Interpolation with Transformer Dependencies python = 3.8 pytorch = 1.8.0

DV Lab 63 Dec 16, 2022
A collection of implementations of deep domain adaptation algorithms

Deep Transfer Learning on PyTorch This is a PyTorch library for deep transfer learning. We divide the code into two aspects: Single-source Unsupervise

Yongchun Zhu 647 Jan 03, 2023
Software Platform for solving and manipulating multiparametric programs in Python

PPOPT Python Parametric OPtimization Toolbox (PPOPT) is a software platform for solving and manipulating multiparametric programs in Python. This pack

10 Sep 13, 2022
A module for solving and visualizing Schrödinger equation.

qmsolve This is an attempt at making a solid, easy to use solver, capable of solving and visualize the Schrödinger equation for multiple particles, an

506 Dec 28, 2022
All the code and files related to the MI-Lab of UE19CS305 course in sem 5

Machine-Intelligence-Lab-CS305 The compilation of all the code an drelated files from MI-Lab UE19CS305 (of batch 2019-2023) offered by PES University

Arvind Krishna 3 Nov 10, 2022
A novel benchmark dataset for Monocular Layout prediction

AutoLay AutoLay: Benchmarking Monocular Layout Estimation Kaustubh Mani, N. Sai Shankar, J. Krishna Murthy, and K. Madhava Krishna Abstract In this pa

Kaustubh Mani 39 Apr 26, 2022
Libraries, tools and tasks created and used at DeepMind Robotics.

Libraries, tools and tasks created and used at DeepMind Robotics.

DeepMind 270 Nov 30, 2022