PyTorch re-implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022)

Last update: Jan 03, 2023

Overview

SwinTextSpotter

This is the PyTorch implementation of the paper "SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition" (CVPR 2022). The paper is available on arXiv (arXiv:2203.10209).

We use models pre-trained on ImageNet. The ImageNet pre-trained SwinTransformer backbone is obtained from SwinT_detectron2.

Models

SWINTS-swin-english-pretrain [config] | model_Google Drive | model_BaiduYun PW: 954t

SWINTS-swin-Total-Text [config] | model_Google Drive | model_BaiduYun PW: tf0i

SWINTS-swin-ctw [config] | model_Google Drive | model_BaiduYun PW: 4etq

SWINTS-swin-icdar2015 [config] | model_Google Drive | model_BaiduYun PW: 3n82

SWINTS-swin-ReCTS [config] | model_Google Drive | model_BaiduYun PW: a4be

SWINTS-swin-vintext [config] | model_Google Drive | model_BaiduYun PW: slmp

Installation

Python=3.8
PyTorch=1.8.0, torchvision=0.9.0, cudatoolkit=11.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda for installation):

conda create -n SWINTS python=3.8 -y
conda activate SWINTS
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install opencv-python
pip install scipy
pip install shapely
pip install rapidfuzz
pip install timm
pip install Polygon3
git clone https://github.com/mxin262/SwinTextSpotter.git
cd SwinTextSpotter
python setup.py build develop
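
After the install steps above, a quick sanity check of the environment can save a failed training launch. The following is a minimal sketch, not part of the repository: it assumes the packages are visible to `importlib.metadata` under the distribution names used in the commands above (`torch`, `torchvision`, `opencv-python`, and so on), which may not hold for every conda build. The helper names `installed_version` and `check_environment` are hypothetical.

```python
from importlib import metadata

# Versions pinned in the install steps; the remaining packages are only
# checked for presence because the steps do not pin them.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0"}
REQUIRED = ["opencv-python", "scipy", "shapely", "rapidfuzz", "timm", "Polygon3"]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Collect human-readable problems; an empty list means everything matched."""
    problems = []
    for name, wanted in EXPECTED.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name} is not installed (expected {wanted})")
        elif not found.startswith(wanted):
            # Builds often carry a suffix such as "+cu111", so compare by prefix.
            problems.append(f"{name} is {found}, expected {wanted}")
    for name in REQUIRED:
        if installed_version(name) is None:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the listed requirements"]:
        print(line)
```

Run it inside the activated `SWINTS` environment; any line it prints other than the final success message names a package to revisit.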

Dataset path

datasets
|_ totaltext
|  |_ train_images
|  |_ test_images
|  |_ totaltext_train.json
|  |_ weak_voc_new.txt
|  |_ weak_voc_pair_list.txt
|_ mlt2017
|  |_ train_images
|  |_ annotations/icdar_2017_mlt.json
.......
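
The layout above can be checked before training starts. This is a minimal sketch, assuming the script is run from the repository root and that `datasets` sits there; it only covers the entries spelled out in the tree (the elided part is not guessed), and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Relative paths copied from the directory tree above; extend the list for
# the datasets that the tree elides.
EXPECTED_PATHS = [
    "totaltext/train_images",
    "totaltext/test_images",
    "totaltext/totaltext_train.json",
    "totaltext/weak_voc_new.txt",
    "totaltext/weak_voc_pair_list.txt",
    "mlt2017/train_images",
    "mlt2017/annotations/icdar_2017_mlt.json",
]


def missing_paths(root="datasets", expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing dataset entries:")
        for rel in missing:
            print("  datasets/" + rel)
    else:
        print("All listed dataset entries are present.")
```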

Download the images

ICDAR2017-MLT [image]
Syntext-150k:
- Part1: 94,723 [dataset]
- Part2: 54,327 [dataset]
ICDAR2015 [image]
ICDAR2013 [image]
Total-Text_train_images [image]
Total-Text_test_images [image]
ReCTs [images&label] PW: 2b4q
LSVT [images&label] PW: 9uh1
ArT [images&label] PW: 2865
SynChinese130k [images][label]
Vintext_images [image]

Download the labels: [Google Drive] [BaiduYun] PW: 46vd

Download the lexicon [Google Drive] and place it in the corresponding dataset directory.

You can also prepare a custom dataset by following the example scripts. [example scripts]

Totaltext

To evaluate on Total-Text, CTW1500, and ICDAR2015, first download the zipped annotations with:

cd datasets
mkdir evaluation
cd evaluation
wget -O gt_ctw1500.zip https://cloudstor.aarnet.edu.au/plus/s/xU3yeM3GnidiSTr/download
wget -O gt_totaltext.zip https://cloudstor.aarnet.edu.au/plus/s/SFHvin8BLUM4cNd/download
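# The two URLs below are Google Drive share-page links: wget may save an HTML
# page instead of the archive. If so, download the files through a browser.
wget -O gt_icdar2015.zip "https://drive.google.com/file/d/1wrq_-qIyb_8dhYVlDzLZTTajQzbic82Z/view?usp=sharing"
wget -O gt_vintext.zip "https://drive.google.com/file/d/11lNH0uKfWJ7Wc74PGshWCOgSxgEnUPEV/view?usp=sharing"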
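
Two of the URLs above end in `/view?usp=sharing`, which is a Google Drive share page, so `wget` may store an HTML page under a `.zip` name. The sketch below is one way to verify the downloads before evaluation. It assumes the files were saved as `gt_*.zip` in `datasets/evaluation` relative to the repository root; `check_archives` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def check_archives(folder="datasets/evaluation"):
    """Map each gt_*.zip file name in folder to True if it is a real zip archive."""
    return {
        path.name: zipfile.is_zipfile(path)
        for path in sorted(Path(folder).glob("gt_*.zip"))
    }


if __name__ == "__main__":
    for name, ok in check_archives().items():
        status = "ok" if ok else "NOT a zip archive, download it again"
        print(f"{name}: {status}")
```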

Pretrain SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-pretrain.yaml

Fine-tune model on the mixed real dataset

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml

Fine-tune model

python projects/SWINTS/train_net.py \
  --num-gpus 8 \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml
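
The three training commands above form a pipeline: pretrain, then mix-train, then fine-tune on the target dataset. The sketch below only builds and prints those commands so the order is explicit. It is not part of the repository, and `build_train_command` is a hypothetical helper. The optional `weights` argument appends a trailing `MODEL.WEIGHTS <path>` override in the same form the evaluation command uses. Whether each stage's config already points at the previous stage's checkpoint is not stated here, so treat that hand-off as an assumption to verify against the config files.

```python
import shlex

TRAIN_SCRIPT = "projects/SWINTS/train_net.py"

# Stage order as listed above: pretrain -> mix-train -> fine-tune.
STAGE_CONFIGS = [
    "projects/SWINTS/configs/SWINTS-swin-pretrain.yaml",
    "projects/SWINTS/configs/SWINTS-swin-mixtrain.yaml",
    "projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml",
]


def build_train_command(config, num_gpus=8, weights=None):
    """Return the argument list for one training stage."""
    cmd = ["python", TRAIN_SCRIPT, "--num-gpus", str(num_gpus), "--config-file", config]
    if weights is not None:
        # Trailing KEY VALUE override (assumption: accepted for training
        # the same way it is for --eval-only).
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    for config in STAGE_CONFIGS:
        print(shlex.join(build_train_command(config)))
```

Returning argument lists rather than strings keeps the commands safe to pass to `subprocess.run` later without extra quoting.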

Evaluate SWINTS (e.g., with Swin-Transformer backbone)

python projects/SWINTS/train_net.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --eval-only MODEL.WEIGHTS ./output/model_final.pth

Visualize the detection and recognition results (e.g., with Swin-Transformer backbone)

python demo/demo.py \
  --config-file projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml \
  --input input1.jpg \
  --output ./output \
  --confidence-threshold 0.4 \
  --opts MODEL.WEIGHTS ./output/model_final.pth
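
To visualize a whole folder rather than a single image, the demo command above can be generated once per file. This is a minimal sketch under stated assumptions: it reuses only the flags shown above, calls the script once per image because passing several files to `--input` is not confirmed here, and `demo_commands` and the suffix list are hypothetical.

```python
import shlex
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def demo_commands(image_dir, config, weights, output_dir="./output", threshold=0.4):
    """Build one demo/demo.py argument list per image found in image_dir."""
    images = sorted(
        p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [
        [
            "python", "demo/demo.py",
            "--config-file", config,
            "--input", str(image),
            "--output", output_dir,
            "--confidence-threshold", str(threshold),
            "--opts", "MODEL.WEIGHTS", weights,
        ]
        for image in images
    ]


if __name__ == "__main__":
    commands = demo_commands(
        ".",  # folder to scan; replace with your image directory
        "projects/SWINTS/configs/SWINTS-swin-finetune-totaltext.yaml",
        "./output/model_final.pth",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```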

Example results:

Acknowledgement

AdelaiDet, Detectron2, ISTR, SwinT_detectron2, Focal-Transformer and MaskTextSpotterV3.

Citation

If our paper helps your research, please cite it in your publications:

@article{huang2022swints,
  title = {SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition},
  author = {Mingxin Huang and Yuliang Liu and Zhenghao Peng and Chongyu Liu and Dahua Lin and Shenggao Zhu and Nicholas Yuan and Kai Ding and Lianwen Jin},
  journal={arXiv preprint arXiv:2203.10209},
  year = {2022}
}

Copyright

For commercial use, please contact Dr. Lianwen Jin: [email protected]

Owner

mxin262
