TransVTSpotter: End-to-end Video Text Spotter with Transformer

Last update: Dec 26, 2022

Introduction

A Multilingual, Open World Video Text Dataset and End-to-end Video Text Spotter with Transformer

Link to our MOVText: A Large-Scale, Multilingual Open World Dataset for Video Text Spotting

Updates

(08/04/2021) Refactoring the code.
(10/20/2021) The complete code has been released.

ICDAR2015(video) Tracking challenge

Methods	MOTA	MOTP	IDF1	Mostly Matched	Partially Matched	Mostly Lost
TransVTSpotter	45.75	73.58	57.56	658	611	647
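
For readers unfamiliar with the columns above, the following is a minimal sketch of how MOTA and IDF1 are conventionally defined in the multi-object tracking literature. The function names are hypothetical, and the counts in the usage lines are made-up illustration values, not numbers from the ICDAR2015 evaluation.

```python
def mota(num_false_negatives, num_false_positives, num_id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Illustrative counts only (not taken from the benchmark).
print(f"MOTA = {100 * mota(300, 200, 50, 1000):.2f}")
print(f"IDF1 = {100 * idf1(600, 400, 400):.2f}")
```

MOTA penalizes missed boxes, false alarms, and identity switches together, while IDF1 focuses on how consistently each text instance keeps the same identity over time.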

Notes

Models were trained on 8 NVIDIA V100 GPUs with a batch size of 16.
We use the models pre-trained on COCOTextV2.
We do not release the recognition code due to the company's regulations.

Demo

Installation

The codebases are built on top of Deformable DETR and TransTrack.

Requirements

Linux, CUDA>=9.2, GCC>=5.4
Python>=3.7
PyTorch>=1.5 and a torchvision version that matches the PyTorch installation. Installing both together from pytorch.org ensures that they match.
OpenCV is optional; it is needed only for the demo and visualization.
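
The Python-side requirements above can be checked before building. The script below is a minimal sketch, not part of the repository: the helper names `version_tuple` and `check_environment` are hypothetical, and it does not verify the Linux, GCC, or CUDA toolkit versions.

```python
import importlib.util
import re
import sys


def version_tuple(version):
    """Turn '1.13.1+cu117' into (1, 13, 1); ignores any non-numeric suffix."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python >= 3.7 is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if version_tuple(torch.__version__) < (1, 5):
            problems.append("PyTorch >= 1.5 is required")
        if not torch.cuda.is_available():
            problems.append("CUDA is not available to PyTorch")
    if importlib.util.find_spec("torchvision") is None:
        problems.append("torchvision is not installed")
    if importlib.util.find_spec("cv2") is None:
        problems.append("OpenCV (optional) is not installed; demo and visualization need it")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All checks passed"]:
        print(line)
```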

Steps

Install and build libs

git clone git@github.com:weijiawu/TransVTSpotter.git
cd TransVTSpotter
cd models/ops
python setup.py build install
cd ../..
pip install -r requirements.txt

Prepare datasets and annotations

# pretrain COCOTextV2
python3 track_tools/convert_COCOText_to_coco.py

# ICDAR15
python3 track_tools/convert_ICDAR15video_to_coco.py

The COCOTextV2 dataset is available from COCOTextV2.

python3 track_tools/convert_crowdhuman_to_coco.py

The ICDAR2015 dataset is available from icdar2015.

python3 track_tools/convert_mot_to_coco.py
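
The conversion scripts above target COCO-style annotations. Their exact output fields are not documented here, so the following is only a minimal sketch of one step such a conversion typically involves: turning a four-point text quadrilateral into a COCO `[x, y, width, height]` box. The helper names, the `track_id` field, the single category, and the sample coordinates are assumptions for illustration and are not taken from the repository.

```python
def quad_to_coco_bbox(points):
    """Axis-aligned [x, y, width, height] box enclosing a list of (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def make_annotation(ann_id, image_id, track_id, points):
    """Build one COCO-style annotation dict (the field set is an assumption)."""
    bbox = quad_to_coco_bbox(points)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,      # single hypothetical "text" category
        "track_id": track_id,  # hypothetical field linking boxes across frames
        "bbox": bbox,
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
    }


# Made-up quadrilateral for illustration.
quad = [(10, 20), (110, 25), (108, 60), (8, 55)]
print(make_annotation(ann_id=1, image_id=1, track_id=7, points=quad))
```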

Pre-train on COCOTextV2

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py  --output_dir ./output/Pretrain_COCOTextV2 --dataset_file pretrain --coco_path ./Data/COCOTextV2 --batch_size 2  --with_box_refine --num_queries 500 --epochs 300 --lr_drop 100 --resume ./output/Pretrain_COCOTextV2/checkpoint.pth

python3 track_tools/Pretrain_model_to_mot.py

The pre-trained model is available on Baidu Netdisk (password: 59w8) and Google Netdisk.

The model reaching 44% MOTA can be found here (password: xnlw) and on Google Netdisk.

Train TransVTSpotter

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_track.py  --output_dir ./output/ICDAR15 --dataset_file text --coco_path ./Data/ICDAR2015_video --batch_size 2  --with_box_refine  --num_queries 300 --epochs 80 --lr_drop 40 --resume ./output/Pretrain_COCOTextV2/pretrain_coco.pth
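
In Deformable DETR, on which this codebase builds, `--lr_drop` sets the epoch at which a step scheduler multiplies the learning rate by 0.1. Assuming `main_track.py` keeps that behaviour (an assumption, not verified against the repository), the schedules implied by the pre-training and training commands can be sketched in plain Python. The base learning rate of `2e-4` is likewise an assumed placeholder.

```python
def lr_at_epoch(epoch, base_lr, lr_drop, gamma=0.1):
    """Step decay: multiply base_lr by gamma once every lr_drop epochs."""
    if lr_drop <= 0:
        raise ValueError("lr_drop must be positive")
    return base_lr * gamma ** (epoch // lr_drop)


def drop_epochs(epochs, lr_drop):
    """Epochs (0-indexed) at which the learning rate changes."""
    return list(range(lr_drop, epochs, lr_drop))


BASE_LR = 2e-4  # assumed placeholder, not read from the repository

# Pre-training command: --epochs 300 --lr_drop 100
print(drop_epochs(epochs=300, lr_drop=100))
# Training command: --epochs 80 --lr_drop 40
print(drop_epochs(epochs=80, lr_drop=40))
print(lr_at_epoch(epoch=40, base_lr=BASE_LR, lr_drop=40))
```

Both commands also launch 8 processes with `--batch_size 2`; if that flag is per process, as in Deformable DETR, the total is 8 × 2 = 16, which agrees with the batch size given in the notes.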

Visualize TransVTSpotter

python3 track_tools/Evaluation_ICDAR15_video/vis_tracking.py

License

TransVTSpotter is released under MIT License.

Citing

If you use TransVTSpotter in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

@article{wu2021opentext,
  title={A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer},
  author={Weijia Wu and Debing Zhang and Yuanqiang Cai and Sibo Wang and Jiahong Li and Zhuang Li and Yejun Tang and Hong Zhou},
  journal={35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year={2021}
}

Owner

weijiawu
