Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

This repository contains the code for the ssbaseline model. We also provide OCR results, features, and models. The code is built on top of M4C; see the M4C documentation for more detailed information.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First, clone the repository and install it in development mode:

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

| Datasets | ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features | OCR Recog-CNN Features |
|---|---|---|---|---|
| TextVQA | TextVQA ImDB | Open Images | TextVQA SBD-Trans OCRs | TextVQA SBD-Trans OCRs |
| ST-VQA | ST-VQA ImDB | ST-VQA Objects | ST-VQA SBD-Trans OCRs | ST-VQA SBD-Trans OCRs |

Pretrained Models

For the TextVQA dataset, we release one pretrained ssbaseline model: ssbaseline trained with SBD-Trans OCRs and with ST-VQA as additional data (our best model).

| Datasets | Config Files (under `configs/vqa/`) | Pretrained Models | Metrics | Notes |
|---|---|---|---|---|
| TextVQA (`m4c_textvqa`) | `m4c_textvqa/m4c_with_stvqa.yml` | `ssbaseline_with_stvqa` | val accuracy - 45.53%; test accuracy - 45.66% | SBD-Trans OCRs; ST-VQA as additional data |

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.
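
As a rough illustration of what such a run might look like, the sketch below assembles a command line for the config listed in the table above. It assumes that the `tools/run.py` entry point and the `--tasks`, `--datasets`, `--model`, `--config`, `--save_dir`, `--run_type` and `--resume_file` flags described in the M4C README apply unchanged to this repository. The helper name `build_m4c_command`, the `save/` directory and the checkpoint path are hypothetical and only serve the example; check the M4C README for the authoritative commands.

```python
import shlex


def build_m4c_command(config, save_dir, dataset="m4c_textvqa", model="m4c",
                      run_type=None, resume_file=None):
    """Assemble an M4C-style command as a list of arguments.

    Assumption: the flags below mirror the M4C README and have not been
    verified against this repository.
    """
    cmd = [
        "python", "tools/run.py",
        "--tasks", "vqa",
        "--datasets", dataset,
        "--model", model,
        "--config", config,
        "--save_dir", save_dir,
    ]
    # Evaluation runs add a run type and a checkpoint to load.
    if run_type is not None:
        cmd += ["--run_type", run_type]
    if resume_file is not None:
        cmd += ["--resume_file", resume_file]
    return cmd


# Config path taken from the Pretrained Models table; save_dir is hypothetical.
CONFIG = "configs/vqa/m4c_textvqa/m4c_with_stvqa.yml"
SAVE_DIR = "save/ssbaseline_with_stvqa"

train_cmd = build_m4c_command(CONFIG, SAVE_DIR)

# Hypothetical checkpoint location for a validation run.
val_cmd = build_m4c_command(
    CONFIG, SAVE_DIR,
    run_type="val",
    resume_file="save/ssbaseline_with_stvqa/best.ckpt",
)

print(shlex.join(train_cmd))
print(shlex.join(val_cmd))
```

The printed strings can be pasted into a shell from the repository root, or the argument lists can be passed to `subprocess.run` directly.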

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]
