A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

Last update: Dec 05, 2022

Related tags

Deep Learning onestage_grounding

Overview

One-Stage Visual Grounding

***** New: Our recent work on One-stage VG is available at ReSC.*****

A Fast and Accurate One-Stage Approach to Visual Grounding

by Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2019, Oral

Introduction

We propose a simple, fast, and accurate one-stage approach to visual grounding. For more details, please refer to our paper.

Citation

@inproceedings{yang2019fast,
  title={A Fast and Accurate One-Stage Approach to Visual Grounding},
  author={Yang, Zhengyuan and Gong, Boqing and Wang, Liwei and Huang
    , Wenbing and Yu, Dong and Luo, Jiebo},
  booktitle={ICCV},
  year={2019}
}

Prerequisites

Python 3.5 (3.6 tested)
Pytorch 0.4.1
Others (Pytorch-Bert, OpenCV, Matplotlib, scipy, etc.)

Installation

Clone the repository

git clone https://github.com/zyang-ur/onestage_grounding.git

Prepare the submodules and associated data

RefCOCO & ReferItGame Dataset: place the data or the soft link of dataset folder under ./ln_data/. We follow dataset structure DMS. To accomplish this, the download_dataset.sh bash script from DMS can be used.
```
bash ln_data/download_data.sh --path ./ln_data
```

Flickr30K Entities Dataset: please download the images for the dataset on the website for the Flickr30K Entities Dataset and the original Flickr30k Dataset. Images should be placed under ./ln_data/Flickr30k/flickr30k_images.
Data index: download the generated index files and place them as the ./data folder. Availble at [Gdrive], [One Drive].
```
rm -r data
tar xf data.tar
```
Model weights: download the pretrained model of Yolov3 and place the file in ./saved_models.
```
sh saved_models/yolov3_weights.sh
```

More pretrained models are availble in the performance table [Gdrive], [One Drive] and should also be placed in ./saved_models.

Training

Train the model, run the code under main folder. Using flag --lstm to access lstm encoder, Bert is used as the default. Using flag --light to access the light model.

python train_yolo.py --data_root ./ln_data/ --dataset referit \
  --gpu gpu_id --batch_size 32 --resume saved_models/lstm_referit_model.pth.tar \
  --lr 1e-4 --nb_epoch 100 --lstm

Evaluate the model, run the code under main folder. Using flag --test to access test mode.

python train_yolo.py --data_root ./ln_data/ --dataset referit \
  --gpu gpu_id --resume saved_models/lstm_referit_model.pth.tar \
  --lstm --test

Visulizations. Flag --save_plot will save visulizations.

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Dataset	Ours-LSTM	Performance ([email protected])	Ours-Bert	Performance ([email protected])
ReferItGame	Gdrive	58.76	Gdrive	59.30
Flickr30K Entities	One Drive	67.62	One Drive	68.69
RefCOCO		val: 73.66		val: 72.05
		testA: 75.78		testA: 74.81
		testB: 71.32		testB: 67.59

Credits

Part of the code or models are from DMS, MAttNet, Yolov3 and Pytorch-yolov3.

A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

Related tags

Overview

One-Stage Visual Grounding

Introduction

Citation

Prerequisites

Installation

Training

Performance and Pre-trained Models

Credits

Owner

Zhengyuan Yang

Breast cancer is been classified into benign tumour and malignant tumour.

TensorFlow implementation of Style Transfer Generative Adversarial Networks: Learning to Play Chess Differently.

A Simple Framwork for CV Pre-training Model (SOCO, VirTex, BEiT)

PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Vehicle detection using machine learning and computer vision techniques for Udacity's Self-Driving Car Engineer Nanodegree.

Pytorch version of SfmLearner from Tinghui Zhou et al.

Pretraining Representations For Data-Efficient Reinforcement Learning

Wordplay, an artificial Intelligence based crossword puzzle solver.

CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

A PyTorch implementation of the continual learning experiments with deep neural networks

Code for our TKDE paper "Understanding WeChat User Preferences and “Wow” Diffusion"

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Stacs-ci - A set of modules to enable integration of STACS with commonly used CI / CD systems

Supplementary materials for ISMIR 2021 LBD paper "Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes"

tree-math: mathematical operations for JAX pytrees

Bayesian optimisation library developped by Huawei Noah's Ark Library

This code is part of the reproducibility package for the SANER 2022 paper "Generating Clarifying Questions for Query Refinement in Source Code Search".

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

Implementation of trRosetta and trDesign for Pytorch, made into a convenient package

(CVPR 2021) PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds