Recurrent Scale Approximation (RSA) for Object Detection

Last update: Dec 28, 2022

Codebase for "Recurrent Scale Approximation for Object Detection in CNN", published at ICCV 2017 [arXiv]. It provides training and test code for the two modules in the paper, the scale-forecast network and recurrent scale approximation (RSA). Face detection models trained on several open datasets are also provided.

Note: This project is still underway. Please stay tuned for more features soon!

Codebase at a Glance

train/: Training code for the scale-forecast network and RSA modules

predict/: Test code for the whole detection pipeline

afw_gtmiss.mat: Revised face annotations described in Section 4.1 of the paper

Grab and Go (Demo)

Caffe models for face detection trained on popular datasets.

Base RPN model: predict/output/ResNet_3b_s16/tot_wometa_1epoch, trained on Widerface (fg/bg), COCO (bg only) and ImageNet Det (bg only)
RSA model: predict/output/ResNet_3b_s16_fm2fm_pool2_deep/65w, trained on Widerface, COCO, and ImageNet Det

Steps to run the test code:

Compile CaffeMex_v2 with the MATLAB interface.
Add CaffeMex_v2/matlab/ to the MATLAB search path.
Read the tips in predict/script_start.m, then run it.

After a few minutes of processing, the detection and alignment results are shown in an image window. Click the image window to step through all results. If line 8 of script_start.m is left at its default value of false, you should see example detection results.
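
The path setup and launch steps can be sketched as a short MATLAB session. This is a minimal sketch, assuming CaffeMex_v2 has already been compiled and sits in a CaffeMex_v2/ directory under the current folder, and that MATLAB was started from the root of this repository. The location of CaffeMex_v2 and the variable name caffe_root are assumptions, not something the repository prescribes.

```matlab
% Assumption: CaffeMex_v2 is compiled and located at ./CaffeMex_v2
% relative to the current folder; change caffe_root to match your setup.
caffe_root = fullfile(pwd, 'CaffeMex_v2');

% Add the Caffe MATLAB interface to the search path (absolute path,
% so it stays valid while the demo script runs from another folder).
addpath(fullfile(caffe_root, 'matlab'));

% Execute the demo script. run() temporarily switches to the script's
% folder, so relative paths inside predict/ resolve as the script expects.
run(fullfile('predict', 'script_start.m'));
```

If the script cannot find Caffe, checking the output of `which caffe.Net` in the MATLAB prompt is one way to confirm that the interface is on the search path.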

Train Your Own Model

This part is still in progress and will be released later.

FAQ

Common issues with this project will be listed here over time. Stay tuned! :)

Citation

Please kindly cite our work if it helps your research:

@inproceedings{liu_2017_rsa,
  Author = {Yu Liu and Hongyang Li and Junjie Yan and Fangyin Wei and Xiaogang Wang and Xiaoou Tang},
  Title = {Recurrent Scale Approximation for Object Detection in CNN},
  Booktitle = {IEEE International Conference on Computer Vision},
  Year = {2017}
}

Acknowledgment

We appreciate the contribution of the following researchers:

Dong Chen @Microsoft Research, who inspired some of the basic ideas while Yu Liu was an intern at MSR.

Jiongchao Jin @Beihang University, who provided some of the baseline results.

Owner

Yu Liu (Louis)
