LQM - Improving Object Detection by Estimating Bounding Box Quality Accurately

Last update: Sep 28, 2022


Overview

Improving Object Detection by Estimating Bounding Box Quality Accurately

Abstract

Object detection aims to locate and classify object instances in images. Therefore, object detection models are generally implemented with two parallel branches that optimize localization and classification. After training a detection model, the best bounding box for each class must be selected from among many estimates for reliable inference. Generally, NMS (Non-Maximum Suppression) is applied to suppress low-quality bounding boxes based on classification scores or center-ness scores. However, since the quality of the bounding boxes themselves is not considered, low-quality bounding boxes can accidentally be selected as the positive bounding box for the corresponding class. We believe that this misalignment between the two parallel tasks degrades object detection performance. In this paper, we propose a method to estimate bounding box quality using four-directional Gaussian quality modeling, which leads to consistent results between the two parallel branches. Extensive experiments on the MS COCO benchmark show that the proposed method consistently outperforms the baseline (FCOS). Our best model achieves state-of-the-art performance with 48.9% AP. We also confirm the efficiency of the method by comparing the number of parameters and the computational overhead.
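To make the misalignment argument concrete, here is a minimal sketch of standard greedy NMS in plain Python. It assumes two overlapping candidates for one ground-truth box (0, 0, 10, 10): a tight box with a slightly lower classification score and a looser box with a slightly higher one. The quality values (0.95 and 0.40) are made up for illustration, and ranking by `score * quality` is one common way to combine the two (FCOS multiplies classification scores by center-ness in a similar spirit); it is not necessarily LQM's exact formulation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-ranked box, drop boxes overlapping it above iou_thr, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


# Box 0 matches the ground truth exactly; box 1 is shifted by one pixel (IoU 81/119, about 0.68).
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0)]
cls_scores = [0.80, 0.85]
quality = [0.95, 0.40]  # hypothetical localization-quality estimates

print(nms(boxes, cls_scores))                                      # [1]: the looser box wins
print(nms(boxes, [c * q for c, q in zip(cls_scores, quality)]))    # [0]: the tight box wins
```

Ranking by classification score alone keeps the looser box and suppresses the better-localized one; once a quality estimate enters the ranking, the tight box survives instead.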

Overall Architecture
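FCOS regresses four distances (left, top, right, bottom) from each location to the box sides. As a hedged illustration of what "four-directional Gaussian modeling" can look like, the sketch below assumes the regression branch predicts a mean `mu` and a standard deviation `sigma` for each of the four distances and is trained with an independent per-direction Gaussian negative log-likelihood. The `quality_from_sigma` mapping (lower average uncertainty gives a quality closer to 1) is a hypothetical choice for illustration only; the exact loss and quality formula used by LQM are defined in the paper and code, not here.

```python
import math
from typing import Sequence

DIRECTIONS = ("left", "top", "right", "bottom")


def gaussian_nll(target: float, mu: float, sigma: float, eps: float = 1e-6) -> float:
    """Negative log-likelihood of `target` under N(mu, sigma^2)."""
    var = max(sigma, eps) ** 2
    return 0.5 * math.log(2 * math.pi * var) + (target - mu) ** 2 / (2 * var)


def four_dir_nll(targets: Sequence[float], mus: Sequence[float], sigmas: Sequence[float]) -> float:
    """Sum of independent Gaussian NLLs over the (l, t, r, b) regression targets."""
    assert len(targets) == len(mus) == len(sigmas) == len(DIRECTIONS)
    return sum(gaussian_nll(y, m, s) for y, m, s in zip(targets, mus, sigmas))


def quality_from_sigma(sigmas: Sequence[float]) -> float:
    """Hypothetical mapping from predicted uncertainty to a quality score in (0, 1]."""
    return 1.0 / (1.0 + sum(sigmas) / len(sigmas))
```

A confident, accurate prediction (small `sigma`, `mu` close to the target) yields a low loss and a high quality score, which is the kind of signal that can then be combined with the classification score at NMS time.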

Implementation Details

We implement our detection model on top of MMDetection (v2.6), an open-source object detection toolbox. Unless otherwise specified, the default settings of the FCOS implementation are unchanged. We train and validate our network on four RTX TITAN GPUs with PyTorch v1.6 and CUDA v10.2.

Please see GETTING_STARTED.md for the basic usage of MMDetection.

Installation

Clone this repository.
```
git clone https://github.com/POSTECH-IMLAB/LQM.git
cd LQM
```

Create a conda virtual environment and install the dependencies.
```
conda env create -f environment.yml
```
Activate the conda environment.
```
conda activate lqm
```

Install build requirements and then install MMDetection.

```
pip install --force-reinstall mmcv-full==1.1.5 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.6.0/index.html
pip install -v -e .
```

Preparing MS COCO dataset

```
bash download_coco.sh
```
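After the download finishes, a quick sanity check can catch a misplaced dataset before training starts. This sketch assumes the conventional MMDetection COCO layout under `data/coco/` (an `annotations/` folder with the 2017 instance JSON files plus `train2017/` and `val2017/` image folders); `download_coco.sh` may place files differently, so adjust `EXPECTED` if needed. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumed layout (conventional MMDetection COCO structure), relative to the dataset root.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_paths(root: str = "data/coco") -> List[str]:
    """Return the expected COCO entries that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    print("COCO layout looks complete" if not missing else f"Missing: {missing}")
```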

Preparing Pre-trained model weights

```
bash download_weights.sh
```

Train

```
# Assumes that you are in the root directory of this project,
# that you have activated your virtual environment if needed,
# and that the COCO dataset is in 'data/coco/'.

./tools/dist_train.sh configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py 4 --validate
```

Inference

```
./tools/dist_test.sh configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py work_dirs/uncertainty_guide_r50_fpn_1x/epoch_12.pth 4 --eval bbox
```
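`--eval bbox` reports COCO-style box AP, and the headline "AP" averages precision over ten IoU thresholds, 0.50 to 0.95 in steps of 0.05. The short sketch below (function and constant names are illustrative) shows why localization quality matters for this metric: a detection that overlaps its ground truth with IoU 0.68 counts as a true positive at only four of the ten thresholds.

```python
from typing import List

# COCO AP averages over IoU thresholds 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value: float) -> List[float]:
    """IoU thresholds at which a detection with this IoU counts as a true positive."""
    return [t for t in COCO_IOU_THRESHOLDS if iou_value >= t]


print(thresholds_passed(0.68))  # [0.5, 0.55, 0.6, 0.65]
```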

Image demo using pretrained model weights

```
# The result is saved under the demo directory of this project (detection_result.jpg).
# The config, checkpoint, and source image paths are required (pre-trained weights can be downloaded from the provided Google Drive links).
# The score threshold is optional.

python demo/LQM_image_demo.py --config configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py --checkpoint work_dirs/pretrained/LQM_r50_fpn_1x.pth --img data/coco/test2017/000000011245.jpg --score-thr 0.3
```
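The `--score-thr` option controls which detections end up in the output image. As a sketch of what such filtering does, assuming the MMDetection v2.x bbox result layout (one entry per class, each a list of `[x1, y1, x2, y2, score]` rows; plain Python lists stand in for NumPy arrays here), the hypothetical helper below flattens the per-class results and keeps detections whose score exceeds the threshold.

```python
from typing import List, Sequence, Tuple

Detection = Sequence[float]  # [x1, y1, x2, y2, score]


def filter_by_score(
    result: Sequence[Sequence[Detection]], score_thr: float = 0.3
) -> List[Tuple[int, Tuple[float, float, float, float], float]]:
    """Flatten per-class detections into (class_id, box, score), keeping score > score_thr."""
    kept = []
    for class_id, dets in enumerate(result):
        for x1, y1, x2, y2, score in dets:
            if score > score_thr:
                kept.append((class_id, (x1, y1, x2, y2), score))
    return kept


# Three classes: two detections for class 0, none for class 1, one for class 2.
result = [[[0, 0, 10, 10, 0.9], [1, 1, 5, 5, 0.2]], [], [[2, 2, 8, 8, 0.31]]]
print(filter_by_score(result, 0.3))
```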

Webcam demo using pretrained model weights

```
# The config and checkpoint paths are required (pre-trained weights can be downloaded from the provided Google Drive links).
# The score threshold is optional.

python demo/webcam_demo.py configs/uncertainty_guide/uncertainty_guide_r50_fpn_1x.py work_dirs/pretrained/LQM_r50_fpn_1x.pth
```

Models

For your convenience, we provide the following trained models. All models are trained with 16 images per mini-batch on 4 GPUs.

| Model | Multi-scale training | AP (minival) | Link |
|---|---|---|---|
| LQM_R50_FPN_1x | No | 40.0 | Google |
| LQM_R101_FPN_2x | Yes | 44.8 | Google |
| LQM_R101_dcnv2_FPN_2x | Yes | 47.4 | Google |
| LQM_X101_FPN_2x | Yes | 47.2 | Google |
| LQM_X101_dcnv2_FPN_2x | Yes | 48.9 | Google |
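The released models use 16 images per mini-batch on 4 GPUs, i.e. 4 images per GPU. If you retrain with a different number of GPUs, MMDetection's documentation suggests the linear scaling rule: scale the learning rate in proportion to the total batch size. A minimal sketch follows; `base_lr` is a placeholder you should read from the config you train with, and the helper names are hypothetical.

```python
def imgs_per_gpu(total_batch: int, num_gpus: int) -> int:
    """Split a total mini-batch evenly across GPUs."""
    if total_batch % num_gpus:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch // num_gpus


def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the total batch size."""
    return base_lr * new_batch / base_batch


per_gpu = imgs_per_gpu(16, 4)                     # 4 images per GPU, as in the released models
lr_two_gpus = scaled_lr(0.01, 16, 2 * per_gpu)    # 0.01 is a placeholder base LR
```

With 2 GPUs at the same per-GPU load, the total batch halves to 8 images, so the learning rate halves as well.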

Owner

IM Lab., POSTECH
