Location-Sensitive Visual Recognition with Cross-IOU Loss

Overview

The trained models are temporarily unavailable, but you can train the code using reasonable computational resource.

Location-Sensitive Visual Recognition with Cross-IOU Loss

by Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang and Qi Tian

The code to train and evaluate the proposed LSNet is available here. For more technical details, please refer to our arXiv paper.

The location-sensitive visual recognition tasks, including object detection, instance segmentation, and human pose estimation, can be formulated into localizing an anchor point (in red) and a set of landmarks (in green). Our work aims to offer a unified framework for these tasks.

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing the LSNet lies in the ability of fitting various scales, for which we design a novel loss function named cross-IOU loss that computes the cross-IOU of each anchor-landmark pair to approximate the global IOU between the prediction and groundtruth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition. Evaluated on the MSCOCO dataset, LSNet set the new state-of-the-art accuracy for anchor-free object detection (a 53.5% box AP) and instance segmentation (a 40.2% mask AP), and shows promising performance in detecting multi-scale human poses.

If you encounter any problems in using our code, please contact Kaiwen Duan: [email protected]

Bbox AP(%) on COCO test-dev

Method Backbone epoch MStrain AP AP50 AP75 APS APM APL
Anchor-based:
Libra R-CNN X-101-64x4d 12 N 43.0 64.0 47.0 25.3 45.6 54.6
AB+FSAF* X-101-64x4d 18 Y 44.6 65.2 48.6 29.7 47.1 54.6
FreeAnchor* X-101-32x8d 24 Y 47.3 66.3 51.5 30.6 50.4 59.0
GFLV1* X-101-32x8d 24 Y 48.2 67.4 52.6 29.2 51.7 60.2
ATSS* X-101-64x4d-DCN 24 Y 50.7 68.9 56.3 33.2 52.9 62.4
PAA* X-101-64x4d-DCN 24 Y 51.4 69.7 57.0 34.0 53.8 64.0
GFLV2* R2-101-DCN 24 Y 53.3 70.9 59.2 35.7 56.1 65.6
YOLOv4-P7* CSP-P7 450 Y 56.0 73.3 61.2 38.9 60.0 68.6
Anchor-free:
ExtremeNet* HG-104 200 Y 43.2 59.8 46.4 24.1 46.0 57.1
RepPointsV1* R-101-DCN 24 Y 46.5 67.4 50.9 30.3 49.7 57.1
SAPD X-101-64x4d-DCN 24 Y 47.4 67.4 51.1 28.1 50.3 61.5
CornerNet* HG-104 200 Y 42.1 57.8 45.3 20.8 44.8 56.7
DETR R-101 500 Y 44.9 64.7 47.7 23.7 49.5 62.3
CenterNet* HG-104 190 Y 47.0 64.5 50.7 28.9 49.9 58.9
CPNDet* HG-104 100 Y 49.2 67.4 53.7 31.0 51.9 62.4
BorderDet* X-101-64x4d-DCN 24 Y 50.3 68.9 55.2 32.8 52.8 62.3
FCOS-BiFPN X-101-32x8-DCN 24 Y 50.4 68.9 55.0 33.2 53.0 62.7
RepPointsV2* X-101-64x4d-DCN 24 Y 52.1 70.1 57.5 34.5 54.6 63.6
LSNet R-50 24 Y 44.8 64.1 48.8 26.6 47.7 55.7
LSNet X-101-64x4d 24 Y 48.2 67.6 52.6 29.6 51.3 60.5
LSNet X-101-64x4d-DCN 24 Y 49.6 69.0 54.1 30.3 52.8 62.8
LSNet-CPV X-101-64x4d-DCN 24 Y 50.4 69.4 54.5 31.0 53.3 64.0
LSNet-CPV R2-101-DCN 24 Y 51.1 70.3 55.2 31.2 54.3 65.0
LSNet-CPV* R2-101-DCN 24 Y 53.5 71.1 59.2 35.2 56.4 65.8

A comparison between LSNet and the sate-of-the-art methods in object detection on the MS-COCO test-dev set. LSNet surpasses all competitors in the anchor-free group. The abbreviations are: ‘R’ – ResNet, ‘X’ – ResNeXt, ‘HG’ – Hourglass network, ‘R2’ – Res2Net, ‘CPV’ – corner point verification, ‘MStrain’ – multi-scale training, * – multi-scale testing.

Segm AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APS APM APL
Pixel-based:
YOLACT R-101 48 31.2 50.6 32.8 12.1 33.3 47.1
TensorMask R-101 72 37.1 59.3 39.4 17.1 39.1 51.6
Mask R-CNN X-101-32x4d 12 37.1 60.0 39.4 16.9 39.9 53.5
HTC X-101-64x4d 20 41.2 63.9 44.7 22.8 43.9 54.6
DetectoRS* X-101-64x4d 40 48.5 72.0 53.3 31.6 50.9 61.5
Contour-based:
ExtremeNet HG-104 100 18.9 44.5 13.7 10.4 20.4 28.3
DeepSnake DLA-34 120 30.3 - - - - -
PolarMask X-101-64x4d-DCN 24 36.2 59.4 37.7 17.8 37.7 51.5
LSNet X-101-64x4d-DCN 30 37.6 64.0 38.3 22.1 39.9 49.1
LSNet R2-101-DCN 30 38.0 64.6 39.0 22.4 40.6 49.2
LSNet* X-101-64x4d-DCN 30 39.7 65.5 41.3 25.5 41.3 50.4
LSNet* R2-101-DCN 30 40.2 66.2 42.1 25.8 42.2 51.0

Comparison of LSNet to the sate-of-the-art methods in instance segmentation task on the COCO test-dev set. Our LSNet achieves the state-of-the-art accuracy for contour-based instance segmentation. ‘R’ - ResNet, ‘X’ - ResNeXt, ‘HG’ - Hourglass, ‘R2’ - Res2Net, * - multi-scale testing.

Keypoints AP(%) on COCO test-dev

Method Backbone epoch AP AP50 AP75 APM APL
Heatmap-based:
CenterNet-jd DLA-34 320 57.9 84.7 63.1 52.5 67.4
OpenPose VGG-19 - 61.8 84.9 67.5 58.0 70.4
Pose-AE HG 300 62.8 84.6 69.2 57.5 70.6
CenterNet-jd HG104 150 63.0 86.8 69.6 58.9 70.4
Mask R-CNN R-50 28 63.1 87.3 68.7 57.8 71.4
PersonLab R-152 >1000 66.5 85.5 71.3 62.3 70.0
HRNet HRNet-W32 210 74.9 92.5 82.8 71.3 80.9
Regression-based:
CenterNet-reg [66] DLA-34 320 51.7 81.4 55.2 44.6 63.0
CenterNet-reg [66] HG-104 150 55.0 83.5 59.7 49.4 64.0
LSNet w/ obj-box X-101-64x4d-DCN 60 55.7 81.3 61.0 52.9 60.5
LSNet w/ kps-box X-101-64x4d-DCN 20 59.0 83.6 65.2 53.3 67.9

Comparison of LSNet to the sate-of-the-art methods in pose estimation task on the COCO test-dev set. LSNet predict the keypoints by regression. ‘obj-box’ and ‘kps-box’ denote the object bounding boxes and the keypoint-boxes, respectively. For LSNet w/ kps-box, we fine-tune the model from the LSNet w/ kps-box for another 20 epochs.

Visualization

Some location-sensitive visual recognition results on the MS-COCO validation set.

We compared with the CenterNet to show that our LSNet w/ ‘obj-box’ tends to predict more human pose of small scales, which are not annotated on the dataset. Only pose results with scores higher than 0:3 are shown for both methods.

Left: LSNet uses the object bounding boxes to assign training samples. Right: LSNet uses the keypoint-boxes to assign training samples. Although LSNet with keypoint-boxes enjoys higher AP score, its ability of perceiving multi-scale human instances is weakened.

Preparation

The master branch works with PyTorch 1.5.0

The dataset directory should be like this:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── images
            ├── train2017
            ├── val2017
            ├── test2017

Generate extreme point annotation from segmentation:

  • cd code/tools
  • python gen_coco_lsvr.py
  • cd ..

Installation

1. Installing cocoapi
  • cd cocoapi/pycocotools
  • python setup.py develop
  • cd ../..
2. Installing mmcv
  • cd mmcv
  • pip install -e.
  • cd ..
3. Installing mmdet
  • python setup.py develop

Training and Evaluation

Our LSNet is based on mmdetection. Please check with existing dataset for Training and Evaluation.

Owner
Kaiwen Duan
Kaiwen Duan
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.

ProSelfLC: CVPR 2021 ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks For any specific discussion or potential fu

amos_xwang 57 Dec 04, 2022
Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation

Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation Paper Multi-Target Adversarial Frameworks for Domain Adaptation in

Valeo.ai 20 Jun 21, 2022
[AAAI 2022] Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

A paper Introduction This is an official release of the paper Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation wit

Jiacheng Wang 14 Dec 08, 2022
QR2Pass-project - A proof of concept for an alternative (passwordless) authentication system to a web server

QR2Pass This is a proof of concept for an alternative (passwordless) authenticat

4 Dec 09, 2022
[KDD 2021, Research Track] DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neural Networks

DiffMG This repository contains the code for our KDD 2021 Research Track paper: DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neura

AutoML Research 24 Nov 29, 2022
Fast, Attemptable Route Planner for Navigation in Known and Unknown Environments

FAR Planner uses a dynamically updated visibility graph for fast replanning. The planner models the environment with polygons and builds a global visi

Fan Yang 346 Dec 30, 2022
GANTheftAuto is a fork of the Nvidia's GameGAN

Description GANTheftAuto is a fork of the Nvidia's GameGAN, which is research focused on emulating dynamic game environments. The early research done

Harrison 801 Dec 27, 2022
Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Patrick Kidger 89 Dec 13, 2022
GBIM(Gesture-Based Interaction map)

手势交互地图 GBIM(Gesture-Based Interaction map),基于视觉深度神经网络的交互地图,通过电脑摄像头观察使用者的手势变化,进而控制地图进行简单的交互。网络使用PaddleX提供的轻量级模型PPYOLO Tiny以及MobileNet V3 small,使得整个模型大小约10MB左右,即使在CPU下也能快速定位和识别手势。

8 Feb 10, 2022
Genetic Programming in Python, with a scikit-learn inspired API

Welcome to gplearn! gplearn implements Genetic Programming in Python, with a scikit-learn inspired and compatible API. While Genetic Programming (GP)

Trevor Stephens 1.3k Jan 03, 2023
Code for reproducing key results in the paper "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"

Status: Archive (code is provided as-is, no updates expected) InfoGAN Code for reproducing key results in the paper InfoGAN: Interpretable Representat

OpenAI 1k Dec 19, 2022
Official implementation of VQ-Diffusion

Official implementation of VQ-Diffusion: Vector Quantized Diffusion Model for Text-to-Image Synthesis

Microsoft 592 Jan 03, 2023
Deduplicating Training Data Makes Language Models Better

Deduplicating Training Data Makes Language Models Better This repository contains code to deduplicate language model datasets as descrbed in the paper

Google Research 431 Dec 27, 2022
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Ubisoft 76 Dec 30, 2022
A package for "Procedural Content Generation via Reinforcement Learning" OpenAI Gym interface.

Readme: Illuminating Diverse Neural Cellular Automata for Level Generation This is the codebase used to generate the results presented in the paper av

Sam Earle 27 Jan 05, 2023
Exploration of some patients clinical variables.

Answer_ALS_clinical_data Exploration of some patients clinical variables. All the clinical / metadata data is available here: https://data.answerals.o

1 Jan 20, 2022
Brain Tumor Detection with Tensorflow Neural Networks.

Brain-Tumor-Detection A convolutional neural network model built with Tensorflow & Keras to detect brain tumor and its different variants. Data of the

404ErrorNotFound 5 Aug 23, 2022
PyTorch Implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedding (ORAL, MICCAIW 2021)

Small Lesion Segmentation in Brain MRIs with Subpixel Embedding PyTorch implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedd

22 Oct 21, 2022
Resources complimenting the Machine Learning Course led in the Faculty of mathematics and informatics part of Sofia University.

Machine Learning and Data Mining, Summer 2021-2022 How to learn data science and machine learning? Programming. Learn Python. Basic Statistics. Take a

Simeon Hristov 8 Oct 04, 2022