[ICCV-2021] An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation


Introduction

This is the official PyTorch implementation of An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation (ICCV 2021). PDF

Abstract

Most semi-supervised learning models are consistency-based: they leverage unlabeled images by maximizing the similarity between predictions on different augmentations of an image. However, when we apply them to human pose estimation, which has an extremely imbalanced class distribution, they often collapse and predict every pixel in unlabeled images as background. We find this is because the decision boundary passes through the high-density areas of the minor class, so more and more pixels are gradually misclassified as background.

In this work, we present a surprisingly simple approach to address this problem. For each image, it composes a pair of easy-hard augmentations and uses the more accurate predictions on the easy image to teach the network to learn the pose information of the hard one. The accuracy superiority of the teaching signals allows the network to be "monotonically" improved, which effectively avoids collapsing. We apply our method to state-of-the-art pose estimators and it further improves their performance on three public datasets.
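For intuition, below is a minimal sketch of this easy-hard teaching step in PyTorch; the model, the augmentation callables, and the loss choice are placeholders rather than the authors' exact implementation.

    import torch
    import torch.nn.functional as F

    def unsupervised_step(model, image, easy_aug, hard_aug):
        # easy_aug / hard_aug return an augmented image plus a callable that
        # maps predicted heatmaps back to a common (canonical) frame.
        easy_img, easy_to_canon = easy_aug(image)   # e.g. mild affine
        hard_img, hard_to_canon = hard_aug(image)   # e.g. strong affine + joint cutout

        with torch.no_grad():
            teacher = model(easy_img)               # more accurate prediction (easy view)
        student = model(hard_img)                   # prediction on the hard view

        # The easy view teaches the hard view; gradients flow only to the student.
        return F.mse_loss(hard_to_canon(student), easy_to_canon(teacher).detach())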

Main Results

1. Semi-Supervised Setting

Results on COCO Val2017

| Method | Augmentation | 1K Labels | 5K Labels | 10K Labels |
|---|---|---|---|---|
| Supervised | Affine | 31.5 | 46.4 | 51.1 |
| PoseCons (Single) | Affine | 38.5 | 50.5 | 55.4 |
| PoseCons (Single) | Affine + Joint Cutout | 42.1 | 52.3 | 57.3 |
| PoseDual (Dual) | Affine | 41.5 | 54.8 | 58.7 |
| PoseDual (Dual) | Affine + RandAug | 43.7 | 55.4 | 59.3 |
| PoseDual (Dual) | Affine + Joint Cutout | 44.6 | 55.6 | 59.6 |

We use the COCO subsets (1K, 5K, and 10K) and the full TRAIN set as the labeled and unlabeled datasets, respectively.

Note:

  • Ground-truth person boxes are used.
  • Flip testing is not used.

2. Full-Label Setting

Results on COCO Val2017

| Method | Network | AP | AP.5 | AR |
|---|---|---|---|---|
| Supervised | ResNet50 | 70.9 | 91.4 | 74.2 |
| PoseDual | ResNet50 | 73.9 (↑3.0) | 92.5 | 77.0 |
| Supervised | HRNetW48 | 77.2 | 93.5 | 79.9 |
| PoseDual | HRNetW48 | 79.2 (↑2.0) | 94.6 | 81.7 |

We use COCO TRAIN and WILD as the labeled and unlabeled datasets, respectively.

Pretrained Models

Download links: Google Drive

Environment

The code is developed using Python 3.7 on Ubuntu 16.04. NVIDIA GPUs are needed.

Quick start

Installation

  1. Install PyTorch >= v1.2.0 following the official instructions.

  2. Clone this repo; we'll call the directory that you cloned ${POSE_ROOT}.

  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  5. Init the output and log directories (for training model output and logs):

     mkdir output 
     mkdir log
    
  6. Download PyTorch ImageNet-pretrained models from Google Drive. PoseDual (ResNet18) should load resnet18_5c_gluon_posedual as the pretrained model for training (a loading sketch follows the directory layout below).

  7. Download our pretrained models from Google Drive

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet18_5c_f3_posedual.pth
             |   |-- resnet18-5c106cde.pth
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet152-b121ed2d.pth
             |   |-- ......
             |-- pose_dual
                 |-- COCO_subset
                 |   |-- COCO1K_PoseDual.pth.tar
                 |   |-- COCO5K_PoseDual.pth.tar
                 |   |-- COCO10K_PoseDual.pth.tar
                 |   |-- ......
                 |-- COCO_COCOwild
                 |-- ......
    
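For reference, here is a minimal sketch of loading one of the ImageNet-pretrained backbones above into a torchvision ResNet; the path and the strict=False handling are illustrative, and the actual loading is driven by the experiment config files.

    import torch
    from torchvision.models import resnet50

    backbone = resnet50()
    state_dict = torch.load('models/pytorch/imagenet/resnet50-19c8e357.pth',
                            map_location='cpu')
    # strict=False tolerates any keys that do not belong to the plain backbone.
    backbone.load_state_dict(state_dict, strict=False)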

Data preparation

For the COCO and MPII datasets, please refer to Simple Baseline to prepare them.
Download the person detection boxes and images for the COCO WILD (unlabeled) set. The structure looks like this (a quick sanity check follows the tree):

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   |-- person_keypoints_val2017.json
        |   `-- image_info_unlabeled2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        |   |-- COCO_test-dev2017_detections_AP_H_609_person.json
        |   `-- COCO_unlabeled2017_detections_person_faster_rcnn.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- ... 
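Once the files are in place, a quick check with pycocotools (paths assume the layout above) can confirm the annotations are readable:

    from pycocotools.coco import COCO

    coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
    img_ids = coco.getImgIds()
    print(f'val2017 images: {len(img_ids)}')

    # Keypoint annotations for the first image.
    ann_ids = coco.getAnnIds(imgIds=img_ids[0], iscrowd=False)
    print(coco.loadAnns(ann_ids))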

For the AIC data, please download it from AI Challenger; the 2017 Train/Val split is needed for keypoint training and validation. Please download the annotation files from AIC Annotations. The structure looks like this:

${POSE_ROOT}
|-- data
`-- |-- ai_challenger
    `-- |-- train
        |   |-- images
        |   `-- keypoint_train_annotation.json
        `-- validation
            |-- images
            |   |-- 0a00c0b5493774b3de2cf439c84702dd839af9a2.jpg
            |   |-- 0a0c466577b9d87e0a0ed84fc8f95ccc1197f4b0.jpg
            |   `-- ...
            |-- gt_valid.mat
            `-- keypoint_validation_annotation.json

Run

Training

1. Training Dual Networks (PoseDual) on COCO 1K labels

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml

2. Training Dual Networks on COCO 1K labels with Joint Cutout

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual_JointCutout.yaml

3. Training Dual Networks on COCO 1K labels with Distributed Data Parallel (see the DDP sketch after these training commands)

python -m torch.distributed.launch --nproc_per_node=4  pose_estimation/train.py \
    --distributed --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml

4. Training Single Networks (PoseCons) on COCO 1K labels

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseCons.yaml

5. Training Dual Networks (PoseDual) with ResNet50 on COCO TRAIN + WILD

python pose_estimation/train.py \
    --cfg experiments/mix_coco_coco/res50/256x192_COCO_COCOunlabel_PoseDual_JointCut.yaml
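For reference, the --distributed flag relies on the standard torch.distributed.launch workflow; a minimal sketch of what such a launcher expects from a training script is shown below (the argument name and backend follow generic PyTorch conventions and are not necessarily the repository's exact code).

    import argparse
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)  # filled in by the launcher
    args = parser.parse_args()

    dist.init_process_group(backend='nccl')       # one process per GPU
    torch.cuda.set_device(args.local_rank)

    model = torch.nn.Conv2d(3, 17, 1).cuda()      # placeholder for the pose network
    model = DDP(model, device_ids=[args.local_rank])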

Testing

6. Testing Dual Networks (PoseDual+COCO1K) on COCO VAL

python pose_estimation/valid.py \
    --cfg experiments/mix_coco_coco/res18/256x192_COCO1K_PoseDual.yaml
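Validation reports COCO keypoint AP; the standard pycocotools evaluation looks roughly like this (the results-file path is hypothetical, since valid.py writes and evaluates results according to its config):

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO('data/coco/annotations/person_keypoints_val2017.json')
    coco_dt = coco_gt.loadRes('output/keypoint_results.json')  # hypothetical results file

    evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints AP / AP.5 / AR, matching the tables above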

Citation

If you use our code or models in your research, please cite:

@inproceedings{semipose,
  title={An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation},
  author={Xie, Rongchang and Wang, Chunyu and Zeng, Wenjun and Wang, Yizhou},
  booktitle={ICCV},
  year={2021}
}

Acknowledgement

The code is mainly based on Simple Baseline and HRNet. Some code comes from DarkPose. Thanks for their work.
