AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Last update: Dec 26, 2022


Overview

AdaFocusV2

This repo contains the official code and pre-trained models for AdaFocusV2, presented in "AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition".

Introduction

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy. As a representative work, the adaptive focus method (AdaFocus) achieves a favorable trade-off between accuracy and inference speed by dynamically identifying and attending to the informative regions in each video frame. However, AdaFocus requires a complicated three-stage training pipeline (involving reinforcement learning), which converges slowly and is unfriendly to practitioners. This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization. We further present an improved training scheme to address the issues introduced by the one-stage formulation, namely the lack of supervision, input diversity, and training stability. Moreover, a conditional-exit technique is proposed to perform temporally adaptive computation on top of AdaFocus without additional training. Extensive experiments on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2, and Jester) demonstrate that our model significantly outperforms the original AdaFocus and other competitive baselines, while being considerably simpler and more efficient to train.
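To give a rough feel for what an interpolation-based patch selection can look like, here is a minimal sketch in plain Python. It is not the code in this repo: it assumes bilinear interpolation, a single-channel frame stored as a list of rows, and the hypothetical helper names `bilinear_sample` and `crop_patch`. The point it illustrates is that each patch pixel is a weighted sum of neighbouring frame pixels, with weights that vary continuously with the patch centre, so in an automatic-differentiation framework gradients could flow back to the centre coordinates.

```python
import math


def bilinear_sample(frame, y, x):
    """Sample a 2D grid (list of rows) at a real-valued location (y, x).

    Hypothetical helper for illustration only.
    """
    h, w = len(frame), len(frame[0])
    # Clamp the location so the four neighbours always lie inside the frame.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    # Interpolation weights depend continuously on (y, x).
    wy, wx = y - y0, x - x0
    top = (1 - wx) * frame[y0][x0] + wx * frame[y0][x1]
    bottom = (1 - wx) * frame[y1][x0] + wx * frame[y1][x1]
    return (1 - wy) * top + wy * bottom


def crop_patch(frame, center_y, center_x, size):
    """Crop a size x size patch centred at a continuous (non-integer) location."""
    half = (size - 1) / 2.0
    return [
        [bilinear_sample(frame, center_y - half + i, center_x - half + j)
         for j in range(size)]
        for i in range(size)
    ]


# Toy 4x4 "frame" whose pixel value is 10 * row + column.
frame = [[10.0 * r + c for c in range(4)] for r in range(4)]

# Two patches whose centres differ by a quarter of a pixel horizontally.
patch_a = crop_patch(frame, 1.5, 1.5, 2)
patch_b = crop_patch(frame, 1.5, 1.75, 2)
```

Moving the centre by a quarter of a pixel shifts every value in `patch_b` by a correspondingly small amount relative to `patch_a`, instead of jumping to a different integer crop. That smooth dependence on the centre is what a hard, index-based crop lacks.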

Results

Compared with AdaFocusV1

ActivityNet, FCVID and Mini-Kinetics

Something-Something V1&V2 and Jester

Visualization

Get Started

Please see the folders "Experiments on ActivityNet, FCVID and Mini-Kinetics" and "Experiments on Sth-Sth and Jester" for dataset-specific documentation.

Contact

If you have any questions, feel free to contact the authors or raise an issue. Yulin Wang: [email protected].
