TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation, CVPR2022

Last update: Dec 21, 2022

Overview

Paper Links: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation (CVPR 2022)

by Wenqiang Zhang*, Zilong Huang*, Guozhong Luo, Tao Chen, Xinggang Wang†, Wenyu Liu†, Gang Yu, Chunhua Shen.

(*) equal contribution, (†) corresponding author.

Introduction

Although vision transformers (ViTs) have achieved great success in computer vision, their heavy computational cost makes them unsuitable for dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). TopFormer takes tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation. Experimental results demonstrate that our method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.
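To make the cost argument concrete, the following sketch counts tokens and pairwise attention interactions for a 512×512 input. The strides (16 and 64) and the helper names are illustrative assumptions, not values taken from this README; the only fact relied on is that full self-attention grows with the square of the number of tokens.

```python
def num_tokens(height, width, stride):
    """Number of tokens when each token covers a stride x stride patch."""
    return (height // stride) * (width // stride)


def attention_pairs(tokens):
    """Token-to-token interactions in full self-attention (quadratic in tokens)."""
    return tokens * tokens


# Hypothetical strides, chosen only for illustration.
dense = num_tokens(512, 512, stride=16)   # 32 * 32 tokens
coarse = num_tokens(512, 512, stride=64)  # 8 * 8 tokens

print(dense, coarse)                                      # 1024 64
print(attention_pairs(dense) // attention_pairs(coarse))  # 256
```

Under these assumed strides, running attention on the coarser token grid needs 256 times fewer pairwise interactions, which is the kind of saving that matters on a mobile CPU.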

The latency is measured on a single Qualcomm Snapdragon 865 with input size 512×512×3; only one ARM CPU core is used for speed testing. * indicates that the input size is 448×448×3.

Updates

04/23/2022: The TopFormer backbone has been integrated into PaddleViT; see PaddleViT for the third-party implementation on the Paddle framework.

Requirements

pytorch 1.5+
mmcv-full==1.3.14
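A quick way to check an environment against the two requirements above is sketched below. It assumes the packages are installed under the distribution names `torch` and `mmcv-full`; `parse_version`, `installed_version`, and `check_requirements` are hypothetical helpers written for this example, not part of the repository.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.3.14' or '1.10.0+cu113' into a tuple of ints such as (1, 3, 14)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_requirements():
    """Compare the environment against the two requirements listed above."""
    torch_version = installed_version("torch")      # assumed distribution name
    mmcv_version = installed_version("mmcv-full")   # assumed distribution name
    return {
        "pytorch 1.5+": torch_version is not None
        and parse_version(torch_version) >= (1, 5),
        "mmcv-full==1.3.14": mmcv_version is not None
        and parse_version(mmcv_version) == (1, 3, 14),
    }


if __name__ == "__main__":
    for requirement, satisfied in check_requirements().items():
        print(requirement, "OK" if satisfied else "missing or wrong version")
```

The check reads only package metadata, so it does not import PyTorch or MMCV and runs even when neither is installed.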

Main results

The classification models pretrained on ImageNet can be downloaded from Baidu Drive/Google Drive.

ADE20K

Model	Params(M)	FLOPs(G)	mIoU(ss)	Link
TopFormer-T_448x448_2x8_160k	1.4	0.5	32.5	Baidu Drive, Google Drive
TopFormer-T_448x448_4x8_160k	1.4	0.5	33.4	Baidu Drive, Google Drive
TopFormer-T_512x512_2x8_160k	1.4	0.6	33.6	Baidu Drive, Google Drive
TopFormer-T_512x512_4x8_160k	1.4	0.6	34.6	Baidu Drive, Google Drive
TopFormer-S_512x512_2x8_160k	3.1	1.2	36.5	Baidu Drive, Google Drive
TopFormer-S_512x512_4x8_160k	3.1	1.2	37.0	Baidu Drive, Google Drive
TopFormer-B_512x512_2x8_160k	5.1	1.8	38.3	Baidu Drive, Google Drive
TopFormer-B_512x512_4x8_160k	5.1	1.8	39.2	Baidu Drive, Google Drive
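The model names in the table pack several settings into one string. The sketch below splits a name into its parts. The field meanings are assumptions based on common MMSegmentation naming rather than anything stated in this README: the first pair is read as the crop size, the second pair as samples per GPU × number of GPUs, and `160k` as the number of training iterations. `ModelName` and `parse_model_name` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Tuple


class ModelName(NamedTuple):
    variant: str             # "T", "S" or "B"
    crop: Tuple[int, int]    # assumed: crop size in pixels
    samples_per_gpu: int     # assumed meaning of the first number in "2x8"
    gpus: int                # assumed meaning of the second number in "2x8"
    iterations: int          # assumed: "160k" means 160000 training iterations

    @property
    def total_batch(self):
        return self.samples_per_gpu * self.gpus


_PATTERN = re.compile(
    r"^TopFormer-(?P<variant>[TSB])"
    r"_(?P<a>\d+)x(?P<b>\d+)"
    r"_(?P<per_gpu>\d+)x(?P<gpus>\d+)"
    r"_(?P<iters>\d+)k$"
)


def parse_model_name(name):
    """Split a table entry such as 'TopFormer-T_448x448_2x8_160k' into fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    return ModelName(
        variant=match["variant"],
        crop=(int(match["a"]), int(match["b"])),
        samples_per_gpu=int(match["per_gpu"]),
        gpus=int(match["gpus"]),
        iterations=int(match["iters"]) * 1000,
    )


info = parse_model_name("TopFormer-T_448x448_2x8_160k")
print(info.variant, info.crop, info.total_batch, info.iterations)
```

If the batch-size reading is right, the `2x8` and `4x8` rows differ only in total batch size (16 versus 32), which would explain why each pair shares the same Params and FLOPs.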

ss indicates single-scale.
The Baidu Drive password is topf.
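For readers unfamiliar with the mIoU column, the sketch below computes mean intersection-over-union from a confusion matrix using the standard definition. It is a generic illustration with made-up numbers; details of the repository's own evaluation, such as how ignored labels are handled, are not stated here and are not modelled.

```python
def mean_iou(confusion):
    """Mean IoU in percent.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    ious = []
    for c in range(len(confusion)):
        true_positive = confusion[c][c]
        ground_truth = sum(confusion[c])                # all pixels of class c
        predicted = sum(row[c] for row in confusion)    # all pixels predicted as c
        union = ground_truth + predicted - true_positive
        if union > 0:  # skip classes that never appear in either map
            ious.append(true_positive / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return 100.0 * sum(ious) / len(ious)


# Toy two-class example with made-up pixel counts.
toy = [
    [8, 2],
    [1, 9],
]
print(round(mean_iou(toy), 2))  # 73.86
```

In the toy example the per-class IoUs are 8/11 and 9/12, and their mean is reported as a percentage, matching the scale used in the table.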

Usage

Please see MMSegmentation for dataset preparation.

For training, run:

sh tools/dist_train.sh local_configs/topformer/<config-file> <num-of-gpus-to-use> --work-dir /path/to/save/checkpoint

To evaluate, run:

sh tools/dist_test.sh local_configs/topformer/<config-file> <checkpoint-path> <num-of-gpus-to-use>
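When launching several runs from a script, the two commands above can be assembled programmatically. The sketch below only builds the argument lists in the order shown above; `train_command` and `eval_command` are hypothetical helpers, and the placeholders are left unfilled because the README does not name specific config files or checkpoints.

```python
import shlex


def train_command(config, num_gpus, work_dir):
    """Argument list mirroring the training command shown above."""
    return ["sh", "tools/dist_train.sh", config, str(num_gpus),
            "--work-dir", work_dir]


def eval_command(config, checkpoint, num_gpus):
    """Argument list mirroring the evaluation command shown above."""
    return ["sh", "tools/dist_test.sh", config, checkpoint, str(num_gpus)]


# Placeholders are kept as-is; substitute real paths before running.
config = "local_configs/topformer/<config-file>"
print(shlex.join(train_command(config, 8, "/path/to/save/checkpoint")))
print(shlex.join(eval_command(config, "<checkpoint-path>", 8)))

# To actually launch a run from the repository root, pass the list to
# subprocess.run(..., check=True) instead of printing it.
```

Passing a list rather than a single string avoids shell-quoting problems when paths contain spaces.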

To test the inference speed on a mobile device, please refer to tnn_runtime.

Acknowledgement

The implementation is based on MMSegmentation.

Citation

If you find our work helpful to your experiments, please cite:

@inproceedings{zhang2022topformer,
  title     = {TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation},
  author    = {Zhang, Wenqiang and Huang, Zilong and Luo, Guozhong and Chen, Tao and Wang, Xinggang and Liu, Wenyu and Yu, Gang and Shen, Chunhua},
  booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Owner

Hust Visual Learning Team
