[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Last update: Jan 05, 2023

Overview

SEgmentation TRansformers -- SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

This project is built on mmsegmentation. Follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method	Crop size	Batch size	Iterations	Set	mIoU
SETR-Naive	768x768	8	40k	val	77.37
SETR-Naive	768x768	8	80k	val	77.90
SETR-MLA	768x768	8	40k	val	76.65
SETR-MLA	768x768	8	80k	val	77.24
SETR-PUP	768x768	8	40k	val	78.39
SETR-PUP	768x768	8	80k	val	79.34
SETR-Naive-DeiT	768x768	8	40k	val	77.85
SETR-Naive-DeiT	768x768	8	80k	val	78.66
SETR-MLA-DeiT	768x768	8	40k	val	78.04
SETR-MLA-DeiT	768x768	8	80k	val	78.98
SETR-PUP-DeiT	768x768	8	40k	val	78.79
SETR-PUP-DeiT	768x768	8	80k	val	79.45

ADE20K

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)
SETR-Naive	512x512	16	160k	val	48.06	48.80
SETR-MLA	512x512	8	160k	val	48.27	50.03
SETR-MLA	512x512	16	160k	val	48.64	50.28
SETR-PUP	512x512	16	160k	val	48.58	50.09

Pascal Context

Method	Crop size	Batch size	Iterations	Set	mIoU	mIoU (ms+flip)
SETR-Naive	480x480	16	80k	val	52.89	53.61
SETR-MLA	480x480	8	80k	val	54.39	55.39
SETR-MLA	480x480	16	80k	val	54.87	55.83
SETR-PUP	480x480	16	80k	val	54.40	55.27
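
The mIoU columns above report mean intersection-over-union as a percentage. As a reminder of what that number measures, here is a minimal sketch in plain Python. It is not the evaluation code used by this repository; the function name `mean_iou`, the flattened-list inputs, and the `ignore_index=255` convention are assumptions made for illustration.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes for flattened per-pixel label lists (illustrative)."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count (assumed convention)
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes that never appear in prediction or ground truth are skipped.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Class 0: IoU 1/2, class 1: IoU 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"{100 * score:.2f}")
```

Multiplying the result by 100 gives a value on the same scale as the tables.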

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} 
# For example, train SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
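
The checkpoint path in the example follows a pattern that can be derived from the config file name. The sketch below builds the test command that way. It assumes the layout seen in the examples in this README (a `work_dirs/<config name>/iter_<N>.pth` checkpoint, an `_<N>k_` schedule tag in the config name, and an optional `_MS` suffix for multi-scale configs); `checkpoint_for` and `dist_test_command` are hypothetical helper names, not part of the repository.

```python
import re
from pathlib import PurePosixPath
from typing import List


def checkpoint_for(config_path: str, work_root: str = "work_dirs") -> str:
    """Guess the final checkpoint path from a config name (assumed layout)."""
    stem = PurePosixPath(config_path).stem
    if stem.endswith("_MS"):
        # Multi-scale configs reuse the checkpoint of the base config.
        stem = stem[:-3]
    match = re.search(r"_(\d+)k_", stem)
    if match is None:
        raise ValueError(f"no schedule tag like '_40k_' in {stem!r}")
    iterations = int(match.group(1)) * 1000
    return f"{work_root}/{stem}/iter_{iterations}.pth"


def dist_test_command(config_path: str, gpus: int,
                      metric: str = "mIoU") -> List[str]:
    """Assemble the argument list for ./tools/dist_test.sh."""
    return ["./tools/dist_test.sh", config_path,
            checkpoint_for(config_path), str(gpus), "--eval", metric]


command = dist_test_command(
    "configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py", 8)
print(" ".join(command))
```

If training was run with a custom work directory, pass the checkpoint path explicitly instead of relying on this guess.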

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, run multi-scale testing of SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
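
For readers unfamiliar with the "ms+flip" setting, the general idea is to run the model on several rescaled copies of the image, with and without a horizontal flip, map each output back to the original size and orientation, and average. The toy sketch below illustrates that idea on a single 2D score map using plain Python lists. It is not this repository's implementation: the function names, the nearest-neighbour resize, and the scale set are assumptions, and a real pipeline would average per-class scores before taking the argmax.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]  # one 2D map: rows of per-pixel values


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize (a stand-in for bilinear interpolation)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def hflip(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def ms_flip_predict(image: Grid, model: Callable[[Grid], Grid],
                    scales: Sequence[float] = (0.5, 1.0, 2.0)) -> Grid:
    """Average model outputs over all scales and both flip states."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    runs = 0
    for scale in scales:
        scaled = resize_nearest(image, max(1, round(h * scale)),
                                max(1, round(w * scale)))
        for flip in (False, True):
            output = model(hflip(scaled) if flip else scaled)
            if flip:
                output = hflip(output)  # undo the flip on the prediction
            output = resize_nearest(output, h, w)  # back to the input size
            for r in range(h):
                for c in range(w):
                    total[r][c] += output[r][c]
            runs += 1
    return [[value / runs for value in row] for row in total]


def column_model(grid: Grid) -> Grid:
    """Toy model whose score is the column index, so it is not flip-symmetric."""
    return [[float(c) for c in range(len(row))] for row in grid]


averaged = ms_flip_predict([[0.0, 0.0]], column_model, scales=(1.0,))
print(averaged)
```

With the toy model, the unflipped pass scores the two pixels 0 and 1 and the flipped pass scores them 1 and 0, so averaging removes the left-right bias.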

See getting_started.md for basic training and testing usage.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to the following open-source repositories:
mmsegmentation
pytorch-image-models


Owner

Fudan Zhang Vision Group
