We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Last update: Nov 08, 2022

ConTNet

Introduction

ConTNet (Convolution-Transformer Network) is proposed mainly in response to two issues: (1) ConvNets lack a large receptive field, which limits their performance on downstream tasks. (2) Transformer-based models are not robust enough and require special training settings or hundreds of millions of images for pretraining, which limits their adoption. ConTNet alternates convolution and transformer layers. It is robust and can be optimized like ResNet, unlike recently proposed transformer-based models (e.g., ViT, DeiT), which are sensitive to hyper-parameters and need many tricks when trained from scratch on a midsize dataset (e.g., ImageNet).
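To make the idea of alternating the two operators concrete, here is a minimal PyTorch sketch. It is not the released ConTNet code: the class name `ConvTransformerBlock`, the layer sizes, and the choice to run attention over all spatial positions at once are assumptions made for illustration only.

```python
import torch
from torch import nn


class ConvTransformerBlock(nn.Module):
    """Hypothetical block: one transformer encoder layer followed by one 3x3 conv."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Self-attention over spatial positions; each position is one token.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels,
            nhead=num_heads,
            dim_feedforward=2 * channels,
            batch_first=True,
        )
        # Convolution adds a local inductive bias after the attention step.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (b, h*w, c)
        tokens = self.encoder(tokens)              # attention mixes all positions
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.conv(x)                        # back to a feature map


if __name__ == "__main__":
    block = ConvTransformerBlock(channels=32, num_heads=4)
    features = torch.randn(2, 32, 8, 8)
    print(block(features).shape)
```

The block keeps the input shape, so several of them can be stacked in a ResNet-style stage layout. Full attention over every position grows quadratically with resolution, so a practical model would likely restrict attention to smaller patches.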

Main Results on ImageNet

name	resolution	acc@1	#params(M)	FLOPs(G)
Res-18	224x224	71.5	11.7	1.8
ConT-S	224x224	74.9	10.1	1.5
Res-50	224x224	77.1	25.6	4.0
ConT-M	224x224	77.6	19.2	3.1
Res-101	224x224	78.2	44.5	7.6
ConT-B	224x224	77.9	39.6	6.4
DeiT-Ti^*	224x224	72.2	5.7	1.3
ConT-Ti^*	224x224	74.9	5.8	0.8
Res-18^*	224x224	73.2	11.7	1.8
ConT-S^*	224x224	76.5	10.1	1.5
Res-50^*	224x224	78.6	25.6	4.0
DeiT-S^*	224x224	79.8	22.1	4.6
ConT-M^*	224x224	80.2	19.2	3.1
Res-101^*	224x224	80.0	44.5	7.6
DeiT-B^*	224x224	81.8	86.6	17.6
ConT-B^*	224x224	81.8	39.6	6.4

Note: ^* indicates training with strong augmentations.
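The trade-off in the table can be read off with a few lines of arithmetic. The sketch below copies the rows trained without strong augmentations and compares each ConT model with the ResNet listed directly above it; that pairing and the helper name `compare` are assumptions made for illustration.

```python
# (acc@1 in %, params in M, FLOPs in G), copied from the ImageNet table above.
results = {
    "Res-18": (71.5, 11.7, 1.8),
    "ConT-S": (74.9, 10.1, 1.5),
    "Res-50": (77.1, 25.6, 4.0),
    "ConT-M": (77.6, 19.2, 3.1),
    "Res-101": (78.2, 44.5, 7.6),
    "ConT-B": (77.9, 39.6, 6.4),
}

# Assumed pairing: each ConT model against the ResNet row listed just before it.
pairs = [("Res-18", "ConT-S"), ("Res-50", "ConT-M"), ("Res-101", "ConT-B")]


def compare(baseline: str, candidate: str) -> dict:
    """Return accuracy gain (points) and relative params/FLOPs savings (%)."""
    acc_b, params_b, flops_b = results[baseline]
    acc_c, params_c, flops_c = results[candidate]
    return {
        "acc_gain": round(acc_c - acc_b, 1),
        "params_saved_pct": round(100 * (1 - params_c / params_b), 1),
        "flops_saved_pct": round(100 * (1 - flops_c / flops_b), 1),
    }


if __name__ == "__main__":
    for baseline, candidate in pairs:
        print(candidate, "vs", baseline, compare(baseline, candidate))
```

A negative `acc_gain` means the ConT model is less accurate than its paired ResNet in that setting, which is the case for ConT-B against Res-101 without strong augmentations.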

Main Results on Downstream Tasks

Object detection results on COCO.

method	backbone	#params(M)	FLOPs(G)	AP	APs	APm	APl
RetinaNet	Res-50	32.0	235.6	36.5	20.4	40.3	48.1
RetinaNet	ConTNet-M	27.0	217.2	37.9	23.0	40.6	50.4
FCOS	Res-50	32.2	242.9	38.7	22.9	42.5	50.1
FCOS	ConTNet-M	27.2	228.4	40.8	25.1	44.6	53.0
Faster R-CNN	Res-50	41.5	241.0	37.4	21.2	41.0	48.1
Faster R-CNN	ConTNet-M	36.6	225.6	40.0	25.4	43.0	52.0

Instance segmentation results on Cityscapes based on Mask R-CNN.

backbone	AP^bb	AP_s^bb	AP_m^bb	AP_l^bb	AP^mk	AP_s^mk	AP_m^mk	AP_l^mk
Res-50	38.2	21.9	40.9	49.5	34.7	18.3	37.4	47.2
ConT-M	40.5	25.1	44.4	52.7	38.1	20.9	41.0	50.3

Semantic segmentation results on Cityscapes.

model	mIoU
PSP-Res50	77.12
PSP-ConTM	78.28

Bib Citing

@article{yan2021contnet,
    title={ConTNet: Why not use convolution and transformer at the same time?},
    author={Haotian Yan and Zhe Li and Weijian Li and Changhu Wang and Ming Wu and Chuang Zhang},
    year={2021},
    journal={arXiv preprint arXiv:2104.13497}
}
