DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Last update: Dec 21, 2022

Related tags

Overview

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: Detailed instruction and information see classification/README.md.

Object Detection: Detailed instruction and information see detection/README.md.

The papar has been relased on [Arxiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split the images input patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. In this way, our method can well preserve the semantics in patches.

In this repository, code and models for a Deformable Patch-based Transformer (DPT) are provided. As this field is developing rapidly, we are willing to see our DePatch applied to some other latest architectures and promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs(G)	[email protected]
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer these repositories for further information.

DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Related tags

Overview

DPT

Introduction

Main Results

Image Classification

Object Detection

Citation

License

Acknowledgement

Owner

CASIA-IVA-Lab

[ACMMM 2021, Oral] Code release for "Elastic Tactile Simulation Towards Tactile-Visual Perception"

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

Instance-conditional Knowledge Distillation for Object Detection

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

An Unsupervised Graph-based Toolbox for Fraud Detection

Official code release for "Learned Spatial Representations for Few-shot Talking-Head Synthesis" ICCV 2021

Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.

This repository builds a basic vision transformer from scratch so that one beginner can understand the theory of vision transformer.

Official PyTorch Implementation of "AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting".

Callable PyTrees and filtered JIT/grad transformations => neural networks in JAX.

Voice Gender Recognition

MLP-Like Vision Permutator for Visual Recognition (PyTorch)

Decentralized Reinforcment Learning: Global Decision-Making via Local Economic Transactions (ICML 2020)

Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

A PyTorch implementation of deep-learning-based registration

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

PyTorch implementation of "Efficient Neural Architecture Search via Parameters Sharing"

Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"

Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.