Vision Transformer with Deformable Attention

This repository contains the code for the paper Vision Transformer with Deformable Attention [arXiv].

Introduction

Deformable attention is proposed to model the relations among tokens effectively, guided by the important regions in the feature maps. This flexible scheme lets the self-attention module focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer (DAT), a general backbone model with deformable attention for both image classification and dense prediction tasks.
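To make the idea concrete, here is a minimal single-head sketch in PyTorch. It is an illustration under stated assumptions, not the DAT module from this repository. Queries are projected from the feature map, and a small offset network predicts a 2-D shift for each point of a coarse, uniform reference grid. The input features are bilinearly sampled at the shifted positions, keys and values are projected from those samples, and standard scaled dot-product attention follows. The class name `SimpleDeformableAttention`, the pooling-based offset network, and the `ref_size` and `offset_range` parameters are hypothetical. Multi-head attention, offset groups, and positional bias are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    """Hypothetical single-head deformable attention sketch (not DAT's own module)."""

    def __init__(self, dim: int, ref_size: int = 4, offset_range: float = 2.0):
        super().__init__()
        self.scale = dim ** -0.5
        self.ref_size = ref_size          # reference grid is ref_size x ref_size points
        self.offset_range = offset_range  # max offset, measured in reference-grid cells
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Offset network (assumed design): depthwise conv -> GELU -> 1x1 conv to 2 channels (dx, dy).
        self.offset_net = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
            nn.GELU(),
            nn.Conv2d(dim, 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map.
        B, C, H, W = x.shape
        r = self.ref_size
        q = self.q_proj(x.flatten(2).transpose(1, 2))               # (B, HW, C)

        # Predict offsets from the queries, pooled down to the r x r reference grid.
        q_map = q.transpose(1, 2).reshape(B, C, H, W)
        q_pooled = F.adaptive_avg_pool2d(q_map, r)                  # (B, C, r, r)
        offsets = torch.tanh(self.offset_net(q_pooled))             # (B, 2, r, r), in [-1, 1]
        # Reference points are 2/r apart in normalized coords, so scale offsets accordingly.
        offsets = offsets * self.offset_range * 2.0 / r

        # Uniform reference points (cell centers) in normalized [-1, 1] coords, (x, y) order.
        coords = torch.linspace(-1 + 1 / r, 1 - 1 / r, r, device=x.device, dtype=x.dtype)
        ref_x = coords.view(1, r).expand(r, r)
        ref_y = coords.view(r, 1).expand(r, r)
        ref = torch.stack((ref_x, ref_y), dim=-1)                   # (r, r, 2)

        # Shift reference points, then bilinearly sample the input features there.
        pos = (ref.unsqueeze(0) + offsets.permute(0, 2, 3, 1)).clamp(-1, 1)  # (B, r, r, 2)
        sampled = F.grid_sample(x, pos, mode="bilinear", align_corners=False)  # (B, C, r, r)
        sampled = sampled.flatten(2).transpose(1, 2)                # (B, r*r, C)

        # Every query attends to the r*r deformed sampling points only.
        k = self.k_proj(sampled)
        v = self.v_proj(sampled)
        attn = (q @ k.transpose(1, 2)) * self.scale                 # (B, HW, r*r)
        out = self.out_proj(attn.softmax(dim=-1) @ v)               # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)


if __name__ == "__main__":
    layer = SimpleDeformableAttention(dim=32, ref_size=4)
    feats = torch.randn(2, 32, 14, 14)
    print(layer(feats).shape)  # torch.Size([2, 32, 14, 14])
```

Each query attends to `ref_size * ref_size` sampled keys rather than all `H * W` positions. Because the sampling points are shared across queries and shifted by learned offsets, the attention can move toward informative regions of the feature map.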

Dependencies

NVIDIA GPU + CUDA 11.1
Python 3.8 (Anaconda recommended)
PyTorch == 1.8.0
timm
einops
yacs
termcolor
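To check an environment against the list above before running the code, the following small script can help. The helper `missing_dependencies` is hypothetical and not part of this repository. It only tests whether each package can be imported. It does not enforce the exact Python, PyTorch, or CUDA versions listed above. Those versions are printed so you can compare them by hand.

```python
import importlib.util
import sys

# Import names of the packages listed under Dependencies (PyTorch imports as "torch").
REQUIRED = ["torch", "timm", "einops", "yacs", "termcolor"]


def missing_dependencies(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_dependencies()
    print("missing packages:", missing or "none")
    if "torch" not in missing:
        import torch
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```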

TODO

Classification pretrained models.
Object Detection codebase & models.
Semantic Segmentation codebase & models.
CUDA operators to accelerate sampling operations.

Acknowledgement

This code is built on top of Swin Transformer; we thank its authors for their efficient and neat codebase.

Citation

If you find our work useful in your research, please consider citing:

@misc{xia2022vision,
      title={Vision Transformer with Deformable Attention}, 
      author={Zhuofan Xia and Xuran Pan and Shiji Song and Li Erran Li and Gao Huang},
      year={2022},
      eprint={2201.00520},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

[email protected]