End-to-end Temporal Action Detection with Transformer. [Under review]

Last update: Dec 25, 2022

Overview

TadTR: End-to-end Temporal Action Detection with Transformer

By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai.

This repo holds the code for TadTR, described in the technical report End-to-end Temporal Action Detection with Transformer.

Introduction

TadTR is an end-to-end Temporal Action Detection TRansformer. It has the following advantages over previous methods:

Simple. It adopts a set-prediction pipeline and performs TAD with a single network, without a separate proposal generation stage.
Flexible. It removes hand-crafted designs such as anchor settings and NMS.
Sparse. It produces very sparse detections (e.g. 10 on ActivityNet) and therefore requires less computation.
Strong. As a self-contained temporal action detector, TadTR achieves state-of-the-art performance on HACS and THUMOS14. It is also much stronger than concurrent Transformer-based methods.
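
To make the set-prediction idea concrete, the sketch below assigns predicted segments to ground-truth segments one-to-one, which is the general mechanism DETR-style detectors rely on instead of NMS. It is a minimal illustration, not TadTR's actual matching cost: the names `temporal_iou` and `match_segments`, the IoU-only score, and the brute-force search are assumptions made for this example. Practical implementations typically use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def temporal_iou(a, b):
    """IoU of two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_segments(preds, gts):
    """Return the one-to-one (pred_idx, gt_idx) pairs with the highest total IoU.

    Brute force over permutations, which is only practical for a handful of
    segments. Assumes len(preds) >= len(gts).
    """
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(preds)), len(gts)):
        # perm[g] is the prediction assigned to ground-truth segment g.
        score = sum(temporal_iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = sorted((p, g) for g, p in enumerate(perm))
    return best_pairs


if __name__ == "__main__":
    predictions = [(0.0, 4.0), (10.0, 15.0), (3.0, 9.0)]
    ground_truth = [(9.5, 15.0), (0.5, 4.0)]
    print(match_segments(predictions, ground_truth))
```

In this toy input the third prediction is left unmatched. In DETR-style training, unmatched predictions are supervised as background, so each action ends up with a single detection.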

We're still improving TadTR. Stay tuned for future versions.

Updates

[2021.9.15] Update the performance on THUMOS14.

[2021.9.1] Add demo code.

TODOs

add model code
add inference code
add training code
support training/inference with video input

Main Results

HACS Segments

| Method | Feature | mAP@0.5 | mAP@0.75 | mAP@0.95 | Avg. mAP | Model |
|--------|---------|---------|----------|----------|----------|-------|
| TadTR | I3D RGB | 45.16 | 30.70 | 11.78 | 30.83 | [OneDrive] |

THUMOS14

| Method | Feature | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | Avg. mAP | Model |
|--------|---------|---------|---------|---------|---------|---------|----------|-------|
| TadTR | I3D 2stream | 72.92 | 66.86 | 58.59 | 46.31 | 32.32 | 55.40 | [OneDrive] |
| TadTR | TSN 2stream | 64.24 | 58.34 | 50.01 | 40.79 | 29.07 | 48.49 | [OneDrive] |
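
The Avg. mAP column for THUMOS14 is consistent with the plain mean of the five per-threshold values in each row. A small check, with the numbers copied from the table above (the dictionary and function names are illustrative only):

```python
# mAP values at the five tIoU thresholds, copied from the THUMOS14 table.
thumos14_map = {
    "I3D 2stream": [72.92, 66.86, 58.59, 46.31, 32.32],
    "TSN 2stream": [64.24, 58.34, 50.01, 40.79, 29.07],
}


def average_map(values):
    """Arithmetic mean of per-threshold mAP values, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for feature, values in thumos14_map.items():
        print(feature, average_map(values))
```

This reproduces the 55.40 and 48.49 reported above. The same shortcut does not apply to HACS Segments or ActivityNet-1.3: for HACS, (45.16 + 30.70 + 11.78) / 3 is about 29.21, not 30.83. Those benchmarks conventionally average over tIoU thresholds from 0.5 to 0.95 in steps of 0.05, of which only three are listed.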

ActivityNet-1.3

| Method | Feature | mAP@0.5 | mAP@0.75 | mAP@0.95 | Avg. mAP | Model |
|--------|---------|---------|----------|----------|----------|-------|
| TadTR+BMN | TSN 2stream | 50.51 | 35.35 | 8.18 | 34.55 | [OneDrive] |

Install

Requirements

Linux, CUDA>=9.2, GCC>=5.4
Python>=3.7
PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)
Other requirements
```
pip install -r requirements.txt
```
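
Before building, it can help to confirm that the installed versions meet the minimums listed above. The script below is a rough sketch, not part of the repository: `parse_version`, `meets_minimum` and `check_environment` are hypothetical helpers, and the version parsing only looks at the leading numeric components.

```python
import platform
import re


def parse_version(text):
    """Turn a version string such as '1.5.1+cu101' into a tuple like (1, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Print whether Python, PyTorch and torchvision meet the listed minimums."""
    python_version = platform.python_version()
    print("Python", python_version,
          "OK" if meets_minimum(python_version, "3.7") else "too old")
    for name, minimum in (("torch", "1.5.1"), ("torchvision", "0.6.1")):
        try:
            module = __import__(name)
        except ImportError:
            print(name, "not installed")
            continue
        status = "OK" if meets_minimum(module.__version__, minimum) else "too old"
        print(name, module.__version__, status)


if __name__ == "__main__":
    check_environment()
```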

Compiling CUDA extensions

```
cd model/ops

# If you have multiple CUDA Toolkit installations, prefix the build command with
# CUDA_HOME=<your_cuda_toolkit_path> to select the correct version.
python setup.py build_ext --inplace
```
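
When several CUDA Toolkits are installed, it is easy to build against the wrong one. The snippet below is a small diagnostic sketch that reports the `CUDA_HOME` setting and the `nvcc` binary found on `PATH`, so you can see whether they point at the same toolkit. The function name `describe_cuda_toolchain` is hypothetical, and the assumption that the build picks its toolkit from `CUDA_HOME` first and `nvcc` on `PATH` otherwise reflects typical PyTorch extension behavior rather than anything stated in this repository.

```python
import os
import shutil


def describe_cuda_toolchain(environ=None):
    """Report the CUDA_HOME setting and the nvcc binaries it and PATH point to."""
    environ = os.environ if environ is None else environ
    cuda_home = environ.get("CUDA_HOME")
    # nvcc as it would be resolved from PATH, or None if it is not found.
    nvcc_on_path = shutil.which("nvcc", path=environ.get("PATH"))
    nvcc_in_home = None
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        nvcc_in_home = candidate if os.path.isfile(candidate) else None
    return {
        "CUDA_HOME": cuda_home,
        "nvcc_on_path": nvcc_on_path,
        "nvcc_in_CUDA_HOME": nvcc_in_home,
    }


if __name__ == "__main__":
    for key, value in describe_cuda_toolchain().items():
        print(f"{key}: {value}")
```

If `nvcc_on_path` and `nvcc_in_CUDA_HOME` differ, or `CUDA_HOME` is unset, setting the prefix described above makes the choice explicit.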

Run a quick test

```
python demo.py
```

Data Preparation

To be updated.

Training

Run the following command:

```
bash scripts/train.sh DATASET
```

Testing

```
bash scripts/test.sh DATASET WEIGHTS
```

Acknowledgement

The code is based on DETR and Deformable DETR. We also borrow the RoIAlign1D implementation from G-TAD. Thanks for their great work.

Citing

@article{liu2021end,
  title={End-to-end Temporal Action Detection with Transformer},
  author={Liu, Xiaolong and Wang, Qimeng and Hu, Yao and Tang, Xu and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2106.10271},
  year={2021}
}

Contact

For questions and suggestions, please contact Xiaolong Liu at "liuxl at hust dot edu dot cn".
