[ACM MM 2021] TSA-Net: Tube Self-Attention Network for Action Quality Assessment

Last update: Dec 23, 2022

Overview

Tube Self-Attention Network (TSA-Net)

This repository contains the PyTorch implementation of the paper TSA-Net: Tube Self-Attention Network for Action Quality Assessment (ACM-MM'21 Oral).

[arXiv] [supp] [slides] [poster] [video]

If this repository is helpful to you, please star it. If you find our work useful in your research, please consider citing:

@inproceedings{TSA-Net,
  title={TSA-Net: Tube Self-Attention Network for Action Quality Assessment},
  author={Wang, Shunli and Yang, Dingkang and Zhai, Peng and Chen, Chixiao and Zhang, Lihua},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021},
  pages={4902--4910},
  numpages={9}
}

User Guide

This repository provides the open-source code of TSA-Net for the FR-FS dataset. Set it up as follows:

# 1. Clone this repository
git clone https://github.com/Shunli-Wang/TSA-Net.git ./TSA-Net
cd ./TSA-Net

# 2. Create and activate the conda env
conda create -n TSA-Net python
conda activate TSA-Net
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

# 3. Download the pre-trained model and the FR-FS dataset. All download links are listed below.
# PATH/TO/rgb_i3d_pretrained.pt
# PATH/TO/FRFS

# 4. Create the data dir (use an absolute path for PATH/TO/FRFS so the symlink resolves)
mkdir ./data && cd ./data
mv PATH/TO/rgb_i3d_pretrained.pt ./
ln -s PATH/TO/FRFS ./FRFS
cd ..  # return to the repository root

After initialization, please check the data structure:

.
├── data
│   ├── FRFS -> PATH/TO/FRFS
│   └── rgb_i3d_pretrained.pt
├── dataset.py
├── train.py
├── test.py
...
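
The layout above can also be checked with a short script. This is a minimal sketch, not part of the repository: the helper name missing_paths and the EXPECTED list are assumptions, built only from the entries shown in the listing, and it is meant to be run from the repository root.

```python
from pathlib import Path

# Entries copied from the directory listing above, relative to the repository root.
# This list is an assumption for illustration; the repository contains more files.
EXPECTED = [
    "data/FRFS",
    "data/rgb_i3d_pretrained.pt",
    "dataset.py",
    "train.py",
    "test.py",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    # Path.exists() follows symlinks, so a dangling data/FRFS link counts as missing.
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

If data/FRFS is reported as missing even though the link exists, the symlink target is probably a relative path that does not resolve from inside ./data.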

Download links:

FR-FS Dataset: Download the FR-FS dataset (about 2.5 GB) from BaiduNetDisk [star] or Google Drive.
rgb_i3d_pretrained.pt: Our work uses an I3D backbone pretrained on Kinetics (BaiduNetDisk [i3dm] or Google Drive), taken from Gated-Spatio-Temporal-Energy-Graph.
Tracking boxes for AQA-7 & MTL-AQA: Because that work is still ongoing, we cannot share the source code for MTL-AQA and AQA-7. We provide the original tracking boxes for AQA-7 and MTL-AQA at BaiduNetDisk [6v51] or Google Drive.

Training & Evaluation

We provide training and testing code for both TSA-Net and Plain-Net. The two differ only in whether the TSA module is present, which is controlled by the --TSA flag.

# TSA-Net (with the TSA module)
python train.py --gpu 0 --model_path TSA-USDL --TSA
python test.py --gpu 0 --pt_w Exp/TSA-USDL/best.pth --TSA

# Plain-Net (without the TSA module)
python train.py --gpu 0 --model_path USDL
python test.py --gpu 0 --pt_w Exp/USDL/best.pth
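
For readers unfamiliar with boolean command-line switches, the sketch below shows how a flag like --TSA typically behaves with Python's argparse. It is an illustration only, not the repository's actual argument parsing: the flag names come from the commands above, while build_parser, model_name, the defaults, the help strings, and the merging of train.py and test.py flags into one parser are all assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the flag names are taken from the commands above.
    parser = argparse.ArgumentParser(description="Sketch of the TSA-Net / Plain-Net switch")
    parser.add_argument("--gpu", type=str, default="0", help="GPU id (assumed default)")
    parser.add_argument("--model_path", type=str, default="USDL",
                        help="experiment name used when training (assumed default)")
    parser.add_argument("--pt_w", type=str, default=None,
                        help="checkpoint path used when testing")
    # store_true makes the flag False when omitted and True when present.
    parser.add_argument("--TSA", action="store_true", help="enable the TSA module")
    return parser


def model_name(args):
    """Map the parsed flag to the model variant named in the text."""
    return "TSA-Net" if args.TSA else "Plain-Net"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Selected model:", model_name(args))
```

With this pattern, omitting --TSA selects Plain-Net, so the same flag setting should be passed to both the training and the testing command for a given checkpoint.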

Acknowledgement

Our code is adapted from MUSDL, and we are very grateful to its authors for their wonderful implementation. All tracking boxes in our project were generated with SiamMask, and we sincerely thank its authors for their contributions as well.

Contact

If you have any questions about our work, please contact [email protected].


Owner

ShunliWang
