Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Last update: Dec 07, 2022

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

This repository is the official implementation of "Relational Self-Attention: What's Missing in Attention for Video Understanding" by Manjin Kim*, Heeseung Kwon*, Chunyu Wang, Suha Kwak, and Minsu Cho (*equal contribution).

Requirements

Python: 3.7.9
Pytorch: 1.6.0
TorchVision: 0.2.1
Cuda: 10.1
Conda environment environment.yml

To install requirements:

    conda env create -f environment.yml
    conda activate rsa

Dataset Preparation

Download Something-Something v1 & v2 (SSv1 & SSv2) datasets and extract RGB frames. Download URLs: SSv1, SSv2
Make txt files that define training & validation splits. Each line in txt files is formatted as [video_path] [#frames] [class_label]. Please refer to any txt files in ./data directory.

Training

To train RSANet-R50 on SSv1 or SSv2 datasets in the paper, run this command:

    # For SSv1
    ./scripts/train_Something_v1.sh 
    
    
     
    # example: ./scripts/train_Something_v1.sh RSA_R50_SSV1_16frames 16
    
    # For SSv2
    ./scripts/train_Something_v2.sh 
      
      
       
    # example: ./scripts/train_Something_v2.sh RSA_R50_SSV2_16frames 16

Evaluation

To evaluate RSANet-R50 on SSv2 dataset in the paper, run:

    # For SSv1
    ./scripts/test_Something_v1.sh 
    
     
     
      
    # example: ./scripts/test_Something_v1.sh RSA_R50_SSV1_16frames resnet_rgb_model_best.pth.tar 16
    
    # For SSv2
    ./scripts/test_Something_v2.sh 
       
        
        
          # example: ./scripts/test_Something_v2.sh RSA_R50_SSV2_16frames resnet_rgb_model_best.pth.tar 16

Results

Our model achieves the following performance on Something-Something-V1 and Something-Something-V2:

model	dataset	frames	top-1 / top-5	logs	checkpoints
RSANet-R50	SSV1	16	54.0 % / 81.1 %	[log]	[checkpoint]
RSANet-R50	SSV2	16	66.0 % / 89.9 %	[log]	[checkpoint]

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding

Related tags

Overview

Relational Self-Attention: What's Missing in Attention for Video Understanding

Requirements

Dataset Preparation

Training

Evaluation

Results

Qualitative Results

Owner

mandos

Bringing Characters to Life with Computer Brains in Unity

We utilize deep reinforcement learning to obtain favorable trajectories for visual-inertial system calibration.

Information Gain Filtration (IGF) is a method for filtering domain-specific data during language model finetuning. IGF shows significant improvements over baseline fine-tuning without data filtration.

PyTorch implementation of "Representing Shape Collections with Alignment-Aware Linear Models" paper.

PyTorch Implementation of Realtime Multi-Person Pose Estimation project.

Python Fanduel API (2021) - Lineup Automation

Graph-total-spanning-trees - A Python script to get total number of Spanning Trees in a Graph

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data

A pytorch implementation of MBNET: MOS PREDICTION FOR SYNTHESIZED SPEECH WITH MEAN-BIAS NETWORK

Jax/Flax implementation of Variational-DiffWave.

FewBit — a library for memory efficient training of large neural networks

SAN for Product Attributes Prediction

An unsupervised learning framework for depth and ego-motion estimation from monocular videos

An open source bike computer based on Raspberry Pi Zero (W, WH) with GPS and ANT+. Including offline map and navigation.

Hyperparameter tuning for humans

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

DC3: A Learning Method for Optimization with Hard Constraints

Implementation of Convolutional LSTM in PyTorch.

Aggragrating Nested Transformer Official Jax Implementation