A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Last update: Sep 20, 2022

Overview

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

This is a PyTorch implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1-2]. Unlike another unofficial PyTorch implementation [3], this version borrows heavily from the official implementation [4] and the TensorFlow implementation [5], and tries to stay consistent with them.

Usage

You can import the TokenLearner and TokenLearnerModuleV11 classes from the tokenlearner module. As in the paper, these layers can be used with a Vision Transformer, MLP-Mixer, or Video Vision Transformer.

import torch
from tokenlearner import TokenLearner

tklr = TokenLearner(in_channels=128, num_tokens=8, use_sum_pooling=False)

x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y1 = tklr(x)
print(y1.shape)  # [256, 8, 128]

You can also use TokenLearnerModuleV11, which aligns with the official implementation.

import torch
from tokenlearner import TokenLearnerModuleV11

tklr_v11 = TokenLearnerModuleV11(in_channels=128, num_tokens=8, num_groups=4, dropout_rate=0.)

tklr_v11.eval()  # eval mode disables dropout
x = torch.ones(256, 32, 32, 128)   # [bs, h, w, c]
y2 = tklr_v11(x)
print(y2.shape)  # [256, 8, 128]
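
To make the shape change from [bs, h, w, c] to [bs, num_tokens, c] concrete, here is a minimal sketch of the idea described in the paper: learn one spatial attention map per output token, weight the input features by each map, and pool over the spatial positions. The class name TinyTokenLearner is hypothetical and is not part of this repository. Using a single linear layer to produce the attention maps, and pooling the un-normalized input, are simplifying assumptions; the classes in the tokenlearner module may differ in their details.

```python
import torch
import torch.nn as nn


class TinyTokenLearner(nn.Module):
    """Hypothetical, simplified stand-in for illustration only.

    This is NOT the repository's TokenLearner class.
    """

    def __init__(self, in_channels: int, num_tokens: int):
        super().__init__()
        self.norm = nn.LayerNorm(in_channels)
        # Assumption: one linear layer yields one attention logit
        # per spatial position and per output token.
        self.attn = nn.Linear(in_channels, num_tokens)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bs, h, w, c = x.shape
        feats = x.reshape(bs, h * w, c)                     # [bs, hw, c]
        maps = torch.sigmoid(self.attn(self.norm(feats)))   # [bs, hw, n]
        # Weight the features by each map, then average over positions.
        tokens = torch.einsum("bpn,bpc->bnc", maps, feats) / (h * w)
        return tokens                                       # [bs, n, c]


x = torch.ones(4, 32, 32, 128)  # [bs, h, w, c]
tiny = TinyTokenLearner(in_channels=128, num_tokens=8)
print(tiny(x).shape)  # torch.Size([4, 8, 128])
```

The 32 x 32 = 1024 spatial positions are reduced to 8 tokens, so any transformer layers applied afterwards operate on a much shorter sequence.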

References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al.; NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88

[3] Unofficial PyTorch implementation

[4] official implementation

[5] TensorFlow implementation

Owner

Caiyong Wang
