Syntax-Aware Action Targeting for Video Captioning

Last update: Oct 13, 2022

Code for SAAT from "Syntax-Aware Action Targeting for Video Captioning" (accepted to CVPR 2020). The implementation is based on "Consensus-based Sequence Training for Video Captioning".

Dependencies

Python 3.6
PyTorch 1.1
CUDA 10.0
Microsoft COCO Caption Evaluation
CIDEr

(Check out the coco-caption and cider projects into your working directory)

Data

Data can be downloaded here (1.6GB). This folder contains:

input/msrvtt: annotated captions (note that val_videodatainfo.json is a symbolic link to train_videodatainfo.json)
output/feature: extracted features of IRv2, C3D and Category embeddings
output/metadata: preprocessed annotations
output/model_svo/xe: model file and generated captions on test videos. The reported result (CIDEr 49.1 for XE training) can be reproduced with the model provided in this folder.
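Before running the test or train commands, it can help to confirm that the download unpacked into the layout listed above. The following is a minimal sketch, not part of the repository: the function name `missing_dirs` and the idea of passing the data root as an argument are assumptions made for illustration, and only the four sub-directory names come from the list above.

```python
from pathlib import Path

# Sub-directories named in the Data section of this README.
EXPECTED_DIRS = [
    "input/msrvtt",
    "output/feature",
    "output/metadata",
    "output/model_svo/xe",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`.

    `root` is a hypothetical path to wherever the downloaded folder
    was unpacked; the repository itself does not define this helper.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Check the current directory and report anything that is absent.
    for d in missing_dirs("."):
        print("missing:", d)
```

An empty result means all four folders are present. The check does not inspect file contents or the val_videodatainfo.json symbolic link.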

Test

make -f SpecifiedMakefile test [options]

Please refer to the Makefile (and the opts_svo.py file) for the set of available train/test options. For example, to reproduce the reported result:

make -f Makefile_msrvtt_svo test GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG LAMBDA=20

Train

To train the model using the cross-entropy (XE) loss:

make -f Makefile_msrvtt_svo train GID=0 EXP_NAME=xe FEATS="irv2 c3d category" BFEATS="roi_feat roi_box" USE_RL=0 CST=0 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCH=100 LAMBDA=20

If you want to change the input features, modify the FEATS variable in the commands above.
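When running several experiments with different feature sets, assembling the make command line programmatically avoids quoting mistakes in values that contain spaces. The sketch below is an illustration only: `make_command` is a hypothetical helper, not something the repository provides, and the reduced feature set `"irv2 c3d"` is an assumed example. Which feature combinations the Makefile actually accepts should be checked in the Makefile and opts_svo.py.

```python
import shlex


def make_command(target, makefile="Makefile_msrvtt_svo", **options):
    """Build a `make` command line as a single shell string.

    Each keyword argument becomes a NAME=value override; values are
    shell-quoted so that space-separated lists such as FEATS survive.
    This is a hypothetical convenience wrapper, not part of the repo.
    """
    parts = ["make", "-f", makefile, target]
    for name, value in options.items():
        parts.append(f"{name}={shlex.quote(str(value))}")
    return " ".join(parts)


if __name__ == "__main__":
    # Same options as the XE training command above, but with an
    # assumed, reduced feature set for illustration.
    print(make_command(
        "train",
        GID=0,
        EXP_NAME="xe",
        FEATS="irv2 c3d",
        BFEATS="roi_feat roi_box",
        USE_RL=0,
        CST=0,
        USE_MIXER=0,
        SCB_CAPTIONS=0,
        LOGLEVEL="DEBUG",
        MAX_EPOCH=100,
        LAMBDA=20,
    ))
```

The printed string can be pasted into a shell or passed to `subprocess.run(..., shell=True)`. Note that `shlex.quote` emits single quotes where the commands above use double quotes; the shell treats both the same for these values.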

Citation

@InProceedings{Zheng_2020_CVPR,
author = {Zheng, Qi and Wang, Chaoyue and Tao, Dacheng},
title = {Syntax-Aware Action Targeting for Video Captioning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Acknowledgements

PyTorch implementation of CST
PyTorch implementation of SCST
