Semi-Autoregressive Transformer for Image Captioning

Related tags

Deep Learningsatic
Overview

Semi-Autoregressive Transformer for Image Captioning

Requirements

  • Python 3.6
  • Pytorch 1.6

Prepare data

  1. Please use git clone --recurse-submodules to clone this repository and remember to follow initialization steps in coco-caption/README.md.
  2. Download the preprocessd dataset from this link and extract it to data/.
  3. Please follow this instruction to prepare the adaptive bottom-up features and place them under data/mscoco/. Please follow this instruction to prepare the features and place them under data/cocotest/ for online test evaluation.
  4. Download part checkpoints from here and extract them to save/.

Offline Evaluation

To reproduce the results, such as SATIC(K=2, bw=1) after self-critical training, just run

python3 eval.py  --model  save/nsc-sat-2-from-nsc-seqkd/model-best.pth   --infos_path  save/nsc-sat-2-from-nsc-seqkd/infos_nsc-sat-2-from-nsc-seqkd-best.pkl    --batch_size  1   --beam_size   1   --id  nsc-sat-2-from-nsc-seqkd   

Online Evaluation

Please first run

python3 eval_cocotest.py  --input_json  data/cocotest.json  --input_fc_dir data/cocotest/cocotest_bu_fc --input_att_dir  data/cocotest/cocotest_bu_att   --input_label_h5    data/cocotalk_label.h5  --num_images -1    --language_eval 0
--model  save/nsc-sat-4-from-nsc-seqkd/model-best.pth   --infos_path  save/nsc-sat-4-from-nsc-seqkd/infos_nsc-sat-4-from-nsc-seqkd-best.pkl    --batch_size  32   --beam_size   3   --id   captions_test2014_alg_results  

and then follow the instruction to upload results.

Training

  1. In the first training stage, such as SATIC(K=2) model with sequence-level distillation and weight initialization, run
python3  train.py   --noamopt --noamopt_warmup 20000 --label_smoothing 0.0  --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 5e-4 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --learning_rate_decay_start 0 --scheduled_sampling_start 0  --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --max_epochs 15    --input_label_h5   data/cocotalk_seq-kd-from-nsc-transformer-baseline-b5_label.h5   --checkpoint_path   save/sat-2-from-nsc-seqkd   --id   sat-2-from-nsc-seqkd   --K  2
  1. Then in the second training stage, copy the above pretrained model first
cd save
./copy_model.sh  sat-2-from-nsc-seqkd    nsc-sat-2-from-nsc-seqkd
cd ..

and then run

python3  train.py    --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 1e-5 --num_layers 6 --input_encoding_size 512 --rnn_size 2048  --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --self_critical_after 10  --max_epochs    40   --input_label_h5    data/cocotalk_label.h5   --start_from   save/nsc-sat-2-from-nsc-seqkd   --checkpoint_path   save/nsc-sat-2-from-nsc-seqkd  --id  nsc-sat-2-from-nsc-seqkd    --K 2

Citation

@article{zhou2021semi,
  title={Semi-Autoregressive Transformer for Image Captioning},
  author={Zhou, Yuanen and Zhang, Yong and Hu, Zhenzhen and Wang, Meng},
  journal={arXiv preprint arXiv:2106.09436},
  year={2021}
}

Acknowledgements

This repository is built upon self-critical.pytorch. Thanks for the released code.

Owner
YE Zhou
YE Zhou
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Evaluating the Factual Consistency of Abstractive Text Summarization Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher Int

Salesforce 165 Dec 21, 2022
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
PAthological QUpath Obsession - QuPath and Python conversations

PAQUO: PAthological QUpath Obsession Welcome to paquo 👋 , a library for interacting with QuPath from Python. paquo's goal is to provide a pythonic in

Bayer AG 60 Dec 31, 2022
NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs.

NAS-HPO-Bench-II API Overview NAS-HPO-Bench-II is the first benchmark dataset for joint optimization of CNN and training HPs. It helps a fair and low-

yoichi hirose 8 Nov 21, 2022
Exponential Graph is Provably Efficient for Decentralized Deep Training

Exponential Graph is Provably Efficient for Decentralized Deep Training This code repository is for the paper Exponential Graph is Provably Efficient

3 Apr 20, 2022
[CVPR 2021] Forecasting the panoptic segmentation of future video frames

Panoptic Segmentation Forecasting Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing - CVPR 2021 [Link to paper] We propose

Niantic Labs 44 Nov 29, 2022
Code accompanying the paper "ProxyFL: Decentralized Federated Learning through Proxy Model Sharing"

ProxyFL Code accompanying the paper "ProxyFL: Decentralized Federated Learning through Proxy Model Sharing" Authors: Shivam Kalra*, Junfeng Wen*, Jess

Layer6 Labs 14 Dec 06, 2022
pytorch bert intent classification and slot filling

pytorch_bert_intent_classification_and_slot_filling 基于pytorch的中文意图识别和槽位填充 说明 基本思路就是:分类+序列标注(命名实体识别)同时训练。 使用的预训练模型:hugging face上的chinese-bert-wwm-ext 依

西西嘛呦 33 Dec 15, 2022
Large-scale language modeling tutorials with PyTorch

Large-scale language modeling tutorials with PyTorch 안녕하세요. 저는 TUNiB에서 머신러닝 엔지니어로 근무 중인 고현웅입니다. 이 자료는 대규모 언어모델 개발에 필요한 여러가지 기술들을 소개드리기 위해 마련하였으며 기본적으로

TUNiB 172 Dec 29, 2022
Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

points2d_projection_mesh Input 2D points (e.g. facial landmarks) on an image Camera parameters (extrinsic and intrinsic) of the image Aligned 3D mesh

5 Dec 08, 2022
This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

Trajectory Prediction using Equivariant Continuous Convolution (ECCO) This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivar

Spatiotemporal Machine Learning 45 Jul 22, 2022
GNEE - GAT Neural Event Embeddings

GNEE - GAT Neural Event Embeddings This repository contains source code for the GNEE (GAT Neural Event Embeddings) method introduced in the paper: "Se

João Pedro Rodrigues Mattos 0 Sep 15, 2021
[ECCV 2020] XingGAN for Person Image Generation

Contents XingGAN or CrossingGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowl

Hao Tang 218 Oct 29, 2022
《Deep Single Portrait Image Relighting》(ICCV 2019)

Ratio Image Based Rendering for Deep Single-Image Portrait Relighting [Project Page] This is part of the Deep Portrait Relighting project. If you find

62 Dec 21, 2022
Depression Asisstant GDSC Challenge Solution

Depression Asisstant can help you give solution. Please using Python version 3.9.5 for contribute.

Ananda Rauf 1 Jan 30, 2022
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

MusCaps: Generating Captions for Music Audio Ilaria Manco1 2, Emmanouil Benetos1, Elio Quinton2, Gyorgy Fazekas1 1 Queen Mary University of London, 2

Ilaria Manco 57 Dec 07, 2022
PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

TUPE PyTorch implementation of Rethinking Positional Encoding in Language Pre-training. Quickstart Clone this repository. git clone https://github.com

Jake Tae 5 Jan 27, 2022
Trafffic prediction analysis using hybrid models - Machine Learning

Hybrid Machine learning Model Clone the Repository Create a new Directory as assests and download the model from the below link Model Link To Start th

1 Feb 08, 2022
Evaluation and Benchmarking of Speech Super-resolution Methods

Speech Super-resolution Evaluation and Benchmarking What this repo do: A toolbox for the evaluation of speech super-resolution algorithms. Unify the e

Haohe Liu (刘濠赫) 84 Dec 20, 2022