Semi-Autoregressive Transformer for Image Captioning

Last update: Dec 09, 2022

Requirements

Python 3.6
PyTorch 1.6

Prepare data

Clone this repository with git clone --recurse-submodules and follow the initialization steps in coco-caption/README.md.
Download the preprocessed dataset from this link and extract it to data/.
Follow this instruction to prepare the adaptive bottom-up features and place them under data/mscoco/. For online test evaluation, follow this instruction to prepare the test features and place them under data/cocotest/.
Download some of the checkpoints from here and extract them to save/.
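
Before running evaluation or training, it can help to confirm that the layout described above is in place. The following is a minimal sketch and not part of the repository: the helper name missing_paths is hypothetical, and the directory list is only what the steps above mention.

```python
from pathlib import Path

# Directories named in the preparation steps above (an assumption about the
# full layout; adjust if your checkout differs).
EXPECTED_DIRS = ("data", "data/mscoco", "data/cocotest", "save")


def missing_paths(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; an empty result means every directory listed above was found.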

Offline Evaluation

To reproduce a reported result, for example SATIC (K=2, bw=1) after self-critical training, run

python3 eval.py --model save/nsc-sat-2-from-nsc-seqkd/model-best.pth --infos_path save/nsc-sat-2-from-nsc-seqkd/infos_nsc-sat-2-from-nsc-seqkd-best.pkl --batch_size 1 --beam_size 1 --id nsc-sat-2-from-nsc-seqkd
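
The checkpoint and infos paths in the command above follow one pattern: save/<id>/model-best.pth and save/<id>/infos_<id>-best.pkl. As a sketch under that assumption, the argument list can be built from the model id alone. The helper eval_command is hypothetical and not part of the repository; it only uses the flags shown above.

```python
def eval_command(model_id, batch_size=1, beam_size=1, save_dir="save"):
    """Build the eval.py argument list for the checkpoint directory save_dir/model_id."""
    ckpt_dir = f"{save_dir}/{model_id}"
    return [
        "python3", "eval.py",
        "--model", f"{ckpt_dir}/model-best.pth",
        # Assumed naming, taken from the example command: infos_<id>-best.pkl
        "--infos_path", f"{ckpt_dir}/infos_{model_id}-best.pkl",
        "--batch_size", str(batch_size),
        "--beam_size", str(beam_size),
        "--id", model_id,
    ]


if __name__ == "__main__":
    # Prints a command equivalent to the one shown above.
    print(" ".join(eval_command("nsc-sat-2-from-nsc-seqkd")))
```

The returned list can be passed to subprocess.run directly, which avoids quoting issues in the shell.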

Online Evaluation

Please first run

python3 eval_cocotest.py --input_json data/cocotest.json --input_fc_dir data/cocotest/cocotest_bu_fc --input_att_dir data/cocotest/cocotest_bu_att --input_label_h5 data/cocotalk_label.h5 --num_images -1 --language_eval 0 \
  --model save/nsc-sat-4-from-nsc-seqkd/model-best.pth --infos_path save/nsc-sat-4-from-nsc-seqkd/infos_nsc-sat-4-from-nsc-seqkd-best.pkl --batch_size 32 --beam_size 3 --id captions_test2014_alg_results

and then follow the instructions to upload the results.
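
The COCO caption evaluation server accepts a JSON list of objects, each with an integer image_id and a string caption. Assuming eval_cocotest.py produces a list in that shape (not confirmed here), a quick sanity check before uploading could look like the sketch below. check_caption_results is a hypothetical helper, not part of the repository.

```python
def check_caption_results(entries):
    """Return a list of problems found in a COCO-style caption results list."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        image_id = entry.get("image_id")
        caption = entry.get("caption")
        if not isinstance(image_id, int):
            problems.append(f"entry {i}: image_id is not an integer")
        elif image_id in seen:
            problems.append(f"entry {i}: duplicate image_id {image_id}")
        else:
            seen.add(image_id)
        if not isinstance(caption, str) or not caption.strip():
            problems.append(f"entry {i}: caption is missing or empty")
    return problems
```

Load the results file with json.load and pass the resulting list in; an empty return value means no problems were detected by these checks.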

Training

For the first training stage, for example a SATIC (K=2) model with sequence-level distillation and weight initialization, run

python3 train.py --noamopt --noamopt_warmup 20000 --label_smoothing 0.0 --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 5e-4 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --max_epochs 15 --input_label_h5 data/cocotalk_seq-kd-from-nsc-transformer-baseline-b5_label.h5 --checkpoint_path save/sat-2-from-nsc-seqkd --id sat-2-from-nsc-seqkd --K 2
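
The --noamopt and --noamopt_warmup 20000 flags select the warmup schedule commonly called "Noam" from the original Transformer paper: the learning rate grows linearly for the warmup steps and then decays with the inverse square root of the step. The sketch below shows the standard formula only. That the model size equals --input_encoding_size (512) and that the scaling factor is 1.0 are assumptions, not values confirmed by this repository.

```python
def noam_lr(step, model_size=512, warmup=20000, factor=1.0):
    """Learning rate at optimizer step `step` (step >= 1) under the Noam schedule."""
    # Linear warmup (step * warmup**-1.5) until `warmup`, then step**-0.5 decay.
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


if __name__ == "__main__":
    for step in (1, 10000, 20000, 40000):
        print(step, noam_lr(step))
```

Under these assumptions the rate peaks at step 20000 and falls off afterwards.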

Then, for the second training stage, first copy the pretrained model from above

cd save
./copy_model.sh  sat-2-from-nsc-seqkd    nsc-sat-2-from-nsc-seqkd
cd ..

and then run

python3 train.py --seq_per_img 5 --batch_size 10 --beam_size 1 --learning_rate 1e-5 --num_layers 6 --input_encoding_size 512 --rnn_size 2048 --save_checkpoint_every 3000 --language_eval 1 --val_images_use 5000 --self_critical_after 10 --max_epochs 40 --input_label_h5 data/cocotalk_label.h5 --start_from save/nsc-sat-2-from-nsc-seqkd --checkpoint_path save/nsc-sat-2-from-nsc-seqkd --id nsc-sat-2-from-nsc-seqkd --K 2
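
The contents of copy_model.sh are not shown here. Judging from the file names in the commands above (save/<id>/infos_<id>-best.pkl) and from the second stage starting from a directory under the new id, a plausible equivalent copies the checkpoint directory and renames the infos files to match the new id. The sketch below is that guess written in Python; the function copy_model and its behavior are assumptions, not the script's actual implementation.

```python
import shutil
from pathlib import Path


def copy_model(src_id, dst_id, save_dir="save"):
    """Copy save_dir/src_id to save_dir/dst_id and rename infos files to dst_id."""
    src = Path(save_dir) / src_id
    dst = Path(save_dir) / dst_id
    shutil.copytree(str(src), str(dst))  # raises if dst already exists
    prefix = f"infos_{src_id}"
    # Materialize the matches first so renaming does not disturb iteration.
    for old in list(dst.glob(f"{prefix}*.pkl")):
        suffix = old.name[len(prefix):]  # e.g. "-best.pkl" or ".pkl"
        old.rename(dst / f"infos_{dst_id}{suffix}")
    return dst
```

For the example above this would be called as copy_model("sat-2-from-nsc-seqkd", "nsc-sat-2-from-nsc-seqkd") from the repository root.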

Citation

@article{zhou2021semi,
  title={Semi-Autoregressive Transformer for Image Captioning},
  author={Zhou, Yuanen and Zhang, Yong and Hu, Zhenzhen and Wang, Meng},
  journal={arXiv preprint arXiv:2106.09436},
  year={2021}
}

Acknowledgements

This repository is built upon self-critical.pytorch. Thanks for the released code.

Owner

YE Zhou
