Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Overview

This repository is the PyTorch implementation of the paper:

Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Shweta Mahajan and Stefan Roth

We additionally include evaluation code from Luo et al. in the folder GoogleConceptualCaptioning, which has been patched for compatibility.

Requirements

The code in this repository is written in Python 3.6.10 and uses CUDA 9.0.

Requirements:

  • torch 1.1.0
  • torchvision 0.3.0
  • nltk 3.5
  • inflect 4.1.0
  • tqdm 4.46.0
  • sklearn 0.0
  • h5py 2.10.0

To install requirements:

conda config --add channels pytorch
conda config --add channels anaconda
conda config --add channels conda-forge
conda config --add channels conda-forge/label/cf202003
conda create -n <environment_name> --file requirements.txt
conda activate <environment_name>
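
Optionally, the short check below (a minimal sketch; the expected versions are taken from the requirements list above) can be used to confirm that the pinned builds were installed and that a GPU is visible:

# Minimal environment sanity check; expected versions are from the requirements list above.
import torch
import torchvision

print(torch.__version__)          # expected: 1.1.0
print(torchvision.__version__)    # expected: 0.3.0
print(torch.cuda.is_available())  # should be True on a machine with a CUDA 9.0-compatible GPU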

Preprocessed data

The dataset used in this project for assessing accuracy and diversity is COCO 2014 (m-RNN split). The full dataset is available here.

We use Faster R-CNN features for images, similar to Anderson et al. We additionally require the "classes"/"scores" fields detected for the image regions; the classes correspond to Visual Genome object classes.

Download instructions

Preprocessed training data is available here as hdf5 files. The provided hdf5 files contain the following fields:

  • image_id: ID of the COCO image
  • num_boxes: Number of proposal regions detected by Faster R-CNN
  • features: ResNet-101 features of the extracted regions
  • classes: Visual Genome classes of the extracted regions
  • scores: Scores of the Visual Genome classes of the extracted regions

Note that the ["image_id", "num_boxes", "features"] fields are identical to those of Anderson et al.
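
For a quick look at a downloaded file, the sketch below lists the documented fields with h5py. The file name is taken from the download list that follows; the assumption that the fields are stored as top-level datasets may not match the actual layout.

# A minimal sketch for inspecting the downloaded hdf5 file; the exact internal
# layout (flat datasets vs. per-image groups) is an assumption and may differ.
import h5py

with h5py.File("coco/coco_train_2014_adaptive_withclasses.h5", "r") as f:
    print(list(f.keys())[:5])  # top-level keys
    # If the documented fields are stored as top-level datasets, report their shapes.
    for name in ("image_id", "num_boxes", "features", "classes", "scores"):
        if name in f:
            print(name, f[name].shape, f[name].dtype)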

Create a folder named coco and download the preprocessed training and test data from the coco folder in the drive link above as follows (alternatively, download the entire coco folder directly from the drive link):

  1. Download the following files for training on COCO 2014 (m-RNN split):
coco/coco_train_2014_adaptive_withclasses.h5
coco/coco_val_2014_adaptive_withclasses.h5
coco/coco_val_mRNN.txt
coco/coco_test_mRNN.txt
  2. Download the following files for training on held-out COCO (novel object captioning):
coco/coco_train_2014_noc_adaptive_withclasses.h5
coco/coco_train_extra_2014_noc_adaptive_withclasses.h5
  3. Download the following file for testing on held-out COCO (novel object captioning):
coco/coco_test_2014_noc_adaptive_withclasses.h5
  4. Download the (caption) annotation files and place them in a subdirectory coco/annotations (mirroring the Google drive folder structure):
coco/annotations/captions_train2014.json
coco/annotations/captions_val2014.json
  5. Download the following files from the drive link into a separate folder data (outside coco). These files contain the contextual neighbours for pseudo-supervision:
data/nn_final.pkl
data/nn_noc.pkl

To run the train/test scripts described below, "pathToData" and "nn_dict_path" in params.json and params_noc.json need to be set to the coco and data folders created above (a sketch is given below).
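
A hypothetical sketch of that configuration step follows. The key names come from this README, but the exact JSON layout of params.json / params_noc.json is an assumption; editing the files by hand works just as well.

# Hypothetical sketch: point both config files at the downloaded folders.
# The key names "pathToData" and "nn_dict_path" are from this README; a flat
# JSON layout is assumed and may differ from the actual files.
import json

for cfg in ("params.json", "params_noc.json"):
    with open(cfg) as f:
        params = json.load(f)
    params["pathToData"] = "coco/"    # folder with the .h5 files and annotations
    params["nn_dict_path"] = "data/"  # folder with nn_final.pkl / nn_noc.pkl
    with open(cfg, "w") as f:
        json.dump(params, f, indent=2)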

Verify Folder Structure after Download

The folder structure of coco and data after the download should be as follows:

coco
 - annotations
   - captions_train2014.json
   - captions_val2014.json
 - coco_val_mRNN.txt
 - coco_test_mRNN.txt
 - coco_train_2014_adaptive_withclasses.h5
 - coco_val_2014_adaptive_withclasses.h5
 - coco_train_2014_noc_adaptive_withclasses.h5
 - coco_train_extra_2014_noc_adaptive_withclasses.h5
 - coco_test_2014_noc_adaptive_withclasses.h5
data
 - coco_classname.txt
 - visual_genome_classes.txt
 - vocab_coco_full.pkl
 - nn_final.pkl
 - nn_noc.pkl

Training

Please follow these steps for training:

  1. Set the hyperparameters for training in params.json and params_noc.json.
  2. Train a model on COCO 2014 for captioning:
   	python ./scripts/train.py
  3. Train a model for diverse novel object captioning:
   	python ./scripts/train_noc.py

Please note that the data folder provides the required vocabulary.

Memory requirements

The models were trained on a single NVIDIA V100 GPU with 32 GB of memory; 16 GB is sufficient for a single training run.

Pre-trained models and evaluation

We provide pre-trained models for both captioning on COCO 2014 (m-RNN split) and novel object captioning. Please follow these steps:

  1. Download the pre-trained models from here to the ckpts folder.

  2. For the evaluation of oracle scores and diversity, we follow Luo et al. In the folder GoogleConceptualCaptioning, download cider, and in the cococaption folder run the download scripts; then run the evaluation:

   	./GoogleConceptualCaptioning/cococaption/get_google_word2vec_model.sh
   	./GoogleConceptualCaptioning/cococaption/get_stanford_models.sh
   	python ./scripts/eval.py
  3. For diversity evaluation, create the required numpy file for consensus re-ranking using:
   	python ./scripts/eval_diversity.py
     For consensus re-ranking, follow the steps here. To obtain the final diversity scores, follow the instructions of DiversityMetrics: convert the numpy file to the required json format and run the script evalscripts.py.
  4. To evaluate the F1 score for novel object captioning:
   	python ./scripts/eval_noc.py

Results

Oracle evaluation on the COCO dataset

           B4     B3     B2     B1     CIDEr  METEOR  ROUGE  SPICE
COS-CVAE   0.633  0.739  0.842  0.942  1.893  0.450   0.770  0.339

Diversity evaluation on the COCO dataset

           Unique  Novel  mBLEU  Div-1  Div-2
COS-CVAE   96.3    4404   0.53   0.39   0.57

F1-score evaluation on the held-out COCO dataset

           bottle  bus   couch  microwave  pizza  racket  suitcase  zebra  average
COS-CVAE   35.4    83.6  53.8   63.2       86.7   69.5    46.1      81.7   65.0

Bibtex

@inproceedings{coscvae20neurips,
  title     = {Diverse Image Captioning with Context-Object Split Latent Spaces},
  author    = {Mahajan, Shweta and Roth, Stefan},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2020}
}
Owner
Visual Inference Lab @TU Darmstadt