Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Overview

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

This repository contains the PyTorch implementation of our paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (CVPR 2020).

Overview of ASG2Caption Model

Prerequisites

Python 3 and PyTorch 1.3.

# clone the repository
git clone https://github.com/cshizhe/asg2cap.git
cd asg2cap
# clone the caption evaluation code
git clone https://github.com/cshizhe/eval_cap.git
export PYTHONPATH=$(pwd):${PYTHONPATH}

Training & Inference

cd controlimcap/driver

# supported caption models: [node, node.role,
# rgcn, rgcn.flow, rgcn.memory, rgcn.flow.memory]
# see the paper for details
mtype=rgcn.flow.memory 

# setup config files
# you should modify data paths in configs/prepare_*_imgsg_config.py
python configs/prepare_coco_imgsg_config.py $mtype
resdir='' # paste the output string printed by the previous step

# training
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_loss --is_train --num_workers 8

# inference
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_set tst --num_workers 8

Datasets

Annotations

Annotations for the MSCOCO and VisualGenome datasets can be downloaded from Google Drive.

  • (Image, ASG, Caption) annotations: regionfiles/image_id.json
JSON Format:
{
  "region_id": {
    "objects": [
      {
        "object_id": int,
        "name": str,
        "attributes": [str],
        "x": int,
        "y": int,
        "w": int,
        "h": int
      }
    ],
    "relationships": [
      {
        "relationship_id": int,
        "subject_id": int,
        "object_id": int,
        "name": str
      }
    ],
    "phrase": str
  }
}
  • vocabularies: int2word.npy: [word], word2int.json: {word: int}

  • data splits: public_split directory containing trn_names.npy, val_names.npy, tst_names.npy
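
As a rough illustration of the annotation format above, the following sketch reads one region entry and turns its relationships into (subject, relation, object) name triples. The sample data and the helper name `region_triples` are made up for illustration and are not part of this repository. The sketch also assumes that `subject_id` and `object_id` inside a relationship refer to `object_id` values in the same region's `objects` list, which the format suggests but this README does not state.

```python
import json

# Hypothetical sample in the documented layout; real files are
# regionfiles/<image_id>.json from the annotation download.
SAMPLE = json.loads("""
{
  "r1": {
    "objects": [
      {"object_id": 1, "name": "dog", "attributes": ["brown"],
       "x": 10, "y": 20, "w": 50, "h": 40},
      {"object_id": 2, "name": "frisbee", "attributes": [],
       "x": 70, "y": 15, "w": 20, "h": 10}
    ],
    "relationships": [
      {"relationship_id": 7, "subject_id": 1, "object_id": 2, "name": "catching"}
    ],
    "phrase": "a brown dog catching a frisbee"
  }
}
""")


def region_triples(region):
    """Return (subject name, relation name, object name) triples for one region."""
    # Look up object names by their id within this region.
    names = {obj["object_id"]: obj["name"] for obj in region["objects"]}
    triples = []
    for rel in region["relationships"]:
        # Assumption: both ids refer to entries of this region's "objects" list.
        triples.append((names[rel["subject_id"]], rel["name"], names[rel["object_id"]]))
    return triples


for region_id, region in SAMPLE.items():
    print(region_id, region["phrase"], region_triples(region))
```

To use it on real data, replace `SAMPLE` with the result of `json.load` on one of the downloaded region files.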

Features

Features for MSCOCO and VisualGenome datasets are available at BaiduNetdisk (code: 6q32).

We also provide pretrained models and code to extract features for new images.

  • format: npy array, shape=(num_fts, dim_ft), with rows in the same order as the names in the data split files

  • format: hdf5 files, "image_id".jpg.hdf5
    key: 'image_id'.jpg
    attrs: {"image_w": int, "image_h": int, "boxes": 4d array (x1, y1, x2, y2)}
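
Because the npy features are stored row by row in the order of the data split names, looking up a feature by image name comes down to pairing the two sequences. Below is a minimal sketch, assuming the names and the feature rows have already been loaded (for example with `numpy.load`). The helper `index_features_by_name` is hypothetical and not part of this repository, and the toy lists stand in for the real arrays.

```python
def index_features_by_name(names, features):
    """Map each split name to its feature row, relying on the shared ordering."""
    if len(names) != len(features):
        raise ValueError(
            f"{len(names)} names but {len(features)} feature rows; "
            "the split file and the feature file do not match"
        )
    return {str(name): features[i] for i, name in enumerate(names)}


# Toy stand-ins: three names and a (num_fts=3, dim_ft=2) feature table.
names = ["img_a.jpg", "img_b.jpg", "img_c.jpg"]
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

lookup = index_features_by_name(names, features)
print(lookup["img_b.jpg"])
```

The length check is there because a names file paired with the wrong feature file would otherwise produce a silently misaligned lookup.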

Result Visualization

Examples

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@article{chen2020say,
  title={Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs},
  author={Chen, Shizhe and Jin, Qin and Wang, Peng and Wu, Qi},
  journal={CVPR},
  year={2020}
}

License

MIT License

Owner
Shizhe Chen