Generative Adversarial Text-to-Image Synthesis

Last update: Dec 31, 2022

###Generative Adversarial Text-to-Image Synthesis
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

This is the code for our ICML 2016 paper on text-to-image synthesis using conditional GANs. You can use it to train and sample from text-to-image models. The code is adapted from the excellent dcgan.torch.

####Setup Instructions

You will need to install Torch, CuDNN, and the display package.

####How to train a text-to-image model:

1. Download the birds, flowers, and COCO caption data in Torch format.
2. Download the birds, flowers, and COCO image data.
3. Download the text encoders for the birds, flowers, and COCO descriptions.
4. Modify the CONFIG file to point to your data and text encoder paths.
5. Run one of the training scripts, e.g. ./scripts/train_cub.sh
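
Training fails late if a data or encoder path is wrong, so it can help to check the paths before launching a script. The following is a minimal sketch, not part of the repository: `check_paths` is a hypothetical helper, and the paths you pass to it are whatever you wrote into CONFIG.

```shell
# Hypothetical helper (not part of this repository): verify that every
# path given as an argument exists before starting a training run.
# Prints each missing path to stderr and returns 1 if any is missing.
check_paths() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "not found: $p" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_paths /path/to/cub_data /path/to/cub_text_encoder && ./scripts/train_cub.sh` would start training only when both placeholder paths exist.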

####How to generate samples:

- For flowers: ./scripts/demo_flowers.sh. Add text descriptions to scripts/flowers_queries.txt.
- For birds: ./scripts/demo_cub.sh.
- For COCO (more general images): ./scripts/demo_coco.sh.

An HTML file will be generated with the results.
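
Adding your own descriptions for the flowers demo can be scripted. This is a sketch under one assumption: that the queries file holds one text description per line. `add_queries` is a hypothetical helper, and the captions in the usage note are illustrative only.

```shell
# Hypothetical helper (not part of this repository): append text
# descriptions to a queries file, one description per line.
# Assumes the demo script reads one description per line.
add_queries() {
    queries_file=$1
    shift
    for q in "$@"; do
        printf '%s\n' "$q" >> "$queries_file"
    done
}
```

For example, `add_queries scripts/flowers_queries.txt "a flower with long pink petals" "a white flower with a yellow center"` followed by `./scripts/demo_flowers.sh` would generate samples for the two new descriptions along with any already in the file.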

####Pretrained models:

####How to train a text encoder from scratch:

You may want to do this if you have your own new dataset of text descriptions.
For flowers and birds: follow the instructions here.
For MS-COCO: ./scripts/train_coco_txt.sh.

####Citation

If you find this useful, please cite our work as follows:

@inproceedings{reed2016generative,
  title={Generative Adversarial Text-to-Image Synthesis},
  author={Scott Reed and Zeynep Akata and Xinchen Yan and Lajanugen Logeswaran and Bernt Schiele and Honglak Lee},
  booktitle={Proceedings of The 33rd International Conference on Machine Learning},
  year={2016}
}

