StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Last update: Dec 21, 2022


Overview

StackGAN

TensorFlow implementation for reproducing the main results in the paper StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.

Dependencies

Python 2.7

TensorFlow 0.12

[Optional] Torch is needed if you use the pre-trained char-CNN-RNN text encoder.

[Optional] skip-thought is needed if you use the skip-thought text encoder.

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

prettytensor
progressbar
python-dateutil
easydict
pandas
torchfile
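A quick way to confirm the setup above is a small check that the listed packages import and that the project folder comes first on PYTHONPATH. This is a hypothetical helper, not part of the repository: `prepend_to_pythonpath`, `missing_modules` and `REQUIRED` are illustrative names, the script is assumed to be run from the project root, and it assumes each package imports under its pip name except python-dateutil, which imports as `dateutil`. It uses only syntax that runs under both Python 2.7 and Python 3.

```python
import os
import sys


def prepend_to_pythonpath(project_dir, current=None):
    """Return a PYTHONPATH value with project_dir first (hypothetical helper)."""
    if current is None:
        current = os.environ.get("PYTHONPATH", "")
    # Drop empty entries and any existing copy of project_dir, then put it first.
    parts = [p for p in current.split(os.pathsep) if p]
    if project_dir in parts:
        parts.remove(project_dir)
    return os.pathsep.join([project_dir] + parts)


def missing_modules(module_names):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names for the pip packages listed above (python-dateutil imports as dateutil).
REQUIRED = ["prettytensor", "progressbar", "dateutil", "easydict", "pandas", "torchfile"]

if __name__ == "__main__":
    # Assumes the current working directory is the project folder.
    print("export PYTHONPATH=" + prepend_to_pythonpath(os.getcwd()))
    print("interpreter: " + sys.executable)
    print("missing packages: %s" % (missing_modules(REQUIRED) or "none"))
```

Any name printed as missing can then be installed with pip under the package name from the list above.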

Data

Download our preprocessed char-CNN-RNN text embeddings for birds and flowers and save them to Data/.

[Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.

Download the birds and flowers image data. Extract them to Data/birds/ and Data/flowers/, respectively.
Preprocess the images:

For birds: python misc/preprocess_birds.py
For flowers: python misc/preprocess_flowers.py
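Before running the preprocessing scripts, it can help to check that the images were extracted where the scripts expect them. This is a minimal sketch, assuming it is run from the project root; `DATASETS`, `missing_image_dirs` and `preprocess_command` are hypothetical names, and only the folder names and script paths come from this README. The expected contents of each folder are not checked.

```python
import os

# Hypothetical table built from this README: where each dataset's images are
# extracted and which script preprocesses them.
DATASETS = {
    "birds": {"image_dir": os.path.join("Data", "birds"),
              "script": "misc/preprocess_birds.py"},
    "flowers": {"image_dir": os.path.join("Data", "flowers"),
                "script": "misc/preprocess_flowers.py"},
}


def missing_image_dirs(root="."):
    """Return the dataset names whose image folder does not exist under root."""
    return [name for name in sorted(DATASETS)
            if not os.path.isdir(os.path.join(root, DATASETS[name]["image_dir"]))]


def preprocess_command(dataset):
    """Return the argv that preprocesses the images of `dataset`."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    return ["python", DATASETS[dataset]["script"]]
```

For example, once "birds" no longer appears in `missing_image_dirs()`, `subprocess.check_call(preprocess_command("birds"))` would run the bird preprocessing step.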

Training

To train a StackGAN model on the CUB dataset using our preprocessed data for birds:
- Step 1: train Stage-I GAN (e.g., for 600 epochs) python stageI/run_exp.py --cfg stageI/cfg/birds.yml --gpu 0
- Step 2: train Stage-II GAN (e.g., for another 600 epochs) python stageII/run_exp.py --cfg stageII/cfg/birds.yml --gpu 1
Change birds.yml to flowers.yml to train a StackGAN model on the Oxford-102 dataset using our preprocessed data for flowers.
The *.yml files are example configuration files for training/testing our models.
If you want to try your own datasets, here are some good tips about how to train GANs. We also encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
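The two training steps can also be scripted. The sketch below only assembles the documented commands for either dataset and runs them one after the other, since Stage-II builds on a trained Stage-I model; how Stage-II locates the Stage-I checkpoint is set in its .yml configuration and is not shown here. The function names are hypothetical, and the GPU ids default to the 0 and 1 used in the steps above.

```python
import subprocess


def stackgan_train_commands(dataset="birds", stage1_gpu=0, stage2_gpu=1):
    """Return the Stage-I and Stage-II commands listed in this README.

    `dataset` picks the config file (birds.yml or flowers.yml).
    """
    if dataset not in ("birds", "flowers"):
        raise ValueError("expected 'birds' or 'flowers', got %r" % (dataset,))
    cfg = dataset + ".yml"
    return [
        ["python", "stageI/run_exp.py", "--cfg", "stageI/cfg/" + cfg,
         "--gpu", str(stage1_gpu)],
        ["python", "stageII/run_exp.py", "--cfg", "stageII/cfg/" + cfg,
         "--gpu", str(stage2_gpu)],
    ]


def train_stackgan(dataset="birds"):
    """Run both stages in order; check_call raises if Stage-I fails, so Stage-II never starts."""
    for argv in stackgan_train_commands(dataset):
        subprocess.check_call(argv)
```

Calling `train_stackgan("flowers")` from the project root would train both stages on Oxford-102 with the example configuration files.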

Pretrained Model

StackGAN for birds trained from char-CNN-RNN text embeddings. Download and save it to models/.
StackGAN for flowers trained from char-CNN-RNN text embeddings. Download and save it to models/.
StackGAN for birds trained from skip-thought text embeddings. Download and save it to models/. (We just used the same settings as for the char-CNN-RNN model; better results can likely be achieved by tuning the hyper-parameters.)

Run Demos

Run sh demo/flowers_demo.sh to generate flower samples from sentences. The results will be saved to Data/flowers/example_captions/. (You need to download the char-CNN-RNN text encoder for flowers to models/text_encoder/. Note: this text encoder is provided by reedscot/icml2016.)
Run sh demo/birds_demo.sh to generate bird samples from sentences. The results will be saved to Data/birds/example_captions/. (You need to download the char-CNN-RNN text encoder for birds to models/text_encoder/. Note: this text encoder is provided by reedscot/icml2016.)
Run python demo/birds_skip_thought_demo.py --cfg demo/cfg/birds-skip-thought-demo.yml --gpu 2 to generate bird samples from sentences. The results will be saved to Data/birds/example_captions-skip-thought/. (You need to download the vocabulary for skip-thought vectors to Data/skipthoughts/.)
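The three demos differ in their command, output folder and prerequisite download. A hypothetical lookup table such as the one below (names are illustrative; commands and paths are copied from the list above) makes it easy to check the prerequisite before launching a demo. It only tests that the prerequisite folder exists, because the text encoder and vocabulary file names are not given here.

```python
import os

# Commands, output folders and prerequisite folders as listed in this README.
DEMOS = {
    "flowers": {
        "argv": ["sh", "demo/flowers_demo.sh"],
        "output": "Data/flowers/example_captions/",
        "needs": "models/text_encoder/",
    },
    "birds": {
        "argv": ["sh", "demo/birds_demo.sh"],
        "output": "Data/birds/example_captions/",
        "needs": "models/text_encoder/",
    },
    "birds-skip-thought": {
        "argv": ["python", "demo/birds_skip_thought_demo.py",
                 "--cfg", "demo/cfg/birds-skip-thought-demo.yml", "--gpu", "2"],
        "output": "Data/birds/example_captions-skip-thought/",
        "needs": "Data/skipthoughts/",
    },
}


def demo_plan(name, root="."):
    """Return (argv, output_dir, missing_prerequisite_or_None) for one demo."""
    demo = DEMOS[name]
    present = os.path.isdir(os.path.join(root, demo["needs"]))
    return list(demo["argv"]), demo["output"], None if present else demo["needs"]
```

For example, after `argv, out_dir, missing = demo_plan("flowers")`, a `missing` value of None means `subprocess.check_call(argv)` can be run and the samples should appear under `out_dir`.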

Examples for birds (char-CNN-RNN embeddings), more on YouTube:

Examples for flowers (char-CNN-RNN embeddings), more on YouTube:

Save your favorite pictures generated by our models: the randomness from the noise z and from conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same description 😃
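Conditioning augmentation, mentioned above, is described in the StackGAN paper as sampling the conditioning vector from a Gaussian whose mean μ and diagonal covariance are computed from the text embedding, via the reparameterization trick c_hat = μ + σ ⊙ ε with ε ~ N(0, I), and regularizing that Gaussian towards N(0, I) with a KL term; the sample is then concatenated with the noise z. The sketch below illustrates the idea with plain Python lists. It is not the repository's TensorFlow code, the function names are hypothetical, and it assumes the network predicts log σ rather than σ.

```python
import math
import random


def sample_condition(mu, log_sigma, rng=random):
    """Reparameterized sample c_hat = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(2.0 * ls) - 1.0 - 2.0 * ls
                     for m, ls in zip(mu, log_sigma))


def generator_input(mu, log_sigma, z_dim, rng=random):
    """Concatenate the augmented condition c_hat with a noise vector z ~ N(0, I)."""
    c_hat = sample_condition(mu, log_sigma, rng)
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    return c_hat + z
```

With μ and log σ fixed for one caption, each call to `generator_input` still draws fresh ε and z, which is why a single description can yield images with different poses and viewpoints.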

Citing StackGAN

If you find StackGAN useful in your research, please consider citing:

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

Our follow-up work

References

Generative Adversarial Text-to-Image Synthesis
Learning Deep Representations of Fine-grained Visual Descriptions


Owner

Han Zhang
