Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

Last update: Dec 06, 2022

Related tags

Overview

Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

SWAGAN: A Style-based Wavelet-driven Generative Model
Rinon Gal, Dana Cohen Hochberg, Amit Bermano, Daniel Cohen-Or

Abstract:
In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture, and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach, designed to directly tackle the spectral bias of neural networks, yields an improvement in the ability to generate medium and high frequency content, including structures which other networks fail to learn. We demonstrate the advantage of our method by integrating it into the SyleGAN2 framework, and verifying that content generation in the wavelet domain leads to more realistic high-frequency content, even when trained for fewer iterations. Furthermore, we verify that our model's latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved high-frequency performance in downstream tasks.

Requirements

Our code borrows heavily from the original StyleGAN2 implementation. The list of requirements is thus identical:

64-bit Python 3.6 installation. We recommend Anaconda3 with numpy 1.14.3 or newer.
TensorFlow 1.14 or 1.15 with GPU support. The code does not support TensorFlow 2.0.
On Windows, you need to use TensorFlow 1.14 — TensorFlow 1.15 will not work.
One or more high-end NVIDIA GPUs, NVIDIA drivers, CUDA 10.0 toolkit and cuDNN 7.5.

Using pre-trained networks

Pre-trained networks are stored as *.pkl files.

Paper models can be downloaded here. More models will be made available soon.

To generate images with a given model, use:

# Single latent generation
python run_generator.py generate-images --network=/path/to/model.pkl \
  --seeds=6600-6625 --truncation-psi=1.0 --result-dir /path/to/output/

# Style mixing
python run_generator.py style-mixing-example --network=/path/to/model.pkl \
  --row-seeds=85,100,75,458,1500 --col-seeds=55,821,1789,293 \
  --truncation-psi=1.0 --result-dir /path/to/output/

Training networks

To train a model, run:

python run_training.py --data-dir=/path/to/data --config=config-f-Gwavelets-Dwavelets \ 
  --dataset=data_folder_name --mirror-augment=true

For other configurations, see run_training.py.

Evaluation metrics

FID metrics can be computed using the original StyleGAN2 scripts:

python run_metrics.py --data-dir=/path/to/data --network=/path/to/model.pkl \
  --metrics=fid50k --dataset=data_folder_name --mirror-augment=true

Spectrum Gap plots:

Coming soon.

License

The original StyleGAN2 implementation and this derivative work are available under the Nvidia Source Code License-NC. To view a copy of this license, visit https://nvlabs.github.io/stylegan2/license.html

Citation

@article{gal2021swagan,
author = {Gal, Rinon and Hochberg, Dana Cohen and Bermano, Amit and Cohen-Or, Daniel},
title = {SWAGAN: A Style-Based Wavelet-Driven Generative Model},
year = {2021},
issue_date = {August 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {40},
number = {4},
issn = {0730-0301},
url = {https://doi.org/10.1145/3450626.3459836},
doi = {10.1145/3450626.3459836},
journal = {ACM Trans. Graph.},
month = jul,
articleno = {134},
numpages = {11},
keywords = {StyleGAN, wavelet decomposition, generative adversarial networks}
}

If you use our work, please consider citing StyleGAN2 as well:

@article{Karras2019stylegan2,
  title   = {Analyzing and Improving the Image Quality of {StyleGAN}},
  author  = {Tero Karras and Samuli Laine and Miika Aittala and Janne Hellsten and Jaakko Lehtinen and Timo Aila},
  journal = {CoRR},
  volume  = {abs/1912.04958},
  year    = {2019},
}

Acknowledgements

We thank Ron Mokady for their comments on an earlier version of the manuscript. We also want to thank the anonymous reviewers for identifying and assisting in the correction of flaw in an earlier version of our paper.

Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

Related tags

Overview

Official Implementation of SWAGAN: A Style-based Wavelet-driven Generative Model

Requirements

Using pre-trained networks

Training networks

Evaluation metrics

FID metrics can be computed using the original StyleGAN2 scripts:

Spectrum Gap plots:

License

Citation

Acknowledgements

Owner

PyTorch implementation of the cross-modality generative model that synthesizes dance from music.

python 93% acc. CNN Dogs Vs Cats ( Pytorch )

GNPy: Optical Route Planning and DWDM Network Optimization

The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Interactive dimensionality reduction for large datasets

Unbiased Learning To Rank Algorithms (ULTRA)

YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving (ICCV 2021)

It's final year project of Diploma Engineering. This project is based on Computer Vision.

Animal Sound Classification (Cats Vrs Dogs Audio Sentiment Classification)

Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

Node-level Graph Regression with Deep Gaussian Process Models

Curating a dataset for bioimage transfer learning

A flexible ML framework built to simplify medical image reconstruction and analysis experimentation.

一个多模态内容理解算法框架，其中包含数据处理、预训练模型、常见模型以及模型加速等模块。

Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Code for the paper "Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds" (ICCV 2021)

A bunch of random PyTorch models using PyTorch's C++ frontend

POCO: Point Convolution for Surface Reconstruction

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System