Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Last update: Sep 26, 2022

Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Joonseok Lee²,³, Edward Choi¹ | Paper

¹KAIST, ²Google Research, ³Seoul National University

Abstract

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that the current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, degrading performance in the downstream text-to-image generation and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space' and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming the VQGAN.
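To make the idea of "regularizing orthogonality in the codebook embedding vectors" concrete, here is a minimal sketch of one possible penalty. It is an illustration under stated assumptions, not the repository's implementation: the function name `orthogonality_penalty`, the L2 normalization of each code vector, and the mean-squared deviation from the identity matrix are all assumptions, since the abstract does not specify the exact form of the regularizer. Plain Python lists are used only to keep the example dependency-free.

```python
import math


def orthogonality_penalty(codebook):
    """Hypothetical regularizer: mean squared deviation of the Gram matrix
    of L2-normalized code vectors from the identity matrix.

    `codebook` is a list of K code vectors (each a list of floats).
    Assumes no code vector is all zeros.
    """
    # L2-normalize every code vector so only directions matter.
    normalized = []
    for vec in codebook:
        norm = math.sqrt(sum(x * x for x in vec))
        normalized.append([x / norm for x in vec])

    k = len(normalized)
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Cosine similarity between code i and code j.
            dot = sum(a * b for a, b in zip(normalized[i], normalized[j]))
            # Target is 1 on the diagonal and 0 elsewhere (identity matrix).
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total / (k * k)


# Two mutually orthogonal codes: no penalty.
orthogonal = [[1.0, 0.0], [0.0, 2.0]]
# Two codes pointing in the same direction: penalized.
collinear = [[1.0, 0.0], [3.0, 0.0]]

print(orthogonality_penalty(orthogonal))  # 0.0
print(orthogonality_penalty(collinear))   # 0.5
```

In a training setup, a term of this kind would presumably be computed on the codebook embedding tensor and added to the Stage 1 quantizer loss with a weighting coefficient; that coefficient and how it is combined with the other losses are not given in this README.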

Requirements

To be updated.

Download Dataset

To be updated.

Training TE-VQGAN (Stage 1)

To be updated.

Training Bi-directional Image-Text Generator (Stage 2)

To be updated.

Thanks to

The implementation of 'TE-VQGAN' and 'Bi-directional Image-Text Generator' is based on VQGAN and DALLE-pytorch. Thanks to all related works!

Citation

@misc{shin2021translationequivariant,
      title={Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation}, 
      author={Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi},
      year={2021},
      eprint={2112.00384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


Owner

Woncheol Shin
