Text-to-Image Translation (DALL-E) for TPU in PyTorch

This repository refactors Taming Transformers and DALLE-pytorch to run on TPU VMs with PyTorch Lightning.

Requirements

pip install -r requirements.txt

Data Preparation

Use any image dataset arranged in an ImageNet-style directory structure (a root directory with at least one subfolder of images) so that it can be loaded with PyTorch's ImageFolder.
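As a rough pre-check of the layout described above, the sketch below counts image files in each subfolder of a dataset root. The helper names (`count_images_per_class`, `is_imagefolder_layout`) and the extension list are assumptions made for illustration; they are not part of this repository, and the check does not replicate ImageFolder's exact validation rules.

```python
from pathlib import Path

# Assumed subset of image extensions, chosen for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images_per_class(root):
    """Return {subfolder_name: image_count} for each subfolder of root."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Files nested deeper inside a subfolder count toward that subfolder.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def is_imagefolder_layout(root):
    """True if at least one subfolder of root contains an image file."""
    return any(n > 0 for n in count_images_per_class(root).values())
```

Files placed directly in the root directory are ignored by this check, since an ImageNet-style layout expects images to live inside subfolders.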

Training VQVAEs

You can quickly test the VAE training script with randomly generated fake data:

python train_vae.py --use_tpus --fake_data

For actual training, provide directories for train_dir, val_dir, and log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]
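The command above can also be assembled programmatically, which makes the optional `--bpe_path` argument explicit. This is only a sketch: the function name `build_dalle_command` and all example paths are hypothetical, and only the flags already shown in this README are used.

```python
import shlex


def build_dalle_command(train_dir, val_dir, log_dir, vae_path,
                        bpe_path=None, use_tpus=True):
    """Build the train_dalle.py argument list from the flags shown above."""
    cmd = ["python", "train_dalle.py"]
    if use_tpus:
        cmd.append("--use_tpus")
    cmd += [
        "--train_dir", train_dir,
        "--val_dir", val_dir,
        "--log_dir", log_dir,
        "--vae_path", vae_path,
    ]
    # --bpe_path is optional, so it is only added when a path is given.
    if bpe_path is not None:
        cmd += ["--bpe_path", bpe_path]
    return cmd


# Hypothetical paths, for illustration only.
example = build_dalle_command(
    "data/train", "data/val", "results/dalle", "results/vae/last.ckpt"
)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues when paths contain spaces.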

TODO

Refactor Encoder and Decoder modules for better readability
Refactor VQVAE2
Add Net2Net Conditional Transformer for conditional image generation
Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
Add Guided Diffusion + CLIP for image refinement
Add VAE converter for JAX to support dalle-mini
Add DALL-E colab notebook
Add RBGumbelQuantizer
Add HiT

ON-GOING

Test large dataset loading on TPU Pods
Change current DALL-E code to fully support latest updates from DALLE-pytorch

DONE

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation}, 
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Owner

Kim, Taehoon
