Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

Last update: Dec 28, 2022

Overview

NÜWA - Pytorch (wip)

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in Pytorch. This repository will be populated if Microsoft does not open-source the code by the end of December. It may also contain an extension into video and audio, using a dual decoder approach.

Citations

@misc{wu2021nuwa,
    title   = {N\"UWA: Visual Synthesis Pre-training for Neural visUal World creAtion}, 
    author  = {Chenfei Wu and Jian Liang and Lei Ji and Fan Yang and Yuejian Fang and Daxin Jiang and Nan Duan},
    year    = {2021},
    eprint  = {2111.12417},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Comments

Question about generated videos?

The generated videos contain a lot of negative numbers and very small decimals (like 5e-1), even though the loss decreases normally during training. Is that normal? How can I make the result visible?
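One possible way to inspect such output is to rescale it to 8-bit pixels before viewing. The sketch below is only an illustration: it assumes the generated video is a float array of shape (frames, channels, height, width) with values roughly in [-1, 1], which the page does not confirm, and `to_uint8_frames` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def to_uint8_frames(video, value_range=(-1.0, 1.0)):
    """Map a float video (frames, channels, height, width) to uint8
    frames laid out as (frames, height, width, channels).

    `value_range` is an ASSUMPTION about how the model scales pixels;
    check the real min/max of your output before relying on it.
    """
    lo, hi = value_range
    video = np.asarray(video, dtype=np.float32)
    video = np.clip(video, lo, hi)        # cut off out-of-range values
    video = (video - lo) / (hi - lo)      # rescale to [0, 1]
    frames = np.round(video * 255).astype(np.uint8)
    return frames.transpose(0, 2, 3, 1)   # channels-last for image libraries

# Toy stand-in for a generated video: 2 frames, 3 channels, 4x4 pixels
fake_video = np.random.default_rng(0).uniform(-1.2, 1.2, size=(2, 3, 4, 4))
frames = to_uint8_frames(fake_video)
```

If the values turn out to lie in [0, 1] instead, passing `value_range=(0.0, 1.0)` applies the same mapping. The resulting frames can then be written out with whichever image or video library is at hand.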

opened by Fitzwong 0
Why the video does not pass through the encoder?

Hi lucidrains! Thanks for providing a great repo, which makes the NUWA paper much easier to understand.
I have a question. In the NUWA paper, the inputs to the encoder are the caption tokens (caption condition) and the video tokens (3DNA condition). So, as I understand it, the video token sequence should fully self-attend in the encoder, and the encoder outputs then condition the decoder, right? The decoder you provide has causal self-attention and text conditioning, as expected. But by the paper's definition, the condition comprises both the text condition and the 3DNA condition, and both of them condition the decoder. Is my understanding right? I am just curious about the conditioning in the NUWA paper. The encoder in your repo is only a text encoder; the video does not pass through it to provide that conditioning.

Looking forward to your reply! Thanks!

opened by Wang-Xiaodong1899 0
Questions about function forward() in NUWA please.
I'm confused that, in the forward() function of class NUWA, the ground-truth video is fed to the transformer to calculate the output video, which is different from what generate() does.

frame_embeddings = self.video_transformer(
    frame_embeddings,  # calculated from ground-truth video
    context = text_embeds,
    context_mask = text_mask
)

So when training NUWA, the loss comes from the logits. But the logits come not only from the text but also from the ground-truth video (only one transformer layer, different from the autoregressive model in the generate function). Is that some kind of cheating during training? Or should I generate the logits in the same way as in generate(), and then calculate the loss to train?
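Feeding the ground truth during training is usually called teacher forcing, and it does not leak the answer as long as the attention is causal, which is how another comment on this page describes the decoder. The toy below is a generic NumPy illustration, not the repository's code: `causal_self_attention` is a hypothetical single-head layer, and the shapes are made up. It shows that corrupting later inputs leaves earlier outputs untouched, so position i can only use tokens before the one it has to predict.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention where position i only sees positions <= i."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)          # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))   # stand-in for ground-truth frame embeddings

# Teacher forcing: input is <bos> + tokens[:-1]; output i is trained to predict tokens[i]
bos = np.zeros((1, 8))
inputs = np.concatenate([bos, tokens[:-1]], axis=0)
out = causal_self_attention(inputs)

# Corrupt every input from position 3 onwards and recompute
corrupted = inputs.copy()
corrupted[3:] = rng.normal(size=corrupted[3:].shape)
out_corrupted = causal_self_attention(corrupted)

unchanged_prefix = np.allclose(out[:3], out_corrupted[:3])
changed_suffix = not np.allclose(out[3:], out_corrupted[3:])
```

Under this setup one parallel pass computes the same per-position predictions that a step-by-step loop would produce for the same prefix, which is why training does not need to call the sampling loop.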
opened by Fitzwong 1
Type of dataset for training VQ-GAN

Hi,

First, thanks a lot for the amazing work! I have one question regarding the training of the VQ-GAN: do you recommend training it on a dataset similar to the one the NUWA model will be trained on? What I mean is, if I want to train NUWA to generate sports videos from text, do I also need to train the VQ-GAN on a sports dataset?

Thanks a lot

opened by antonibigata 0
Pseudocode for 3DNA?

I can't make sense of the complex einops 😢

Can someone give the 3DNA pseudocode to illustrate what's going on 🤗

(Also how did lucidrains bang out thousands of lines of code in a few weeks - is he confirmed to be human? 🤔)
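As a rough answer to the pseudocode request, here is a plain-Python sketch of the idea behind 3D nearby attention: each token in a (frames, height, width) grid attends only to tokens inside a small 3D window around it, and in the causal decoder only to those that come no later in generation order. This is a simplified reading of the paper, not the repository's einops implementation; the function names are hypothetical, and the exact padding and causal handling in the repo may differ.

```python
from itertools import product

def nearby_positions(query, grid, kernel, causal=True):
    """Return the key positions a query token may attend to.

    query  : (f, h, w) index of the query token (a tuple)
    grid   : (F, H, W) size of the token grid
    kernel : (kf, kh, kw) odd window extents along each axis
    """
    ranges = []
    for q, size, k in zip(query, grid, kernel):
        half = k // 2
        # Window centred on the query, clipped at the grid borders
        ranges.append(range(max(0, q - half), min(size, q + half + 1)))

    neighbours = list(product(*ranges))
    if causal:
        # Tuple comparison is raster order (frame, then row, then column),
        # so this drops every position that comes after the query.
        neighbours = [p for p in neighbours if p <= query]
    return neighbours

def nearby_attention_pattern(grid, kernel, causal=True):
    """Map every query position in the grid to its allowed key positions."""
    return {
        q: nearby_positions(q, grid, kernel, causal)
        for q in product(*(range(s) for s in grid))
    }

pattern = nearby_attention_pattern(grid=(2, 3, 3), kernel=(3, 3, 3))
```

Attention is then an ordinary softmax over the keys listed for each query. The cost per token depends on the window size rather than on the full video length, which is the point of the local window.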

opened by neel04 4

Releases (0.7.7a)

0.7.7a (Aug 14, 2022)
0.7.7 (Aug 14, 2022)
0.7.6 (Apr 28, 2022)
0.7.5 (Apr 28, 2022)
0.7.4 (Apr 27, 2022)
0.7.3 (Apr 22, 2022)
0.7.2 (Apr 7, 2022)
0.7.1 (Mar 24, 2022)
0.7.0 (Mar 24, 2022)
0.6.4 (Mar 15, 2022)
0.6.3 (Mar 15, 2022)
0.6.2 (Mar 15, 2022)
0.6.1 (Mar 15, 2022)
0.6.0 (Mar 15, 2022)
0.5.15 (Mar 12, 2022)
0.5.14 (Mar 12, 2022)
0.5.12 (Mar 12, 2022)
0.5.11 (Mar 12, 2022)
0.5.10 (Mar 11, 2022)
0.5.9 (Mar 11, 2022)
0.5.8 (Mar 11, 2022)
0.5.7 (Mar 11, 2022)
0.5.6 (Mar 11, 2022)
0.5.5 (Mar 11, 2022)
0.5.4 (Mar 11, 2022)
0.5.3 (Mar 10, 2022)
0.5.2 (Mar 10, 2022)
0.5.1 (Mar 10, 2022)
0.5.0 (Mar 10, 2022)
0.4.33 (Mar 10, 2022)

Owner

Phil Wang

Working with Attention. It's all we need
