Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Last update: Jan 03, 2023

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

Python >= 3.6, PyTorch >= 1.8, and ffmpeg
Set up OpenFace
- We use the OpenFace tools to extract the initial pose of the reference image.
- Make sure OpenFace is installed, then set OPENFACE_POSE_EXTRACTOR_PATH in config.py to the absolute path of its feature-extraction executable. On Windows, for example, this is the absolute path of "FeatureExtraction.exe".
Other requirements are listed in 'requirements.txt'.
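The requirements above can be checked before running anything. The following is a minimal sketch and not part of this repository: the function name find_setup_problems is hypothetical, and it only tests the Python version, whether ffmpeg is on PATH, and whether the path you intend to use for OPENFACE_POSE_EXTRACTOR_PATH is an absolute path to an existing file. It does not check PyTorch or the packages in requirements.txt.

```python
import shutil
import sys
from pathlib import Path


def find_setup_problems(openface_path):
    """Return a list of human-readable problems; an empty list means all checks passed.

    openface_path is the value you plan to put in OPENFACE_POSE_EXTRACTOR_PATH.
    This helper is a hypothetical illustration, not a function from the repository.
    """
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    extractor = Path(openface_path)
    if not extractor.is_absolute():
        problems.append("OpenFace path is not absolute: %s" % openface_path)
    elif not extractor.is_file():
        problems.append("OpenFace extractor not found: %s" % openface_path)
    return problems
```

Calling find_setup_problems with your extractor path and printing the result gives a quick checklist; an empty list only means these three checks passed.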

Pretrained Checkpoint

Please download the pretrained checkpoint from google-drive and unzip it into the checkpoints directory (/checkpoints). Alternatively, store the files elsewhere and update GENERATOR_CKPT and AUDIO2POSE_CKPT in config.py to match.

Extract phoneme

We use the CMU phoneset to represent phonemes; the extra 'SIL' symbol denotes silence. The full phoneme set is listed in 'phindex.json'.

We have already extracted the phonemes for the audio files in the 'sample/audio' directory. For other audio, you can extract phonemes with another ASR tool and then map them to the CMU phoneset, or email [email protected] for help.
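As an illustration of the mapping step, here is a minimal sketch that normalizes labels from an external aligner to stress-free CMU symbols plus 'SIL'. It rests on assumptions that this README does not confirm: that your tool emits ARPAbet-style labels with optional stress digits (such as AH0), and that the labels in SILENCE_LABELS are how it marks pauses. The function names are hypothetical. The JSON layout expected by --phoneme_path is not described here, so compare your output with the files shipped alongside the sample audio and check every symbol against 'phindex.json'.

```python
import re

# Labels assumed to mean silence or a pause; adjust to whatever your aligner emits.
SILENCE_LABELS = {"", "SIL", "SP", "<EPS>"}


def to_cmu_phoneme(label):
    """Map one aligner label to a stress-free CMU symbol, or 'SIL' for silence."""
    symbol = label.strip().upper()
    if symbol in SILENCE_LABELS:
        return "SIL"
    # Drop the trailing stress digit (0, 1 or 2) found on CMUdict vowels, e.g. AH0 -> AH.
    return re.sub(r"[0-2]$", "", symbol)


def to_cmu_sequence(labels):
    """Normalize a whole sequence of aligner labels."""
    return [to_cmu_phoneme(label) for label in labels]
```

For example, to_cmu_sequence(["hh", "AH0", "l", "OW1", "sil"]) returns ["HH", "AH", "L", "OW", "SIL"].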

Generate Demo Results

python test_script.py --img_path xxx.jpg --audio_path xxx.wav --phoneme_path xxx.json --save_dir "YOUR_DIR"
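To run the command above over several audio files, the argument list can be built programmatically. This is a sketch under stated assumptions: the four flags are the ones shown in the command, but the helper names are hypothetical, and the rule that each .wav has a phoneme file with the same name and a .json extension in the same directory is an assumed layout rather than something the repository guarantees.

```python
import sys
from pathlib import Path


def build_command(img_path, audio_path, phoneme_path, save_dir):
    """Build the argument list for one test_script.py run."""
    return [
        sys.executable, "test_script.py",
        "--img_path", str(img_path),
        "--audio_path", str(audio_path),
        "--phoneme_path", str(phoneme_path),
        "--save_dir", str(save_dir),
    ]


def commands_for_audio_dir(img_path, audio_dir, save_dir):
    """Pair every .wav in audio_dir with a same-named .json file (assumed layout)."""
    commands = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        phonemes = wav.with_suffix(".json")
        if phonemes.is_file():  # skip audio that has no phoneme file yet
            commands.append(build_command(img_path, wav, phonemes, save_dir))
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True) from the repository root, so that a failing run stops the batch instead of being silently skipped.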

Note that input images must be square (equal height and width), and the face should be cropped as in the examples in samples/imgs. You can also preprocess your images with image_preprocess.py.
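For the square-image requirement, the crop rectangle can be computed with a few lines of arithmetic. The sketch below is not what image_preprocess.py does (its behavior is not described here); center_square_box is a hypothetical helper that returns the largest centered square as (left, top, right, bottom). A centered crop does not guarantee the face is framed as in the sample images, so check the result by eye.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```

For a 640x480 image this gives (80, 0, 560, 480). The tuple uses the same (left, upper, right, lower) order that Pillow's Image.crop expects, if you use Pillow to apply it.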

License and Citation

@InProceedings{wang2021one,
author = {Suzhen Wang and Lincheng Li and Yu Ding and Xin Yu},
title = {One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning},
booktitle = {AAAI 2022},
year = {2022},
}

Acknowledgement

This codebase builds on First Order Motion Model and imaginaire. We thank their authors for their contributions.

Owner

FuxiVirtualHuman
