Python code for Lite Audio-Visual Speech Enhancement.

Last update: Dec 01, 2022

Related tags

Deep Learning LAVSE

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

This repository also includes some preprocessed sample data, including enhanced results.

The TMSV (Taiwan Mandarin speech with video) dataset used in LAVSE is released here.

Please cite the following paper if you find the code useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install the Python dependencies.

pip install -r requirements.txt
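Before installing, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `python_matches` and `describe_environment` are hypothetical, and the PyTorch check only runs if `torch` is already installed.

```python
import importlib.util
import sys

# Version taken from the prerequisites list above.
REQUIRED_PYTHON = (3, 6)


def python_matches(required=REQUIRED_PYTHON, actual=None):
    """Return True if the major.minor version equals `required`.

    `actual` defaults to the running interpreter; pass a tuple to test
    another version.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(required)


def describe_environment():
    """Collect a few human-readable lines about the local setup."""
    lines = [
        "Python {}.{} (README lists {}.{})".format(
            sys.version_info[0], sys.version_info[1], *REQUIRED_PYTHON
        )
    ]
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is None:
        lines.append("PyTorch: not installed yet")
    else:
        import torch

        lines.append(
            "PyTorch {} | CUDA build: {} | GPU visible: {}".format(
                torch.__version__, torch.version.cuda, torch.cuda.is_available()
            )
        )
    return lines


if __name__ == "__main__":
    for line in describe_environment():
        print(line)
    if not python_matches():
        print("Note: the interpreter version differs from the one listed.")
```

A mismatch does not necessarily mean the code will fail; the listed versions are simply the ones the README names.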

Usage

Run the command below, and the average PESQ and STOI results will be printed to your terminal.

Remember to start visdom (ideally in a screen or tmux session) before running the script, since it records the training loss.

bash run.sh

See run.sh for details on the individual commands.
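Since the script expects visdom to be running, a quick reachability check before launching can save a failed run. The sketch below is an illustration rather than part of the repository: the function name `visdom_is_reachable` is hypothetical, and it assumes visdom is listening on its default address, localhost:8097. It only tests that something accepts TCP connections on that port, not that the listener is actually visdom.

```python
import socket


def visdom_is_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    8097 is visdom's default port; change it if your server was
    started on a different one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if visdom_is_reachable():
        print("A server is listening on port 8097; you can run: bash run.sh")
    else:
        print("No server found. Start one first, e.g.: python -m visdom.server")
```

Starting the server inside a screen or tmux session keeps it alive after the terminal that launched it is closed.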

License

The LAVSE work is released under the MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

Owner

Shang-Yi Chuang

