StarGAN-ZSVC: Unofficial PyTorch Implementation

Overview

StarGAN-ZSVC: Unofficial PyTorch Implementation

This repository is an unofficial PyTorch implementation of StarGAN-ZSVC by Matthew Baas and Herman Kamper. This repository provides both model architectures and the code to inference or train them.

One of the StarGAN-ZSVC advantages is that it works on zero-shot settings and can be trained on unparallel audio data (different audio content by different speakers). Also, the model inference time is real-time or faster.

Disclaimer: I implement this repository for educational purpose only. All credits go to the original authors. Also, it may contains different details as described in the paper. If there is a room for improvement, please feel free to contact me.

Set up

git clone [email protected]:Top34051/stargan-zsvc.git
cd stargan-zsvc
conda env create -f environment.yml
conda activate stargan-zsvc

Usage

Voice conversion

Given two audio files, source.wav and target.wav, you can generate a new audio file with the same speaking content as in source.wav spoken by the speaker in target.wav as follow.

First, load my pretrained model weights (best.pt) and put it in checkpoints folder.

Next, we need to embed both speaker identity.

python embed.py --path path_to_source.wav --name src
python embed.py --path path_to_target.wav --name trg

This will generate src.npy and trg.npy, the source and target speaker embeddings.

To perform voice conversion,

python convert.py \
  --audio_path path_to_source.wav \
  --src_id src \
  --trg_id trg  

That's it! πŸŽ‰ You can check out the result at results/output.wav.

Training

To train the model, you have to download and preprocess the dataset first. Since your data might be different from mine, I recommend you to read and fix the logic I used in preprocess.py (the dataset I used is here).

The fixed size utterances from each speaker will be extracted, resampled to 22,050 Hz, and converted to Mel-spectrogram with window and hop length of size 1024 and 256. This will preprocess the speaker embeddings as well, so that you don't have to embed them one-by-one.

The processed dataset will look like this

data/
    train/
        spk1.npy # contains N samples of (80, 128) mel-spectrogram
        spk2.npy
        ...
    test/
        spk1.npy
        spk2.npy
        ...
        
embeddings/
    spk1.npy # a (256, ) speaker embedding vector
    spk2.npy
    ...

You can customize some of the training hyperparameters or select resuming checkpoint in config.json. Finally, train the models by

python main.py \ 
  --config_file config.json 
  --num_epoch 3000

You will now see new checkpoint pops up in the checkpoints folder.

Please check out my code and modify them for improvement. Have fun training! ✌️

Owner
Jirayu Burapacheep
Deep learning enthusiast; Undergrad in Computer and Data Science at UW-Madison
Jirayu Burapacheep
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus General info This is

71 Oct 25, 2022
BERT model training impelmentation using 1024 A100 GPUs for MLPerf Training v1.1

Pre-trained checkpoint and bert config json file Location of checkpoint and bert config json file This MLCommons members Google Drive location contain

SAIT (Samsung Advanced Institute of Technology) 12 Apr 27, 2022
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task β€”β€” Next Sentence Prediction"

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task β€”β€” Next Sentence Prediction"

Sun Yi 201 Nov 21, 2022
MCMC samplers for Bayesian estimation in Python, including Metropolis-Hastings, NUTS, and Slice

Sampyl May 29, 2018: version 0.3 Sampyl is a package for sampling from probability distributions using MCMC methods. Similar to PyMC3 using theano to

Mat Leonard 304 Dec 25, 2022
This repository contains small projects related to Neural Networks and Deep Learning in general.

ILearnDeepLearning.py Description People say that nothing develops and teaches you like getting your hands dirty. This repository contains small proje

Piotr Skalski 1.2k Dec 22, 2022
Official implementation of the paper 'High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network' in CVPR 2021

LPTN Paper | Supplementary Material | Poster High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network Ji

372 Dec 26, 2022
Pseudo-Visual Speech Denoising

Pseudo-Visual Speech Denoising This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021. Autho

Sindhu 94 Oct 22, 2022
Transfer Learning Remote Sensing

Transfer_Learning_Remote_Sensing Simulation R codes for data generation and visualizations are in the folder simulation. Experiment: California Housin

2 Jun 21, 2022
Alphabetical Letter Recognition

BayeesNetworks-Image-Classification Alphabetical Letter Recognition In these demo we are using "Bayees Networks" Our database is composed by Learning

Mohammed Firass 4 Nov 30, 2021
A Python Package For System Identification Using NARMAX Models

SysIdentPy is a Python module for System Identification using NARMAX models built on top of numpy and is distributed under the 3-Clause BSD license. N

Wilson Rocha 175 Dec 25, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

632 Dec 13, 2022
Pytorch implementation of FlowNet by Dosovitskiy et al.

FlowNetPytorch Pytorch implementation of FlowNet by Dosovitskiy et al. This repository is a torch implementation of FlowNet, by Alexey Dosovitskiy et

ClΓ©ment Pinard 762 Jan 02, 2023
Rax is a Learning-to-Rank library written in JAX

πŸ¦– Rax: Composable Learning to Rank using JAX Rax is a Learning-to-Rank library written in JAX. Rax provides off-the-shelf implementations of ranking

Google 247 Dec 27, 2022
WRENCH: Weak supeRvision bENCHmark

πŸ”§ What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
Codes for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

TS-CAM: Token Semantic Coupled Attention Map for Weakly SupervisedObject Localization This is the official implementaion of paper TS-CAM: Token Semant

vasgaowei 112 Jan 02, 2023
On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation

On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation On Nonlinear Latent Transformations for GAN-based Image Editi

Valentin Khrulkov 22 Oct 24, 2022
Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics Overall pipeline of OCN. Paper Link: [arXiv] [AAAI

13 Nov 21, 2022
Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

UVA Computer Vision 87 Jan 03, 2023
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Quan Nguyen 7 May 11, 2022