neural network based speaker embedder

Last update: Dec 29, 2022

What is deepaudio-speaker?

Deepaudio-speaker is a framework for training neural network based speaker embedders. It supports online audio augmentation through torch-audiomentations. It includes, or will include, popular neural network architectures and losses used for speaker embedding.

To make features such as mixed-precision, multi-node, and TPU training easy to use, I built this framework on PyTorch Lightning and Hydra (just as pyannote-audio and openspeech do).

Deepaudio-tts is coming soon.

Installation

conda create -n deepaudio python=3.8.5
conda activate deepaudio
conda install numpy cffi
conda install libsndfile=1.0.28 -c conda-forge
git clone https://github.com/deepaudio/deepaudio-speaker.git
cd deepaudio-speaker
pip install -e .

Get Started

Supported Datasets

VoxCeleb2

Download the VoxCeleb dataset and follow this script to obtain a directory structure like this:

/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a
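
As a quick sanity check of the layout above, the following is a minimal sketch that groups utterance paths by speaker ID. It assumes only what the listing shows, namely that each file sits at `<speaker_id>/<video_id>/<utterance>`. The function name `index_by_speaker` is hypothetical and not part of deepaudio-speaker.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def index_by_speaker(paths):
    """Group utterance paths by speaker ID.

    Assumes the layout .../<speaker_id>/<video_id>/<utterance>.<ext>,
    as in the directory listing above.
    """
    speakers = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # The speaker directory is two levels above the audio file.
        speaker_id = path.parent.parent.name
        speakers[speaker_id].append(str(path))
    return dict(speakers)


example_paths = [
    "/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav",
    "/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav",
    "/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a",
    "/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a",
]

index = index_by_speaker(example_paths)
for speaker_id in sorted(index):
    print(speaker_id, len(index[speaker_id]))
```

On a real copy of the dataset, the paths could be collected with `pathlib.Path(root).rglob("*.wav")` instead of the hard-coded list.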

Training examples

Example 1: Train the ECAPA-TDNN model with fbank features on a GPU.

$ deepaudio-speaker-train  \
    dataset=voxceleb2 \
    dataset.dataset_path=/your/path/to/voxceleb2/dev/wav/ \
    model=ecapa \
    model.channels=1024 \
    feature=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=aamsoftmax
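
The `criterion=aamsoftmax` override refers to an additive angular margin softmax loss. To show what the margin does, here is a minimal plain-Python sketch of that loss for a single embedding. It is not deepaudio-speaker's implementation: the function name, the margin of 0.2, the scale of 30, and the toy vectors are all illustrative assumptions, and the usual guard for angles where adding the margin would exceed pi is omitted.

```python
import math


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive angular margin softmax loss for one example (illustrative sketch).

    margin and scale defaults are assumptions chosen for the demo,
    not values taken from deepaudio-speaker.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine of the angle between the embedding and the class weight vector.
        cos_theta = sum(a * b for a, b in zip(e, normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if j == target:
            # Add the margin to the target-class angle only.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)

    # Numerically stable cross-entropy over the scaled logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Toy setup: two speakers, 2-dimensional embeddings.
weights = [[1.0, 0.0], [0.0, 1.0]]
embedding = [0.6, 0.4]

loss_no_margin = aam_softmax_loss(embedding, weights, target=0, margin=0.0)
loss_with_margin = aam_softmax_loss(embedding, weights, target=0, margin=0.2)
print(loss_no_margin, loss_with_margin)
```

With the margin enabled, the same embedding receives a higher loss, so the network has to separate speakers by a wider angle to reach the same loss value.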

Example 2: Extract speaker embeddings with a trained model.

TODO

Model Architecture

ECAPA-TDNN: An unofficial implementation from @lawlict. More details are available at this link.

ECAPA-TDNN: Implemented by @joonson. More details are available at this link.

ResNetSE34L: Borrowed from voxceleb trainer.

ResNetSE34V2: Borrowed from voxceleb trainer.

resnet101: Proposed by BUT for speaker diarization. Note that the features used in this framework differ from those used in VB-HMM.

How to contribute to deepaudio-speaker

This is a personal project, so I don't have enough GPU resources to run many experiments. I appreciate any kind of feedback or contribution. Please feel free to open a pull request for small things such as bug fixes or experiment results. If you have any questions, please open an issue.

Acknowledgements

I borrowed a lot of code from openspeech and pyannote-audio.
