wav2vec_finetune

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Initial test: gender recognition on this dataset.
Finetune for autism detection
[] Clean up directory
[] Make training and evaluation scripts runnable with cmd line / shell scripts
[] Add random noise on training samples
[] Make baseline models

# make virtual env
pip install -r requirements.txt

mkdir data
mkdir preproc_data
mkdir model
cd data
wget https://zenodo.org/record/1219621/files/CaFE_48k.zip?download=1
unzip the file 

python preproc.py
python train.py
python evaluate.py

Updates

11/9: success! Trained a sex classifier on a small dataset that performs soso. Everything seems to work though.

TODO

Chunk audio files - make predictions in batches of e.g. 5 seconds
Set up benchmark models

Resources:

https://github.com/pytorch/fairseq/blob/master/examples/xlmr/README.md
https://arxiv.org/abs/2006.13979
https://huggingface.co/transformers/training.html
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
https://discuss.huggingface.co/t/german-asr-fine-tuning-wav2vec2/4558/5
https://huggingface.co/docs/datasets/loading_datasets.html#from-local-files
https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
https://github.com/m3hrdadfi/soxan
https://www.zhaw.ch/storage/engineering/institute-zentren/cai/BA21_Speech_Classification_Reiser_Fivian.pdf
https://github.com/DReiser7/w2v_did
https://github.com/ARBML/klaam
https://github.com/talhanai/speech-nlp-datasets

Notes:

Look into SpecAugment for finetuning: https://arxiv.org/abs/1904.08779 (on by default)
How to make the prediction?
- Better way than a small feedforward projection? (LSTM or something?)

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Related tags

Overview

wav2vec_finetune

Updates

TODO

Resources:

Notes:

Owner

Open-source offline translation library written in Python. Uses OpenNMT for translations

Korean Sentence Embedding Repository

Comprehensive-E2E-TTS - PyTorch Implementation

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Problem: Given a nepali news find the category of the news

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

A framework for cleaning Chinese dialog data

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

A Python module made to simplify the usage of Text To Speech and Speech Recognition.

skweak: A software toolkit for weak supervision applied to NLP tasks

A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Named Entity Recognition API used by TEI Publisher

Text Classification in Turkish Texts with Bert

Espial is an engine for automated organization and discovery of personal knowledge

Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

A simple version of DeTR