This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Last update: Sep 15, 2022

CPC_DeepCluster

Setup instructions

Clone the repo: https://github.com/iiscleap/CPC_DeepCluster.git
Install the libraries required by torchaudio (https://github.com/pytorch/audio):

Linux: sudo apt-get install sox libsox-dev libsox-fmt-all

conda env create -f environment.yml && conda activate cpc37
Run setup.py: python setup.py develop

Using the Repository

To start training:

python cpc/train_mod.py --pathDB $PATH_AUDIO_FILES --pathCheckpoint $PATH_CHECKPOINT_DIR --LabelsPath $Path_Pseudo_Labels --file_extension $EXTENSION --normMode batchNorm --rnnMode linear --nLevelsGRU 2 --max_size_loaded 1000000000 --save_step 1 --alpha_val $Cluster_Loss_Weighting

Where:

$PATH_AUDIO_FILES is the directory containing the audio files. The files should be arranged as below:

PATH_AUDIO_FILES
│
└───speaker1
│   └───...
│         │   seq_11.{$EXTENSION}
│         │   seq_12.{$EXTENSION}
│         │   ...
│
└───speaker2
    └───...
          │   seq_21.{$EXTENSION}
          │   seq_22.{$EXTENSION}

$PATH_CHECKPOINT_DIR is the directory where the checkpoints will be saved.
$EXTENSION is the extension of each audio file.
$Path_Pseudo_Labels is the directory that contains the pseudo labels of all the audio files in $PATH_AUDIO_FILES.
$Cluster_Loss_Weighting is the weighting factor applied to the cluster loss.
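To check that a dataset matches the layout above before training, the files can be grouped by speaker. The sketch below is a hypothetical helper (list_audio_by_speaker is not part of the repository). It assumes, from the tree above, that each first-level folder under $PATH_AUDIO_FILES is one speaker and that audio files may sit at any depth below it.

```python
"""Hypothetical helper: group audio files by speaker for the --pathDB layout.
Not part of the CPC_DeepCluster repository."""
from collections import defaultdict
from pathlib import Path


def list_audio_by_speaker(path_audio_files, extension):
    """Map each first-level speaker directory to its sorted audio files.

    Assumption: every first-level subdirectory is one speaker, and audio
    files can be nested at any depth below it (the "..." folders).
    """
    if not extension.startswith("."):
        extension = "." + extension  # accept "flac" as well as ".flac"
    root = Path(path_audio_files)
    by_speaker = defaultdict(list)
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob searches recursively through the nested subfolders
        for f in sorted(speaker_dir.rglob("*" + extension)):
            if f.is_file():
                by_speaker[speaker_dir.name].append(f)
    # speakers with no matching files are simply absent from the result
    return dict(by_speaker)
```

Calling it with the same values passed to --pathDB and --file_extension gives a quick per-speaker file count; an empty result usually means that $EXTENSION or the folder depth does not match what is on disk.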

Restarting the session

To restart a session from the last saved checkpoint, run:

python cpc/train_mod.py --pathCheckpoint $PATH_CHECKPOINT_DIR
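Resuming depends on finding the most recent checkpoint in $PATH_CHECKPOINT_DIR. The sketch below shows one way to pick it. It assumes checkpoints are named checkpoint_<epoch>.pt, the convention of the upstream CPC_audio code that this repository appears to build on; treat the naming as an assumption and check the files train_mod.py actually writes. latest_checkpoint is a hypothetical helper, not part of the repository.

```python
import os
import re

# Assumed naming scheme: checkpoint_<epoch>.pt (verify against the files
# train_mod.py writes to $PATH_CHECKPOINT_DIR).
_CKPT_RE = re.compile(r"^checkpoint_(\d+)\.pt$")


def latest_checkpoint(path_checkpoint_dir):
    """Return the path of the highest-numbered checkpoint, or None."""
    if not os.path.isdir(path_checkpoint_dir):
        return None
    best_epoch, best_name = -1, None
    for name in os.listdir(path_checkpoint_dir):
        match = _CKPT_RE.match(name)
        # compare epochs as integers so that checkpoint_10 beats checkpoint_9
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    if best_name is None:
        return None  # directory exists but holds no matching checkpoints
    return os.path.join(path_checkpoint_dir, best_name)
```

Comparing the epoch numerically rather than sorting file names avoids the lexical trap where "checkpoint_9.pt" sorts after "checkpoint_10.pt".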

Generating the pseudo labels for training

Create quantized.txt using the repository here

python create_pseudolabels.py --input_file $Path_Containing_quantized.txt --out_path $Output_Dir

$Output_Dir is the directory where the .pt files containing the pseudo labels will be saved.
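The first step of turning quantized.txt into per-file pseudo labels is parsing it. The sketch below assumes each line holds a file name, a tab, and comma-separated cluster indices; this format is an assumption, so check it against your own quantized.txt. parse_quantized is a hypothetical helper: the repository's create_pseudolabels.py goes further and writes .pt files to $Output_Dir, while this sketch stops at building an in-memory mapping.

```python
def parse_quantized(path_quantized_txt):
    """Parse quantized.txt into {file_name: [cluster indices]}.

    Assumed line format (verify against your own file):
        <file_name><TAB><idx_1>,<idx_2>,...,<idx_T>
    """
    labels = {}
    with open(path_quantized_txt, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                name, units = line.split("\t", 1)
            except ValueError:
                raise ValueError(f"line {line_no}: expected a tab separator") from None
            # one integer cluster index per frame
            labels[name] = [int(u) for u in units.split(",") if u]
    return labels
```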

Extracting features and training K-means and language models

Extract the features for K-means clustering, and train the K-means and language models, using the repository here.
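The discrete units in quantized.txt presumably come from assigning each feature frame to its nearest K-means centroid. The toy sketch below illustrates that idea with plain Lloyd's algorithm on small lists of feature vectors. It is not the implementation in the referenced repository, which would run on large CPC feature sets with an optimized library; kmeans and assign are hypothetical helpers.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(frames, centroids):
    """Label each frame with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: _sq_dist(f, centroids[j]))
            for f in frames]


def kmeans(frames, k, n_iter=50, seed=0):
    """Toy Lloyd's algorithm; returns a list of k centroids (needs k <= len(frames))."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]  # initialise from data points
    for _ in range(n_iter):
        labels = assign(frames, centroids)
        new_centroids = []
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:
                # new centroid = mean of the frames assigned to cluster j
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids
```

Running assign(frames, kmeans(frames, k)) on one utterance's frames yields the kind of per-frame unit sequence that a line of quantized.txt stores.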

Owner

LEAP Lab
