Non-Autoregressive Predictive Coding

This repository contains the implementation of Non-Autoregressive Predictive Coding (NPC) as described in the preprint paper submitted to ICASSP 2021.

A quick example for training NPC

python main.py --config config/self_supervised/npc_example.yml \
               --task self-learning

For more complete examples including downstream tasks, please see the example script.
For preparing data, please visit preprocess.
For detailed hyperparameter settings and descriptions, please check out the example config file of NPC.
For all run-time options, use the -h flag.
Implementations of Autoregressive Predictive Coding (APC, 2019, Chung et al.) and Vector-Quantized APC (VQ-APC, 2020, Chung et al.) are also available and follow a similar training/downstream execution, with example config files here.

Some notes

We found that the unmasked feature produced by the last ConvBlock layer is a better representation. In the phone classification tasks, switching to the unmasked feature (PER 25.6%) provided a 1.6% absolute improvement over the masked feature (PER 27.2%). This result is not included in the current preprint version and will be added to the paper in the future. Please refer to the downstream examples to activate this option.
APC/VQ-APC are implemented with the following modifications for improvement (for the unmodified versions, please visit the official implementations of APC / VQ-APC):
- Multi-group VQ is available for VQ-APC, but with VQ on the last layer only
- Using utterance-wise CMVN surface features (just as NPC does)
- Using Gumbel Softmax from the official PyTorch API
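
To illustrate what utterance-wise CMVN (cepstral mean and variance normalization) means, here is a minimal sketch in plain Python. It is not the code used in this repository: the function name `utterance_cmvn`, the use of the population standard deviation, and the `eps` term are all assumptions made for illustration. The idea is only that statistics are computed over the frames of a single utterance rather than over the whole dataset.

```python
import math


def utterance_cmvn(frames, eps=1e-8):
    """Normalize one utterance to zero mean / unit variance per feature dimension.

    `frames` is a list of frames, each a list of floats of equal length.
    Hypothetical helper for illustration; not taken from this repository.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    # Per-dimension mean, computed over the frames of this utterance only.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Per-dimension population standard deviation (an assumption; a real
    # pipeline may use a different estimator).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


# Toy utterance: 3 frames, 2 feature dimensions.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = utterance_cmvn(utterance)
```

After normalization, each feature dimension of the utterance has approximately zero mean and unit variance, regardless of the scale of the original features.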
See the package requirements for the toolkits used. TensorBoard can be used to view the log files stored in --logdir.

Contact

Feel free to contact me with questions or feedback. My email can be found in the paper or on my personal page.

Citation

If you find our work and/or this repository helpful, please consider citing us:

@article{liu2020nonautoregressive,
  title   = {Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies},
  author  = {Liu, Alexander and Chung, Yu-An and Glass, James},
  journal = {arXiv preprint arXiv:2011.00406},
  year    = {2020}
}

Owner

Alexander H. Liu
