Conformer ASR

A minimal Conformer ASR implementation adapted from ESPnet.

Introduction

I want to use the pre-trained English ASR models provided by ESPnet, but the full ESPnet toolkit is heavier than I need. So here I extract just the Conformer ASR part from ESPnet, which makes it much easier to customize. Let's do it.

There are a bunch of pre-trained ASR models listed here. I choose the one named:

kamo-naoyuki/librispeech_asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave
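
This long name encodes the training recipe: a Conformer encoder, an STFT front-end with n_fft=512 and hop_length=256, a 5000-unit BPE vocabulary, 40000 warmup steps, a peak learning rate of 0.0025, speed perturbation (sp), and weights averaged over the best validation-accuracy checkpoints (valid.acc.ave).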
Its performance, reproduced below, can be found [here](https://zenodo.org/record/4604066#.YbxsX5FByV4).
WER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 54402 | 97.9 | 1.9 | 0.2 | 0.2 | 2.3 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 50948 | 94.5 | 5.1 | 0.5 | 0.6 | 6.1 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 52576 | 97.7 | 2.1 | 0.2 | 0.3 | 2.6 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 52343 | 94.7 | 4.9 | 0.5 | 0.7 | 6.0 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 54402 | 98.3 | 1.5 | 0.2 | 0.2 | 1.9 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 50948 | 95.8 | 3.7 | 0.4 | 0.5 | 4.6 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 52576 | 98.1 | 1.7 | 0.2 | 0.3 | 2.1 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 52343 | 95.8 | 3.7 | 0.5 | 0.5 | 4.7 | 42.4 |

CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 288456 | 99.4 | 0.3 | 0.2 | 0.2 | 0.8 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 265951 | 98.0 | 1.2 | 0.8 | 0.7 | 2.7 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 281530 | 99.4 | 0.3 | 0.3 | 0.3 | 0.9 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 272758 | 98.2 | 1.0 | 0.7 | 0.7 | 2.5 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 288456 | 99.5 | 0.3 | 0.2 | 0.2 | 0.7 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 265951 | 98.3 | 1.0 | 0.7 | 0.5 | 2.2 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 281530 | 99.5 | 0.3 | 0.3 | 0.2 | 0.7 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 272758 | 98.5 | 0.8 | 0.7 | 0.5 | 2.1 | 42.4 |

TER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| decode_asr_asr_model_valid.acc.ave/dev_clean | 2703 | 68010 | 97.5 | 1.9 | 0.7 | 0.4 | 2.9 | 28.6 |
| decode_asr_asr_model_valid.acc.ave/dev_other | 2864 | 63110 | 93.4 | 5.0 | 1.6 | 1.0 | 7.6 | 48.3 |
| decode_asr_asr_model_valid.acc.ave/test_clean | 2620 | 65818 | 97.2 | 2.0 | 0.8 | 0.4 | 3.3 | 31.4 |
| decode_asr_asr_model_valid.acc.ave/test_other | 2939 | 65101 | 93.7 | 4.5 | 1.8 | 0.9 | 7.2 | 49.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_clean | 2703 | 68010 | 97.8 | 1.5 | 0.7 | 0.3 | 2.5 | 25.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/dev_other | 2864 | 63110 | 94.6 | 3.8 | 1.6 | 0.7 | 6.1 | 40.0 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_clean | 2620 | 65818 | 97.6 | 1.6 | 0.8 | 0.3 | 2.7 | 26.2 |
| decode_asr_lm_lm_train_lm_transformer2_bpe5000_scheduler_confwarmup_steps25000_batch_bins500000000_accum_grad2_use_amptrue_valid.loss.ave_asr_model_valid.acc.ave/test_other | 2939 | 65101 | 94.7 | 3.5 | 1.8 | 0.7 | 6.0 | 42.4 |
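
In these score reports, Snt and Wrd are the sentence and unit counts, Corr/Sub/Del/Ins are the percentages of correct, substituted, deleted, and inserted units, Err = Sub + Del + Ins is the overall error rate, and S.Err is the fraction of sentences with at least one error. WER, CER, and TER are the same edit-distance metric applied at the word, character, and BPE-token level. A minimal sketch of the computation (the wer helper below is mine, not part of this repo):

# Illustrative only: WER = (substitutions + deletions + insertions) / reference words.
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edits turning the first i ref words into the first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(r)][len(h)] / len(r)

print(wer("this is a test", "this is the best"))  # 0.5 -> 2 edits over 4 reference words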

ASR step by step

1. Set up the code

pip install .
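
(Run this from the root of a cloned copy of the repository; it installs the conformer_asr package together with its dependencies.)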

2. Download the model and unzip it

wget "https://zenodo.org/record/4604066/files/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_bpe5000_scheduler_confwarmup_steps40000_optim_conflr0.0025_sp_valid.acc.ave.zip?download=1" -O conformer.zip
unzip conformer.zip
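
The directory names inside the archive may not match the exp_unnorm/... paths used in the example below one-to-one; if they differ, just point cfg_path, bpe_path, and ckpt_path at wherever config.yaml, bpe.model, and the .pth checkpoint end up after unzipping.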

3. Run an example

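One thing to keep in mind: the mel front-end must match the one the checkpoint was trained with. Per the model name (n_fft512_hop_length256 at 16 kHz), that is what the MelSpectrogram below mirrors with win_length=512 and hop_length=256.
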
import torch
import librosa
from mmds.utils.spectrogram import MelSpectrogram
from conformer_asr import Conformer, Tokenizer

sample_rate = 16000
cfg_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/config.yaml"
bpe_path = "./data/en_unnorm_token_list/bpe_unigram5000/bpe.model"
ckpt_path = "./exp_unnorm/asr_train_asr_conformer6_n_fft512_hop_length256_raw_en_unnorm_bpe5000/valid.acc.ave_10best.pth"

tokenizer = Tokenizer(cfg_path, bpe_path)
conformer = Conformer(tokenizer, ckpt_path=ckpt_path)
conformer.eval()

spec_fn = MelSpectrogram(
    sample_rate,
    hop_length=256,
    f_min=0,
    f_max=8000,
    win_length=512,
    power=2,
)

# load the audio, resampling to the model's 16 kHz rate
w0, _ = librosa.load("./example.m4a", sr=sample_rate)
w0 = torch.from_numpy(w0)
m0 = spec_fn(w0).t()  # transpose so time is the first dimension

n_frames = len(m0)

# create a batch with different-length audio (yes, supported)
x = [m0, m0[: n_frames // 2], m0[: n_frames // 4]]

ref = "This is a test video for youtube-dl. For more information, contact [email protected]".lower()
hyps = conformer.decode(x, beam_width=20)

print("REF", ref)
for hyp in hyps:
    print("HYP", hyp.lower())
Results:

REF this is a test video for youtube-dl. for more information, contact [email protected]
HYP this is a test video for you do bl for more information -- contact the hih aging at the hihaging, not the
HYP this is a test for you d bl for more information
HYP this is a testim for you to
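
As expected, the hypotheses for the half- and quarter-length inputs are correspondingly truncated. The residual errors are plausible too: the checkpoint was trained on read LibriSpeech audio, so a casually recorded clip with a spoken e-mail address only transcribes approximately.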

Features

Supported

  • Batched decoding

Not supported yet

  • Transformer language model
  • Other checkpoints