Romanian Automatic Speech Recognition from the ROBIN project

Last update: Jan 01, 2023

Overview

RobinASR

This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, together with a KenLM language model to imporve the transcriptions.

The pretrained text-to-speech model can be downloaded from here and the pretrained KenLM can be downloaded from here.

Also, make sure to visit:

A demo of the ASR system available in the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
A post-processing web service allowing hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection

Installation

Docker

Download the pretrained text-to-speech model and the pretrained KenLM at the above links, and copy them in a models directory inside this repository.
Build the docker image using the Dockerfile. Make sure that deepspeech_pytorch/configs/inference_config.py has the desired configuration.

docker build --tag RobinASR .

Run the docker image.

docker run --gpus all -p 8888:8888 --net=host --ipc=host RobinASR

From Source

You must have Python 3.6+ and PyTorch 1.5.1+ installed in your system. Also. Cuda 10.1+ is required if you want to use the (recommended) GPU version.
Clone the repository and install its dependencies:

git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .

Install Nvidia Apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want to use Beam Search and the KenLM language model, you must install CTCDecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .

Inference Server

Firstly, take a look at the configuration file in deepspeech_pytorch/configs/inference_config.py and make sure that the configuration meets your requirements. Then, run the following command:

python3 server.py

Train a New Model

You must create 3 csv manifest files (train, valid and test) that contain on each line the the path to a wav file and the path to its corresponding transcription, separated by commas:

path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...

Then you must modify correspondingly with your configuration the file located at deepspeech_pytorch/configs/train_config.py and start training with:

python train.py

Acknowledgments

We would like to thank Sean Narnen for making his DeepSpeech2 implementation publicly-available. We used a lot of his code in our implementation.

Cite

If you are using this repository, please cite the following paper as a thank you to the authors:

Avram, A.M., Păiș, V. and Tufis, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proc. Rom. Acad. Ser. A (Vol. 21, pp. 395-402).

or in BibTeX format:

@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}

Romanian Automatic Speech Recognition from the ROBIN project

Related tags

Overview

RobinASR

Installation

Docker

From Source

Inference Server

Train a New Model

Acknowledgments

Cite

Owner

RACAI

Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

Multi-Anchor Active Domain Adaptation for Semantic Segmentation (ICCV 2021 Oral)

Generative Art Using Neural Visual Grammars and Dual Encoders

Extracting knowledge graphs from language models as a diagnostic benchmark of model performance.

Proximal Backpropagation - a neural network training algorithm that takes implicit instead of explicit gradient steps

This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

NovelD: A Simple yet Effective Exploration Criterion

TensorFlow implementation of Adaptive Information Transfer Multi-task (AITM) framework. Code for the paper submitted to KDD21: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning for Customer Acquisition.

Qt-GUI implementation of the YOLOv5 algorithm (ver.6 and ver.5)

Action Recognition for Self-Driving Cars

Scrutinizing XAI with linear ground-truth data

Code for "FPS-Net: A convolutional fusion network for large-scale LiDAR point cloud segmentation".

Numenta published papers code and data

Tools for manipulating UVs in the Blender viewport.

PyTorch implementation of Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Official Pytorch implementation of the paper "Action-Conditioned 3D Human Motion Synthesis with Transformer VAE", ICCV 2021

Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

Learning Visual Words for Weakly-Supervised Semantic Segmentation

CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"