A PyTorch Implementation of End-to-End Models for Speech-to-Text

Last update: Dec 25, 2022

Overview

speech

Speech is an open-source package to build end-to-end models for automatic speech recognition. Sequence-to-sequence models with attention, Connectionist Temporal Classification and the RNN Sequence Transducer are currently supported.
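As a rough illustration of what Connectionist Temporal Classification (CTC) does at decoding time, the sketch below applies the standard CTC collapsing rule: merge consecutive repeated labels, then drop the blank label. This is not code from this package. The function name `ctc_collapse` and the choice of `0` as the blank index are assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output label sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in merged if label != blank]


# Eight frames of (made-up) best-path labels.
path = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_collapse(path))  # [3, 3, 5]
```

The blank between the two runs of `3` is what lets the output contain a repeated label. Without it, the two runs would merge into one.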

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not provide backward compatibility for Python 2.7.
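If you want to check your interpreter against these constraints before installing, a minimal sketch might look like the following. The helper `check_python` and its status strings are hypothetical and not part of this package.

```python
import sys

TESTED_VERSION = (3, 6)  # per this README: only Python 3.6 has been tested


def check_python(version_info=None):
    """Classify an interpreter version as 'tested', 'untested' or 'unsupported'."""
    # Fall back to the running interpreter when no version is passed in.
    major_minor = tuple((version_info or sys.version_info)[:2])
    if major_minor[0] < 3:
        return "unsupported"  # Python 2.x is not supported
    if major_minor == TESTED_VERSION:
        return "tested"
    return "untested"  # may work, but has not been verified


print(check_python())
```

An "untested" result does not mean the package will fail, only that the combination has not been verified.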

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the installation instructions for a version of PyTorch which works for your machine.

Once the Python requirements are installed, run the following from the top-level directory:

make

The build process requires CMake as well as Make.

After that, source setup.sh from the repo root:

source setup.sh

Consider adding this to your bashrc.

You can verify the install was successful by running the tests from the tests directory.

cd tests
pytest

Run

To train a model, run:

python train.py <path_to_config>

After the model has finished training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>

To see the available options for each script use -h:

python train.py -h
python eval.py -h

Examples

For examples of model configurations and datasets, visit the examples directory. Each example dataset should have instructions and/or scripts for downloading and preparing the data. There should also be one or more model configurations available. The results for each configuration will be documented in each example's corresponding README.md.

Owner

Awni Hannun
