glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Overview

Glow-Speak

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Installation

git clone https://github.com/rhasspy/glow-speak.git
cd glow-speak/

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade setuptools wheel
pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -r requirements.txt

python3 setup.py develop
glow-speak --version

Voices

The following languages/voices are supported:

  • German
    • de_thorsten
  • Chinese
    • cmn_jing_li
  • Greek
    • el_rapunzelina
  • English
    • en-us_ljspeech
    • en-us_mary_ann
  • Spanish
    • es_tux
  • Finnish
    • fi_harri_tapani_ylilammi
  • French
    • fr_siwis
  • Hungarian
    • hu_diana_majlinger
  • Italian
    • it_riccardo_fasol
  • Korean
    • ko_kss
  • Dutch
    • nl_rdh
  • Russian
    • ru_nikolaev
  • Swedish
    • sv_talesyntese
  • Swahili
    • sw_biblia_takatifu
  • Vietnamese
    • vi_vais1000

Usage

Download Voices

glow-speak-download de_thorsten

Command-Line Synthesis

glow-speak -v en-us_mary_ann 'This is a test.' --output-file test.wav

HTTP Server

glow-speak-http-server --debug

Visit http://localhost:5002

Socket Server

Start the server:

glow-speak-socket-server --voice en-us_mary_ann --socket /tmp/glow-speak.sock

From a separate terminal:

echo 'This is a test.' | bin/glow-speak-socket-client --socket /tmp/glow-speak.sock | xargs aplay

Lines from client to server are synthesized, and the path to the WAV file is returned (usually in /tmp).

You might also like...
End-to-End Speech Processing Toolkit
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Athena is an open-source implementation of end-to-end speech processing engine.

Athena is an open-source implementation of end-to-end speech processing engine. Our vision is to empower both industrial application and academic research on end-to-end models for speech processing. To make speech processing available to everyone, we're also releasing example implementation and recipe on some opensource dataset for various tasks (Automatic Speech Recognition, Speech Synthesis, Voice Conversion, Speaker Recognition, etc).

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

🤗 Contributing to OpenSpeech 🤗 OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

 SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Comments
  • AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    Hi Micheal,

    great work again! :smiley:

    I just saw this repository and thought I'd give it a try on my freshly installed Raspberry Pi 4 with 32bit Raspberry Pi OS Bullseye (Debian 11). Installation almost finished without errors! :partying_face: ... I just had to fix one thing: sudo apt-get install libatlas-base-dev After 15min I was already generating audio :grin: :+1:

    When I tested en mary_ann and thorsten_de via the web interface I got this error as soon as my test sentence ended with a question mark:

    DEBUG:glow-speak:ɪ_z ð_ɪ_s ɐ_n_ˈʌ_ð_ɚ t_ˈɛ_s_t? .
    ERROR:glow_speak.http_server:
    Traceback (most recent call last):
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
        result = await self.dispatch_request(request_context)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
        return await self.ensure_async(handler)(**request_.view_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 484, in app_say
        wav_bytes = await text_to_wav(text, voice, **tts_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 323, in text_to_wav
        text_ids = text_to_ids(
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 110, in text_to_ids
        text_ids = phonemes2ids(
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 190, in phonemes2ids
        maybe_extend_ids(sub_phoneme, word_ids, append_list=False)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 108, in maybe_extend_ids
        maybe_ids = missing_func(phoneme)
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 59, in guess_ids
        typing.List[Phoneme], guess_phonemes(phoneme, self.to_phonemes)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/gruut_ipa/accent.py", line 159, in guess_phonemes
        assert dist_split is not None
    AssertionError
    

    Maybe some encoding error when reading the web input?

    Speed seems pretty good, comparable to Larynx I'd say :+1: and I noticed the pronunciations have been improved for German :clap: :sunglasses:

    opened by fquirin 0
Owner
Rhasspy
Offline voice assistant
Rhasspy
A Practitioner's Guide to Natural Language Processing

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, Text

Dipanjan (DJ) Sarkar 1.5k Jan 03, 2023
Finally, some decent sample sentences

tts-dataset-prompts This repository aims to be a decent set of sentences for people looking to clone their own voices (e.g. using Tacotron 2). Each se

hecko 19 Dec 13, 2022
Precision Medicine Knowledge Graph (PrimeKG)

PrimeKG Website | bioRxiv Paper | Harvard Dataverse Precision Medicine Knowledge Graph (PrimeKG) presents a holistic view of diseases. PrimeKG integra

Machine Learning for Medicine and Science @ Harvard 103 Dec 10, 2022
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources Description This is the repository for the paper Unifying Cross-

Sapienza NLP group 16 Sep 09, 2022
CredData is a set of files including credentials in open source projects

CredData is a set of files including credentials in open source projects. CredData includes suspicious lines with manual review results and more information such as credential types for each suspicio

Samsung 19 Sep 07, 2022
This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

Raihan Ahmed 1 Dec 09, 2021
StarGAN - Official PyTorch Implementation

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Dec 30, 2022
simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Quickly train T5 models in just 3 lines of code + ONNX support simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quic

Shivanand Roy 220 Dec 30, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
Simple bots or Simbots is a library designed to create simple bots using the power of python. This library utilises Intent, Entity, Relation and Context model to create bots .

Simple bots or Simbots is a library designed to create simple chat bots using the power of python. This library utilises Intent, Entity, Relation and

14 Dec 15, 2021
Contains descriptions and code of the mini-projects developed in various programming languages

TexttoSpeechAndLanguageTranslator-project introduction A pleasant application where the client will be given buttons like play,reset and exit. The cli

Adarsh Reddy 1 Dec 22, 2021
Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位) Tokenizer for Transformers based on 青空文庫 Basic Usage from transformers import RemBertToken

Koichi Yasuoka 3 Dec 22, 2021
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022
Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al

Implementation of some unbalanced loss for NLP task like focal_loss, dice_loss, DSC Loss, GHM Loss et.al Summary Here is a loss implementation reposit

121 Jan 01, 2023
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
NLP Overview

NLP-Overview Introduction The field of NPL encompasses a variety of topics which involve the computational processing and understanding of human langu

PeterPham 1 Jan 13, 2022
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

UNITER: UNiversal Image-TExt Representation Learning This is the official repository of UNITER (ECCV 2020). This repository currently supports finetun

Yen-Chun Chen 680 Dec 24, 2022
MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

David Bamman 48 Dec 06, 2022
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The

Simran Farrukh 0 Mar 28, 2022