Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
---
title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
---

Ukrainian TTS πŸ“’ πŸ€–

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Link to the online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. pip install -r requirements.txt.
  2. Download the model, config.json, and speakers.pth from the "Releases" tab.
  3. Launch as a one-time command (the model is multi-speaker, so add a --speaker_id argument with one of the released voices):

tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --speaker_id <speaker> \
    --out_path folder/to/save/output.wav

Alternatively, launch a web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Instead of the provided config.json, use the one from this repo.

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as is and complains that the speaker is not defined

     > initialization of speaker-embedding layers.
     > Text: ΠŸΠ΅Ρ€Π΅Π²Ρ–Ρ€ΠΊΠ° ΠΌΡ–ΠΊΡ€ΠΎΡ„ΠΎΠ½Π°
     > Text splitted to sentences.
    ['ΠŸΠ΅Ρ€Π΅Π²Ρ–Ρ€ΠΊΠ° ΠΌΡ–ΠΊΡ€ΠΎΡ„ΠΎΠ½Π°']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One vowel words in the end of the sentence aren't stressed

    Input:

    
    Π‘ΠΎΠ±Π΅Ρ€ Π½Π° Π±Π΅Ρ€Π΅Π·Ρ– Π· бобрСнятами Π±ΡƒΠ±Π»ΠΈΠΊΠΈ ΠΏΡ–ΠΊ.
    
    Π‘ΠΎΡ€ΠΎΠ½ΠΈΠ»Π° Π±ΠΎΡ€ΠΎΠ½Π° ΠΏΠΎ Π±ΠΎΡ€ΠΎΠ½ΠΎΠ²Π°Π½ΠΎΠΌΡƒ полю.
    
    Π†ΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΡ–ΠΏ, ΠΊΠΈΠΏΡ–Π² ΠΎΠΊΡ€Ρ–ΠΏ, ΠΏΡ€ΠΈΠΉΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΡ–ΠΏ - ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€Ρ–ΠΏ, як ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΡ–, Ρ‚Π°ΠΊ Ρ– ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΡ– Ρ– ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠ΅Π½ΡΡ‚Π°Ρ….
    
    Π‘ΠΈΠ΄ΠΈΡ‚ΡŒ ΠŸΡ€ΠΎΠΊΠΎΠΏ β€” ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ, ΠŸΡ–ΡˆΠΎΠ² ΠŸΡ€ΠΎΠΊΠΎΠΏ β€” ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ. Π―ΠΊ ΠΏΡ€ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠΎΠ²Ρ– ΠΊΠΈΠΏΡ–Π² ΠΎΠΊΡ€ΠΎΠΏ, Π’Π°ΠΊ Ρ– Π±Π΅Π· ΠŸΡ€ΠΎΠΊΠΎΠΏΠ° ΠΊΠΈΠΏΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ.
    

    Result:

    
    Π‘ΠΎΠ±+Π΅Ρ€ Π½+Π° Π±Π΅Ρ€Π΅Π·Ρ– Π· Π±ΠΎΠ±Ρ€Π΅Π½+ятами Π±+ΡƒΠ±Π»ΠΈΠΊΠΈ ΠΏΡ–ΠΊ.
    
    Π‘ΠΎΡ€ΠΎΠ½+ΠΈΠ»Π° Π±ΠΎΡ€ΠΎΠ½+Π° ΠΏ+ΠΎ Π±ΠΎΡ€ΠΎΠ½+ΠΎΠ²Π°Π½ΠΎΠΌΡƒ ΠΏ+олю.
    
    Π†Ρˆ+ΠΎΠ² ΠŸΡ€+ΠΎΠΊΡ–ΠΏ, ΠΊΠΈΠΏ+Ρ–Π² ΠΎΠΊΡ€+Ρ–ΠΏ, ΠΏΡ€ΠΈΠΉΡˆ+ΠΎΠ² ΠŸΡ€+ΠΎΠΊΡ–ΠΏ - ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€+Ρ–ΠΏ, +як ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΡ–, Ρ‚+Π°ΠΊ +Ρ– ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΡ– +Ρ– ΠΏΡ€+ΠΈ ΠŸΡ€ΠΎΠΊΠΎΠΏΠ΅Π½ΡΡ‚Π°Ρ….
    
    Π‘ΠΈΠ΄+ΠΈΡ‚ΡŒ ΠŸΡ€ΠΎΠΊ+ΠΎΠΏ β€” ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ, ΠŸΡ–Ρˆ+ΠΎΠ² ΠŸΡ€ΠΎΠΊ+ΠΎΠΏ β€” ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ. +Π―ΠΊ ΠΏΡ€+ΠΈ ΠŸΡ€+ΠΎΠΊΠΎΠΏΠΎΠ²Ρ– ΠΊΠΈΠΏ+Ρ–Π² ΠΎΠΊΡ€ΠΎΠΏ, Π’+Π°ΠΊ +Ρ– Π±+Π΅Π· ΠŸΡ€+ΠΎΠΊΠΎΠΏΠ° ΠΊΠΈΠΏ+ΠΈΡ‚ΡŒ ΠΎΠΊΡ€ΠΎΠΏ.
    opened by robinhad 0
  • Error import StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    from TTS.tts.models.vits import VitsArgs

    vitsArgs = VitsArgs(
        # hifi V3
        resblock_type_decoder = '2',
        upsample_rates_decoder = [8,8,4],
        upsample_kernel_sizes_decoder = [16,16,8],
        upsample_initial_channel_decoder = 256,
        resblock_kernel_sizes_decoder = [3,5,7],
        resblock_dilation_sizes_decoder = [[1,2], [2,6], [3,12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from an existing checkpoint (e.g. VITS LJSpeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The license for this model is GNU GPL v3. This release supports stress, marked with a + sign before the stressed vowel. The model was trained for 280 000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model, to @proger for providing alignment scripts, and to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+Π°ΠΌ'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΈΠΉ - ΠΌ+істо Π² Π₯мСльн+ΠΈΡ†ΡŒΠΊΡ–ΠΉ +області Π£ΠΊΡ€Π°+Ρ—Π½ΠΈ, Ρ†+Π΅Π½Ρ‚Ρ€ Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΡ— ΠΌΡ–ΡΡŒΠΊ+ΠΎΡ— ΠΎΠ±'+Ρ”Π΄Π½Π°Π½ΠΎΡ— Ρ‚Π΅Ρ€ΠΈΡ‚ΠΎΡ€Ρ–+Π°Π»ΡŒΠ½ΠΎΡ— Π³Ρ€ΠΎΠΌ+Π°Π΄ΠΈ +Ρ– Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΠ³ΠΎ Ρ€Π°ΠΉ+ΠΎΠ½Ρƒ.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is GNU GPL v3. This release supports stress, marked with a + sign before the stressed vowel. The model was trained for 140 000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+Π°ΠΌ'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΈΠΉ - ΠΌ+істо Π² Π₯мСльн+ΠΈΡ†ΡŒΠΊΡ–ΠΉ +області Π£ΠΊΡ€Π°+Ρ—Π½ΠΈ, Ρ†+Π΅Π½Ρ‚Ρ€ Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΡ— ΠΌΡ–ΡΡŒΠΊ+ΠΎΡ— ΠΎΠ±'+Ρ”Π΄Π½Π°Π½ΠΎΡ— Ρ‚Π΅Ρ€ΠΈΡ‚ΠΎΡ€Ρ–+Π°Π»ΡŒΠ½ΠΎΡ— Π³Ρ€ΠΎΠΌ+Π°Π΄ΠΈ +Ρ– Кам'ян+Π΅Ρ†ΡŒ-Под+Ρ–Π»ΡŒΡΡŒΠΊΠΎΠ³ΠΎ Ρ€Π°ΠΉ+ΠΎΠ½Ρƒ.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint using a voice (7 hours) from the Mykyta dataset. The license for this model is GNU GPL v3. This release supports stress, marked with a + sign before the stressed vowel. The model was trained for 150 000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)
