PyTorch wrappers for using your model in Audacity!

Overview

audacitorch

This package contains utilities for prepping PyTorch audio models for use in Audacity. More specifically, it provides abstract classes for you to wrap your waveform-to-waveform and waveform-to-labels models (see the Deep Learning for Audacity website to learn more about deep learning models for Audacity).


Download Audacity with Deep Learning

Our work has not yet been merged to the main build of Audacity, though it will be soon. You can keep track of its progress by viewing our pull request. In the meantime, you can download an alpha version of Audacity + Deep Learning here.

Installing

You can install audacitorch using pip:

pip install -e "git+https://github.com/hugofloresgarcia/audacitorch.git#egg=audacitorch"

Contributing Models to Audacity

Supported Torch versions

audacitorch requires your model to be able to run in Torch 1.9.0, as that's the version used by Audacity's torchscript interpreter.
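For example, you can pin a compatible environment up front (torchaudio 0.9.0 is the release paired with Torch 1.9.0):

pip install torch==1.9.0 torchaudio==0.9.0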

Deep Learning Effect and Analyzer

Audacity is equipped with a wrapper framework for deep learning models written in PyTorch. Audacity contains two deep learning tools: Deep Learning Effect and Deep Learning Analyzer.
Deep Learning Effect performs waveform-to-waveform processing, and is useful for audio-in-audio-out tasks (such as source separation, voice conversion, style transfer, amplifier emulation, etc.), while Deep Learning Analyzer performs waveform-to-labels processing, and is useful for annotation tasks (such as sound event detection, musical instrument recognition, automatic speech recognition, etc.).

audacitorch contains two abstract classes for serializing the two types of models: waveform-to-waveform and waveform-to-labels. The classes are WaveformToWaveformBase and WaveformToLabelsBase, respectively.

Choosing an Effect Type

Waveform to Waveform models

As shown in the effect diagram, waveform-to-waveform models receive a single multichannel audio track as input, and may write to a variable number of new audio tracks as output.

Example models for waveform-to-waveform effects include source separation, neural upsampling, guitar amplifier emulation, generative models, etc. Output tensors for waveform-to-waveform models must be multichannel waveform tensors with shape (num_output_channels, num_samples). For every audio waveform in the output tensor, a new audio track is created in the Audacity project.
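To make the shape requirement concrete, here's a toy sketch (not part of audacitorch) of a forward pass that splits a track into two pretend "sources" and stacks them into an output of shape (num_output_channels, num_samples), so Audacity would create two new tracks:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySeparator(nn.Module):
    """Hypothetical 'separator': returns a smoothed signal and its residual as two sources."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (num_channels, num_samples); downmix to mono
        mono = x.mean(dim=0, keepdim=True)

        # placeholder "separation": a moving average and the leftover residual
        smooth = F.avg_pool1d(mono.unsqueeze(0), kernel_size=9, stride=1, padding=4).squeeze(0)
        residual = mono - smooth

        # stack the sources: shape (num_output_channels, num_samples) -> one new track per row
        return torch.cat([smooth, residual], dim=0)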

Waveform to Labels models

As shown in the effect diagram, waveform-to-labels models receive a single multichannel audio track as input, and write their output to a single label track. The waveform-to-labels effect can be used for many audio analysis applications, such as voice activity detection, sound event detection, musical instrument recognition, automatic speech recognition, etc. The output for waveform-to-labels models must be a tuple of two tensors. The first tensor corresponds to the class indexes for each label present in the waveform, shape (num_timesteps,). The second tensor must contain timestamps with start and stop times for each label, shape (num_timesteps, 2).
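As a shape reference only, a valid output tuple could be built like this (the label indexes and times below are placeholders):

import torch

labels = torch.tensor([0, 1, 0])          # shape (num_timesteps,): class index for each label
timestamps = torch.tensor([[0.0, 1.0],
                           [1.0, 2.0],
                           [2.0, 3.0]])   # shape (num_timesteps, 2): start and stop time for each label
output = (labels, timestamps)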

What If My Model Uses a Spectrogram as Input/Output?

If your model uses a spectrogram as input/output, you'll need to wrap your forward pass with some torchscript-compatible preprocessing/postprocessing. We recommend using torchaudio, writing your own preprocessing transforms as their own nn.Module, or writing PyTorch-only preprocessing and placing it in WaveformToWaveformBase.do_forward_pass or WaveformToLabelsBase.do_forward_pass. See the compatibility section for more info.
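For example, a waveform-to-waveform wrapper around a hypothetical spectrogram-domain model could keep the STFT and inverse STFT inside do_forward_pass, using only plain torch ops. This is only a sketch; you should still verify that it scripts cleanly under Torch 1.9:

import torch
from audacitorch import WaveformToWaveformBase

class SpectrogramModelWrapper(WaveformToWaveformBase):
    """Hypothetical wrapper for a model whose forward pass operates on a complex STFT."""

    def do_forward_pass(self, x: torch.Tensor) -> torch.Tensor:
        # preprocessing: waveform (n_channels, n_samples) -> complex STFT
        window = torch.hann_window(1024, device=x.device)
        spec = torch.stft(x, n_fft=1024, hop_length=256, window=window, return_complex=True)

        # spectrogram-domain model (assumed to return a tensor of the same shape)
        spec = self.model(spec)

        # postprocessing: back to a waveform with shape (n_channels, n_samples)
        return torch.istft(spec, n_fft=1024, hop_length=256, window=window, length=x.shape[-1])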

Model Metadata

Certain details about the model, such as its sample rate, tool type (e.g. waveform-to-waveform or waveform-to-labels), list of labels, etc. must be provided by the model contributor in a separate metadata.json file. In order to help users choose the correct model for their required task, model contributors are asked to provide a short and long description of the model, the target domain of the model (e.g. speech, music, environmental, etc.), as well as a list of tags or keywords as part of the metadata. See here for an example metadata dictionary.

Metadata Spec

required fields:

  • sample_rate (int)
    • range (0, 396000)
    • Model sample rate. Input tracks will be resampled to this value.
  • domains (List[str])
    • List of data domains for the model. The list should contain any of the following strings (any others will be ignored): ["music", "speech", "environmental", "other"]
  • short_description (str)
    • max 60 chars
    • Short description of the model. Should contain a brief message with the model's purpose, e.g. "Use me for separating vocals from the background!".
  • long_description (str)
    • max 280 chars
    • Long description of the model. Shown in the detailed view of the model UI.
  • tags (List[str])
    • list of tags (to be shown in the detailed view)
    • each tag should be 15 characters max
    • max 5 tags per model
  • labels (List[str])
    • output labels for the model. Depending on the effect type, this field means different things:
    • waveform-to-waveform
      • name of each output source (e.g. drums, bass, vocal). To create the track name for each output source, each label will be appended to the mixture track's name.
    • waveform-to-labels
      • This should be the model's classlist. The class indexes output by the model during a forward pass will be used to index into this classlist.
  • effect_type (str)
    • Target effect for this model. Must be one of ["waveform-to-waveform", "waveform-to-labels"].
  • multichannel (bool)
    • If multichannel is set to true, stereo tracks are passed to the model as multichannel audio tensors, with shape (2, n). Note that this means that the input could either be a mono track with shape (1, n) or stereo track with shape (2, n).
    • If multichannel is set to false, stereo tracks are downmixed, meaning that the input audio tensor will always be shape (1, n).
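For reference, here's what a metadata dictionary for a hypothetical waveform-to-labels model (a three-class instrument tagger) could look like. Every value below is illustrative, and the field names follow the spec above:

metadata = {
    'sample_rate': 16000,
    'domains': ['music'],
    'short_description': 'Tags guitar, bass, and drums in a recording.',
    'long_description': 'Detects guitar, bass, and drum activity and writes one label per detected event to a label track.',
    'tags': ['instrument', 'tagging'],
    'labels': ['guitar', 'bass', 'drums'],
    'effect_type': 'waveform-to-labels',
    'multichannel': False,
}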

Making Your Model Built-In To Audacity

By default, users have to click the Add From HuggingFace button in the Audacity Model Manager and enter the desired repo's ID to install a community-contributed model. If you would instead like your model to show up in Audacity's Model Manager by default, please open a request here.

Example - Waveform-to-Waveform model

Here's a minimal example for a model that simply boosts volume by multiplying the incoming audio by a factor of 2.

We can sum up the whole process into 4 steps:

  1. Developing your model
  2. Wrapping your model using audacitorch
  3. Creating a metadata document
  4. Exporting to HuggingFace

Developing your model

First, we create our model. There are no constraints on the internal model architecture, as long as you can use torch.jit.script or torch.jit.trace to serialize it, and it meets the input/output constraints specified for waveform-to-waveform and waveform-to-labels models.

import torch
import torch.nn as nn

class MyVolumeModel(nn.Module):

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # do the neural net magic!
        x = x * 2

        return x

Making sure your model is compatible with torchscript

PyTorch makes it easy to deploy your Python models in C++ by using torchscript, an intermediate representation of your model that can be loaded and run from C++. Many of Python's built-in functions are supported by torchscript. However, not all Python operations are supported, meaning that you can only use a subset of Python in your model code. See the torch.jit docs to learn more about writing torchscript-compatible code.

If your model computes spectrograms (or requires any kind of preprocessing/postprocessing), make sure those operations are compatible with torchscript, like torchaudio's operation set.
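A quick way to catch incompatibilities early is to script your module and compare its output against eager execution on a dummy input (a minimal check, not part of audacitorch):

import torch

model = MyVolumeModel()
scripted = torch.jit.script(model)   # raises an error if the code isn't torchscript-compatible

x = torch.randn(2, 48000)            # dummy stereo input with shape (n_channels, n_samples)
assert torch.allclose(model(x), scripted(x))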


Wrapping your model using audacitorch

Now, we create a wrapper class for our model. Because our model returns an audio waveform as output, we'll use WaveformToWaveformBase as our parent class. For both WaveformToWaveformBase and WaveformToLabelsBase, we need to implement the do_forward_pass method with our processing code. See the docstrings for more details.

from audacitorch import WaveformToWaveformBase

class MyVolumeModelWrapper(WaveformToWaveformBase):
    
    def do_forward_pass(self, x: torch.Tensor) -> torch.Tensor:
        
        # do any preprocessing here! 
        # expect x to be a waveform tensor with shape (n_channels, n_samples)

        output = self.model(x)

        # do any postprocessing here!
        # the return value should be a multichannel waveform tensor with shape (n_channels, n_samples)
    
        return output

Creating a metadata document

Audacity models need a metadata file. See the metadata spec to learn about the required fields.

metadata = {
    'sample_rate': 48000, 
    'domain_tags': ['music', 'speech', 'environmental'],
    'short_description': 'Use me to boost volume by 6dB :).',
    'long_description':  'This description can be a maximum of 280 characters.',
    'tags': ['volume boost'],
    'labels': ['boosted'],
    'effect_type': 'waveform-to-waveform',
    'multichannel': False,
}

All set! We can now serialize the model to torchscript and save it, along with its metadata.

from pathlib import Path
from audacitorch.utils import save_model, validate_metadata, \
                              get_example_inputs, test_run

# create a root dir for our model
root = Path('booster-net')
root.mkdir(exist_ok=True, parents=True)

# get our model
model = MyVolumeModel()

# wrap it
wrapper = MyVolumeModelWrapper(model)

# serialize it using torch.jit.script, torch.jit.trace,
# or a combination of both. 

# option 1: torch.jit.script 
# using torch.jit.script is preferred for most cases, 
# but may require changing a lot of source code
serialized_model = torch.jit.script(wrapper)

# option 2: torch.jit.trace
# using torch.jit.trace is typically easier, but you
# need to be extra careful that your serialized model behaves 
# properly after tracing
example_inputs = get_example_inputs()
serialized_model = torch.jit.trace(wrapper, example_inputs[0], 
                                    check_inputs=example_inputs)

# take your model for a test run!
test_run(serialized_model)

# check that we created our metadata correctly
success, msg = validate_metadata(metadata)
assert success

# save!
save_model(serialized_model, metadata, root)

Exporting to HuggingFace

You should now have a directory structure that looks like this:

/booster-net/
/booster-net/model.pt
/booster-net/metadata.json

This will be the repository for your Audacity model. Make sure to add a README with the audacity tag in the YAML metadata, so it shows up on the Explore tab of Audacity's Deep Learning Tools.

Create a README.md inside booster-net/, and add the following header:

in README.md

---
tags: audacity
---

Awesome! It's time to push to HuggingFace. See their documentation for adding a model to the HuggingFace model hub.
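As a rough sketch of that workflow, you could use the huggingface_hub Python client (not part of audacitorch; check the Hub documentation for the current API, and replace your-username with your own account):

from huggingface_hub import HfApi

api = HfApi()  # assumes you've already logged in, e.g. via `huggingface-cli login`
api.create_repo(repo_id="your-username/booster-net", exist_ok=True)
api.upload_folder(folder_path="booster-net", repo_id="your-username/booster-net")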

Debugging Your Model in Audacity

After serializing, you may need to debug your model inside Audacity, to make sure that it handles inputs correctly, doesn't crash while processing, and produces the correct output. While debugging, make sure your model isn't available to other users through the Explore HuggingFace button by temporarily removing the audacity tag from your README file. If your model fails internally while processing audio, Audacity will show an error message.

To debug, you can access the error logs through the Help menu, in Help->Diagnostics->Show Log.... Any torchscript errors that may occur during the forward pass will be redirected here.

Example - Exporting a Pretrained Asteroid model

See this example notebook, where we serialize a pretrained ConvTasNet model for speech separation using the Asteroid source separation library.

Example - Exporting a Pretrained S2T model

See this example notebook, where we serialize a pretrained speech to text transformer from Facebook.

