Audio Source Separation is the process of separating a mixture into isolated sounds from individual sources

Last update: Nov 07, 2022

Overview

Audio-Track Separator

Introduction

Audio Source Separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals). In practice, this means splitting a song into its separate vocal and instrumental parts.

In this repository, we developed an audio track separator in TensorFlow that separates the vocals and drums from an input audio track.

We trained a U-Net model with two output layers: one predicts the vocals and the other predicts the drums. The number of output layers can be increased to match the number of elements one needs to separate from the input audio track.

Technologies used:

The entire architecture is built with TensorFlow.
Matplotlib is used for visualization.
NumPy is used for mathematical operations.
Librosa is used for processing audio files.
nussl is used for the dataset.

The dataset

We will be using the MUSDB18 dataset for this tutorial.

MUSDB18 is a dataset of 150 full-length music tracks (~10 h total duration) of different genres, along with their isolated drums, bass, vocals, and other stems.

MUSDB18 contains two folders: a training set, "train", composed of 100 songs, and a test set, "test", composed of 50 songs. Supervised approaches should be trained on the training set and tested on both sets.

All signals are stereophonic and encoded at 44.1 kHz.

Exploratory Data Analysis

Building a Data Loader

The pipeline resamples the audio data. For now, the goal is to separate the vocals and drums from the original mixture, so the pipeline returns the processed original audio as X and an array of the processed vocals and drums audio as y.
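The repository's loader is not shown here, so the following is only a minimal NumPy sketch of the X / y layout described above. The function names (to_mono, decimate_by_averaging, make_example) are hypothetical, the mono downmix is an assumption, and the block-averaging step is a crude stand-in for a proper resampler such as librosa.resample. A factor of 4 takes 44.1 kHz audio to 11025 Hz, which matches the --sampling_rate value used in the commands in this README.

```python
import numpy as np


def to_mono(stereo):
    """Average the channels of a (n_samples, 2) array into (n_samples,)."""
    return stereo.mean(axis=1)


def decimate_by_averaging(signal, factor=4):
    """Crude downsampling: average each block of `factor` samples.

    This is only a stand-in for a real resampler; it drops any
    trailing samples that do not fill a complete block.
    """
    usable = (len(signal) // factor) * factor
    return signal[:usable].reshape(-1, factor).mean(axis=1)


def make_example(mixture, vocals, drums, factor=4):
    """Return (X, y): processed mixture and stacked [vocals, drums] targets."""
    x = decimate_by_averaging(to_mono(mixture), factor)
    y = np.stack(
        [decimate_by_averaging(to_mono(stem), factor) for stem in (vocals, drums)]
    )
    return x.astype(np.float32), y.astype(np.float32)


# Synthetic one-second stereo signals at 44.1 kHz (placeholders for real stems).
rng = np.random.default_rng(0)
vocals = rng.standard_normal((44100, 2))
drums = rng.standard_normal((44100, 2))
mixture = vocals + drums

x, y = make_example(mixture, vocals, drums)
print(x.shape, y.shape)  # (11025,) (2, 11025)
```

Stacking the targets along a leading axis keeps one row per stem, so adding a third stem only changes the tuple passed to np.stack.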

U-Net Architecture

model = AudioTrackSeparation()
model.build(input_shape=(None, DIM, 1))
model.build_graph().summary()
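The AudioTrackSeparation class itself is not reproduced in this README. As a rough illustration of the idea (a shared U-Net body with one output layer per stem), here is a minimal Keras sketch. Everything in it is an assumption: the function name build_multi_head_unet, the value of DIM, the use of 1-D convolutions, and the filter counts, kernel sizes, and activations are not taken from the repository.

```python
import tensorflow as tf
from tensorflow.keras import layers

DIM = 16384  # hypothetical segment length; must be divisible by 2**depth


def conv_block(x, filters):
    """Two same-padded 1-D convolutions, as in a typical U-Net stage."""
    x = layers.Conv1D(filters, 9, padding="same", activation="relu")(x)
    return layers.Conv1D(filters, 9, padding="same", activation="relu")(x)


def build_multi_head_unet(dim=DIM, stems=("vocals", "drums"), base_filters=16, depth=3):
    """Shared encoder/decoder with one 1x1-conv output layer per stem."""
    inputs = layers.Input(shape=(dim, 1))
    x = inputs
    skips = []

    # Encoder: convolve, remember the feature map, then halve the length.
    for level in range(depth):
        x = conv_block(x, base_filters * 2**level)
        skips.append(x)
        x = layers.MaxPooling1D(2)(x)

    x = conv_block(x, base_filters * 2**depth)  # bottleneck

    # Decoder: double the length and merge with the matching skip connection.
    for level in reversed(range(depth)):
        x = layers.UpSampling1D(2)(x)
        x = layers.Concatenate()([x, skips[level]])
        x = conv_block(x, base_filters * 2**level)

    # One output layer per stem; extending `stems` adds more outputs.
    outputs = [layers.Conv1D(1, 1, activation="tanh", name=stem)(x) for stem in stems]
    return tf.keras.Model(inputs, outputs, name="multi_head_unet_sketch")


model = build_multi_head_unet()
model.summary()
```

Passing a longer tuple, for example stems=("vocals", "drums", "bass"), would add a third output layer without changing the shared body.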

Implementation

Training

!python main.py --sampling_rate 11025 --train True --epoch 50 --batch 16 --model_save_path ./models/

Trains the U-Net model on the MUSDB18 dataset and saves the trained model to the directory given by --model_save_path.
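The argument parser in main.py is not shown in this README. The sketch below is one way the flags from the training command, plus the --test flag from the Testing command, could be declared with argparse. The helper names, defaults, and help strings are hypothetical. One detail worth noting: a flag passed as "--train True" needs an explicit string-to-bool conversion, because bool("False") evaluates to True in Python.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to a real boolean."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Declare the flags seen in the README commands (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Audio track separator (sketch)")
    parser.add_argument("--sampling_rate", type=int, default=11025,
                        help="target sampling rate in Hz")
    parser.add_argument("--train", type=str2bool, default=False,
                        help="train the model when True")
    parser.add_argument("--test", type=str, default=None,
                        help="path to an audio file to separate")
    parser.add_argument("--epoch", type=int, default=50,
                        help="number of training epochs")
    parser.add_argument("--batch", type=int, default=16,
                        help="batch size")
    parser.add_argument("--model_save_path", type=str, default="./models/",
                        help="directory for saving or loading the model")
    return parser


# Parse the same arguments as the training command above.
command = "--sampling_rate 11025 --train True --epoch 50 --batch 16 --model_save_path ./models/"
args = build_parser().parse_args(command.split())
print(args.train, args.epoch)  # True 50
```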

Testing

!python main.py --sampling_rate 11025 --test /content/pop.00000.wav --model_save_path ./models/

Loads the model from model_save_path, reads the audio file from the path given by --test with librosa, processes it, and uses the model to predict the output. Finally, the predictions are visualized as a waveform plot and saved to the root directory.
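The exact preprocessing in main.py is not shown, so the following NumPy sketch only illustrates one plausible shape for the "process, then predict" step: cut the waveform into fixed-length frames, run a predictor on them, and stitch each stem back together. DIM, split_into_frames, separate, and dummy_predict are hypothetical names, and dummy_predict stands in for the trained model (a Keras model with two outputs could be passed as predict_fn=model.predict, assuming it returns one array per stem).

```python
import numpy as np

DIM = 16384  # hypothetical frame length; the real value is set in the repository


def split_into_frames(audio, dim=DIM):
    """Zero-pad a 1-D signal to a multiple of `dim` and reshape to (n, dim, 1)."""
    n_frames = max(1, int(np.ceil(len(audio) / dim)))
    padded = np.zeros(n_frames * dim, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n_frames, dim, 1)


def separate(audio, predict_fn, dim=DIM):
    """Run `predict_fn` on fixed-size frames and trim each stem to the input length."""
    frames = split_into_frames(audio, dim)
    vocals, drums = predict_fn(frames)  # each expected as (n_frames, dim, 1)
    n = len(audio)
    return np.asarray(vocals).reshape(-1)[:n], np.asarray(drums).reshape(-1)[:n]


def dummy_predict(frames):
    """Placeholder for the trained model: splits the signal evenly in two."""
    return 0.5 * frames, 0.5 * frames


# Synthetic input in place of an audio file loaded with librosa.
mixture = np.random.default_rng(0).standard_normal(40000).astype(np.float32)
vocals, drums = separate(mixture, dummy_predict)
print(vocals.shape, drums.shape)  # (40000,) (40000,)
```

Trimming back to the original length removes the zero padding added to fill the last frame.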

Owner

Victor Basu
