MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

Overview

MusicYOLO

MusicYOLO framework uses the object detection model, YOLOX, to locate notes in the spectrogram. Its performance on the ISMIR2014 dataset, MIR-ST500 dataset and SSVD dataset show that MusicYOLO significantly improves onset/offset detection compared with previous approaches.

Installation

Step1. Install pytorch.

conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch

Step1. Install YOLOX.

git clone [email protected]:xk-wang/MusicYOLO.git
cd MusicYOLO
pip3 install -U pip && pip3 install -r requirements.txt
pip3 install -v -e .  # or  python3 setup.py develop

Step2. Install apex.

# skip this step if you don't want to train model.
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" .

Step3. Install pycocotools.

pip3 install cython;
cd cocoapi/PythonAPI && pip3 install -v .

Inference

Download the pretrained musicyolo1 and musicyolo2 models described in our paper. Put these two models under the models folder. The models are stored in BaiduYun https://pan.baidu.com/s/1TbE36ydi-6EZXwxo5DwfLg?pwd=1234 code: 1234

SSVD & ISMIR2014

Step1. Download SSVD-v2.0 from https://github.com/xk-wang/SSVD-v2.0

Step2. Onset/offset detection (use musicyolo2.pth)

python3 tools/predict.py -f exps/example/custom/yolox_singing.py -c models/musicyolo2.pth --audiodir $SSVD_TEST_SET_PATH --savedir $SAVE_PATH --ext .flac --device gpu

Step3. Evaluate

python3 tools/note_eval.py --label $SSVD_TEST_SET_PATH --result $SAVE_PATH --offset

Similar process for ISMIR2014 dataset.

MIR-ST500

Since MIR-ST500 dataset is a mixture of vocals and accompaniments, we need to separate vocals and accompaniments with spleeter first. Besides, since the singing duration of each audio in MIR-ST500 dataset is too long, we will first cut each audio into short audios of about 35s for on/offset detection.

Step1. Audio source seperation

python3 tools/util/do_spleeter.py $MIR_ST500_DIR

Step2. Split audio

python3 tools/util/split_mst.py --mst_path $MST_TEST_VOCAL_PATH --dest_dir $SPLIT_PATH

Step3. Onset/offset detection (use musicyolo1.pth)

python3 tools/predict.py -f exps/example/custom/yolox_singing.py -c models/musicyolo1.pth --audiodir $SPLIT_PATH --savedir $SAVE_PATH --ext .wav --device gpu

Step4. Merge results

Because we split the MIR-ST500 test set audio earlier, the results are also splited. Here we merge the split results.

python3 tools/util/merge_res.py --audio_dir $SPLIT_PATH --origin_dir $SAVE_PATH --final_dir $MERGE_PATH

Step5. Evaluate

python3 tools/note_eval.py --label $MIR_ST500_TEST_LABEL_PATH --result $MERGE_PATH --offset

Train yourself

Download yolox-s weight from https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth . Put the model weight under models folder.

Train on SSVD (get musicyolo2)

Step1. Get SSVD train set

Download SSVD-v2.0 from https://github.com/xk-wang/SSVD-v2.0. Put the images folder under the datasets folder.

Step2. Train

python3 tools/train.py -f exps/example/custom/yolox_singing.py -d 1 -b 16 --fp16 -o -c models/yolox_s.pth

Train on MIR-ST500 (get musicyolo1)

Prepair note object detection dataset

Because there are a few audios for SSVD training set, we use Labelme software to annotate note object manually. There are a lot of data in MIR-ST500 training set, so we design a set of automatic annotation tools.

Step1. Audio source seperation

python3 tools/util/do_spleeter.py $MIR_ST500_TRAIN_DIR

Step2. Split audio

python3 tools/util/split_mst.py --mst_path $MIR_ST500_TRAIN_DIR --dest_dir $TRAIN_SPLIT_PATH

Step3. Automatic annotation

python3 tools/util/automatic_annotation.py --audiodir $TRAIN_SPLIT_PATH --imgdir $MST_NOTE_PATH

Step4. Automatic annotation

Divide the training set and validation set by yourself. We break up the images and divide them according to the ratio of 7:3 to get the training set and validation set. The images and annotations are put under $YOU_MIR_ST500_IMAGES folder.

Step4. Coco dataset format

The MIR-st500 note object detection dataset is organized in a format similar to the images folder in SSVD v2.0 dataset.

python3 tools/util/labelme2coco.py --annotationpath $YOU_MIR_ST500_IMAGES/train --jsonpath $IMAGE_DIR/train/_annotations.coco.json

python3 tools/util/labelme2coco.py --annotationpath $YOU_MIR_ST500_IMAGES/valid --jsonpath $IMAGE_DIR/valid/_annotations.coco.json

then put the MIR-ST500 note object detection dataset under the datasets folder like SSVD.

Train

the similar process like training on SSVD dataset.

Citation

 @article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}

@inproceedings{musicyolo2022,
  title={A SIGHT-SINGING ONSET/OFFSET DETECTION FRAMEWORK BASED ON OBJECT DETECTION INSTEAD OF SPECTRUM FRAMES.},
  author={X. Wang, W. Xu, W. Yang and W. Cheng},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={},
  year={2022},
}
Owner
Xianke Wang
Stay hungry stay foolish!
Xianke Wang
Code for PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

PackNet: https://arxiv.org/abs/1711.05769 Pretrained models are available here: https://uofi.box.com/s/zap2p03tnst9dfisad4u0sfupc0y1fxt Datasets in Py

Arun Mallya 216 Jan 05, 2023
Replication package for the manuscript "Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?" submitted to TOSEM

tosem2021-personality-rep-package Replication package for the manuscript "Using Personality Detection Tools for Software Engineering Research: How Far

Collaborative Development Group 1 Dec 13, 2021
This repository contains the code used for the implementation of the paper "Probabilistic Regression with HuberDistributions"

Public_prob_regression_with_huber_distributions This repository contains the code used for the implementation of the paper "Probabilistic Regression w

David Mohlin 1 Dec 04, 2021
On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization This repository contains the evaluation code and alternative pseudo ground truth

Torsten Sattler 36 Dec 22, 2022
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

Counterfactual VQA (CF-VQA) This repository is the Pytorch implementation of our paper "Counterfactual VQA: A Cause-Effect Look at Language Bias" in C

Yulei Niu 94 Dec 03, 2022
MERLOT: Multimodal Neural Script Knowledge Models

merlot MERLOT: Multimodal Neural Script Knowledge Models MERLOT is a model for learning what we are calling "neural script knowledge" -- representatio

Rowan Zellers 190 Dec 22, 2022
Backdoor Attack through Frequency Domain

Backdoor Attack through Frequency Domain DEPENDENCIES python==3.8.3 numpy==1.19.4 tensorflow==2.4.0 opencv==4.5.1 idx2numpy==1.2.3 pytorch==1.7.0 Data

5 Jun 18, 2022
Supervised forecasting of sequential data in Python.

Supervised forecasting of sequential data in Python. Intro Supervised forecasting is the machine learning task of making predictions for sequential da

The Alan Turing Institute 54 Nov 15, 2022
The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue.

The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue. How do I cite D-REX? For now, cite

Alon Albalak 6 Mar 31, 2022
Medical-Image-Triage-and-Classification-System-Based-on-COVID-19-CT-and-X-ray-Scan-Dataset

Medical-Image-Triage-and-Classification-System-Based-on-COVID-19-CT-and-X-ray-Sc

2 Dec 26, 2021
1st-in-MICCAI2020-CPM - Combined Radiology and Pathology Classification

Combined Radiology and Pathology Classification MICCAI 2020 Combined Radiology a

22 Dec 08, 2022
Curvlearn, a Tensorflow based non-Euclidean deep learning framework.

English | 简体中文 Why Non-Euclidean Geometry Considering these simple graph structures shown below. Nodes with same color has 2-hop distance whereas 1-ho

Alibaba 123 Dec 12, 2022
Robust Lane Detection via Expanded Self Attention (WACV 2022)

Robust Lane Detection via Expanded Self Attention (WACV 2022) Minhyeok Lee, Junhyeop Lee, Dogyoon Lee, Woojin Kim, Sangwon Hwang, Sangyoun Lee Overvie

Min Hyeok Lee 18 Nov 12, 2022
Transformer Huffman coding - Complete Huffman coding through transformer

Transformer_Huffman_coding Complete Huffman coding through transformer 2022/2/19

3 May 19, 2022
Autotype on websites that have copy-paste disabled like Moodle, HackerEarth contest etc.

Autotype A quick and small python script that helps you autotype on websites that have copy paste disabled like Moodle, HackerEarth contests etc as it

Tushar 32 Nov 03, 2022
The official pytorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

TimeSformer This is an official pytorch implementation of Is Space-Time Attention All You Need for Video Understanding?. In this repository, we provid

Facebook Research 1k Dec 31, 2022
This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.

Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition This is the research repository for Vid2

Future Interfaces Group (CMU) 26 Dec 24, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease

Heart_Disease_Classification Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease Dataset

Ashish 1 Jan 30, 2022
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022