MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Last update: Jan 03, 2023

Overview

Demos | Blog Post | Colab Notebook | Paper

MIDI-DDSP is a hierarchical audio generation model for synthesizing MIDI, extending DDSP.

Install MIDI-DDSP

You can install MIDI-DDSP via pip, which also gives you the command-line MIDI synthesis tools for synthesizing your own MIDI files.

To install MIDI-DDSP via pip, run:

pip install midi-ddsp
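
As a quick sanity check after installing, the sketch below looks for the two console commands this README uses later (midi_ddsp_download_model_weights and midi_ddsp_synthesize) on your PATH. The helper name missing_commands is made up for this illustration and is not part of MIDI-DDSP.

```python
import shutil

# Console commands named elsewhere in this README.
EXPECTED_COMMANDS = ("midi_ddsp_download_model_weights", "midi_ddsp_synthesize")


def missing_commands(names=EXPECTED_COMMANDS):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All MIDI-DDSP commands are available.")
```

If a command is reported as missing, the install most likely did not finish, or it ran in a different Python environment than the active one.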

Train MIDI-DDSP

To train MIDI-DDSP, please first install midi-ddsp and clone the MIDI-DDSP repository:

git clone https://github.com/magenta/midi-ddsp.git

For the dataset, please download the tfrecord files for the URMP dataset into the data folder of your cloned repository using the following commands:

cd midi-ddsp # enter the project directory
mkdir ./data # create a data folder
gsutil cp gs://magentadata/datasets/urmp/urmp_20210324/* ./data # download tfrecords to directory

Please see the gsutil documentation for how to install and use gsutil.

Finally, you can run the script train_midi_ddsp.sh to train the exact same model we used in the paper:

sh ./train_midi_ddsp.sh

The current codebase does not support training on arbitrary datasets, but we hope to add that in the near future.

Side note:

If you download the dataset to a different location, please change the data_dir parameter in train_midi_ddsp.sh.

Training MIDI-DDSP takes approximately 18 hours on a single RTX 8000. The training code does not currently support multi-GPU training. We recommend a GPU with more than 24 GB of memory when training the Synthesis Generator with a batch size of 16. For a GPU with less memory, please use a smaller batch size by changing the batch size in train_midi_ddsp.sh.
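
Before starting an 18-hour training run, it may be worth confirming that the download actually landed in the folder that data_dir points to. The following is a minimal sketch that assumes the downloaded file names contain the string "tfrecord". The README does not state the naming, and find_tfrecords is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_tfrecords(data_dir="./data"):
    """Return sorted paths of files in data_dir whose name contains 'tfrecord'.

    Assumption: the URMP files are named with 'tfrecord' somewhere in the
    file name. Adjust the filter if your download looks different.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and "tfrecord" in p.name)


if __name__ == "__main__":
    records = find_tfrecords("./data")
    print(f"Found {len(records)} tfrecord file(s) in ./data")
```

An empty result usually means the gsutil copy failed or the files were placed somewhere other than the folder being checked.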

Try to play with MIDI-DDSP yourself!

Please try out MIDI-DDSP in Colab notebooks!

In this notebook, you will use MIDI-DDSP to synthesize a monophonic MIDI file, adjust note expressions, create pitch bends by adjusting synthesis parameters, and synthesize a quartet from Bach chorales.

We have trained MIDI-DDSP on the URMP dataset, so it supports synthesizing 13 instruments: violin, viola, cello, double bass, flute, oboe, clarinet, saxophone, bassoon, trumpet, horn, trombone, tuba. You can find how to download and use our pre-trained model below:

Command-line MIDI synthesis

You can use MIDI-DDSP as a command-line MIDI synthesizer, just like FluidSynth.

To synthesize a MIDI file from the command line, please first download the model weights by running:

midi_ddsp_download_model_weights

To synthesize a midi file simply run the following command:

midi_ddsp_synthesize --midi_path <path-to-midi>

To get started, you can synthesize the example MIDI file in this repository:

midi_ddsp_synthesize --midi_path ./midi_example/ode_to_joy.mid

The command line can also synthesize a folder of MIDI files. For more advanced use (synthesizing a folder, using FluidSynth for unsupported instruments, etc.), please see synthesize_midi.py --help.
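
If you prefer to drive the synthesizer from a script, one option is to call the single-file command shown above once per file. This is only a sketch: it uses nothing beyond the documented --midi_path flag, the helper names are invented for this example, and the built-in folder mode described in --help is likely the more convenient route.

```python
import subprocess
from pathlib import Path


def build_synthesis_commands(folder):
    """Build one `midi_ddsp_synthesize` command per .mid/.midi file in folder."""
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".mid", ".midi"}
    )
    return [["midi_ddsp_synthesize", "--midi_path", str(p)] for p in files]


def run_all(folder):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_synthesis_commands(folder):
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before committing to a long batch.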

If you have trouble downloading the model weights, please download them manually and specify synthesis_generator_weight_path and expression_generator_weight_path yourself when using the command line. You can also point these options at other model weights if you want to use your own trained model.

Python Usage

After installing midi-ddsp, you can import midi_ddsp in Python and synthesize MIDI in your code.

Minimal Example

Here is a simple example that uses MIDI-DDSP to synthesize a MIDI file:

from midi_ddsp import synthesize_midi, load_pretrained_model

midi_file = 'ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize MIDI
output = synthesize_midi(synthesis_generator, expression_generator, midi_file)
# The synthesized audio
synthesized_audio = output['mix_audio']
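
The example above leaves the audio in memory. One way to save it is sketched below using only the standard library. It assumes the audio is a mono sequence of floats in [-1, 1] and that the sample rate is 16 kHz. Neither is stated in this README, so check both against your setup. write_wav is a helper written for this illustration, not a MIDI-DDSP function.

```python
import struct
import wave


def write_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Assumption: 16 kHz is used as the default sample rate; pass the real
    rate of your audio if it differs.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range before converting to 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

If synthesized_audio carries extra dimensions (for example a batch axis), flatten it to one dimension first, e.g. with np.asarray(synthesized_audio).reshape(-1), before passing it to write_wav.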

Advanced Usage

Here is an advanced example that synthesizes ode_to_joy.mid, changes the note expression controls, and adjusts the synthesis parameters:

import numpy as np
import tensorflow as tf
from midi_ddsp.utils.midi_synthesis_utils import synthesize_mono_midi, conditioning_df_to_audio
from midi_ddsp.utils.inference_utils import get_process_group
from midi_ddsp.midi_ddsp_synthesize import load_pretrained_model
from midi_ddsp.data_handling.instrument_name_utils import INST_NAME_TO_ID_DICT

# -----MIDI Synthesis-----
midi_file = 'ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize with violin:
instrument_name = 'violin'
instrument_id = INST_NAME_TO_ID_DICT[instrument_name]
# Run model prediction
midi_audio, midi_control_params, midi_synth_params, conditioning_df = synthesize_mono_midi(synthesis_generator,
                                                                                           expression_generator,
                                                                                           midi_file, instrument_id,
                                                                                           output_dir=None)

synthesized_audio = midi_audio  # The synthesized audio

# -----Adjust note expression controls and re-synthesize-----

# Give every note a weak vibrato:
conditioning_df_changed = conditioning_df.copy()
note_vibrato = conditioning_df_changed['vibrato_extend'].values  # Original per-note values, kept for reference
conditioning_df_changed['vibrato_extend'] = np.ones_like(conditioning_df['vibrato_extend'].values) * 0.1
# Re-synthesize
midi_audio_changed, midi_control_params_changed, midi_synth_params_changed = conditioning_df_to_audio(
  synthesis_generator, conditioning_df_changed, tf.constant([instrument_id]))

synthesized_audio_changed = midi_audio_changed  # The synthesized audio

# There are 6 note expression controls in conditioning_df that you could change:
# 'amplitude_mean', 'amplitude_std', 'vibrato_extend', 'brightness', 'attack_level', 'amplitudes_max_pos'.
# Please refer to https://colab.research.google.com/github/magenta/midi-ddsp/blob/main/midi_ddsp/colab/MIDI_DDSP_Demo.ipynb#scrollTo=XfPPrdPu5sSy for the effect of each control. 

# -----Adjust synthesis parameters and re-synthesize-----

# The original synthesis parameters:
f0_ori = midi_synth_params['f0_hz']
amps_ori = midi_synth_params['amplitudes']
noise_ori = midi_synth_params['noise_magnitudes']
hd_ori = midi_synth_params['harmonic_distribution']

# TODO: make your change of the synthesis parameters here:
f0_changed = f0_ori
amps_changed = amps_ori
noise_changed = noise_ori
hd_changed = hd_ori

# Resynthesize the audio using DDSP
processor_group = get_process_group(midi_synth_params['amplitudes'].shape[1], use_angular_cumsum=True)
midi_audio_changed = processor_group({'amplitudes': amps_changed,
                                      'harmonic_distribution': hd_changed,
                                      'noise_magnitudes': noise_changed,
                                      'f0_hz': f0_changed, },
                                     verbose=False)
midi_audio_changed = synthesis_generator.reverb_module(midi_audio_changed, reverb_number=instrument_id, training=False)

synthesized_audio_changed = midi_audio_changed  # The synthesized audio
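
The TODO in the example above is where a pitch bend would go. As a rough illustration of the arithmetic, the sketch below bends a list of f0 values in Hz along a linear ramp measured in cents, using the standard relation that a shift of c cents multiplies the frequency by 2^(c/1200). It works on plain Python lists and is not taken from the MIDI-DDSP code. bend_f0_ramp is a hypothetical helper.

```python
def bend_f0_ramp(f0_hz, start_cents, end_cents):
    """Bend each frequency by a cents offset that ramps linearly over the frames.

    100 cents is one semitone and 1200 cents is one octave.
    """
    count = len(f0_hz)
    bent = []
    for index, frequency in enumerate(f0_hz):
        # Position in [0, 1] along the sequence; a single frame uses the start offset.
        position = index / (count - 1) if count > 1 else 0.0
        cents = start_cents + (end_cents - start_cents) * position
        bent.append(frequency * 2.0 ** (cents / 1200.0))
    return bent
```

For a constant shift applied to the whole array, the same ratio can be applied directly, e.g. f0_changed = f0_ori * 2.0 ** (cents / 1200.0), since multiplying by a scalar broadcasts over every frame.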

Comments

ImportError and AttributeError
Hi! Very interesting work! I’m trying to run MIDI_DDSP_Demo.ipynb, and I encountered some errors.

ImportError: cannot import name 'LD_RANGE' occurred in from ddsp.spectral_ops import F0_RANGE, LD_RANGE(see here). -> According to DDSP, I think 'DB_RANGE' is correct, not 'LD_RANGE' .

AttributeError: module 'ddsp.spectral_ops' has no attribute 'amplitude_to_db' occurred where ddsp.spectral_ops.amplitude_to_db is used (see here). -> According to DDSP, I suppose it is 'ddsp.core', not 'ddsp.spectral_ops'.
opened by MasayaKawamura 3
Question on the installation

Hi,

When I opened up a new colab notebook and tried to pip install the midi-ddsp package, it took over 40 minutes and the installation can not be completed. I didn't experience this until this week.

I've tried pip install midi-ddsp and pip install git+https://github.com/magenta/midi-ddsp and both gave me the same results.

It seemed like pip would spend a lot of time trying to find which version of etils was compatible.

opened by tiianhk 2
Error when using "pip install midi-ddsp"
Hello everyone,

I'm trying to use midi-ddsp to synthesize a few .midi files. In order to achieve it, I create a virtual environment with:

python3 -m venv .venv
source .venv/bin/activate

note: python version == 3.10.6

After creating it I run: "pip install midi-ddsp" . Getting this error message:

Collecting ddsp Using cached ddsp-1.9.0-py2.py3-none-any.whl (200 kB) Using cached ddsp-1.7.1-py2.py3-none-any.whl (199 kB) Using cached ddsp-1.7.0-py2.py3-none-any.whl (197 kB) Using cached ddsp-1.6.5-py2.py3-none-any.whl (194 kB) Using cached ddsp-1.6.3-py2.py3-none-any.whl (194 kB) Using cached ddsp-1.6.2-py2.py3-none-any.whl (194 kB) Using cached ddsp-1.6.0-py2.py3-none-any.whl (194 kB) Using cached ddsp-1.4.0-py2.py3-none-any.whl (192 kB) Using cached ddsp-1.3.1-py2.py3-none-any.whl (192 kB) Using cached ddsp-1.3.0-py2.py3-none-any.whl (183 kB) Using cached ddsp-1.2.0-py2.py3-none-any.whl (179 kB) Using cached ddsp-1.1.0-py2.py3-none-any.whl (175 kB) Using cached ddsp-1.0.1-py2.py3-none-any.whl (170 kB) Using cached ddsp-1.0.0-py2.py3-none-any.whl (168 kB) Using cached ddsp-0.14.0-py2.py3-none-any.whl (143 kB) Using cached ddsp-0.13.1-py2.py3-none-any.whl (129 kB) Using cached ddsp-0.13.0-py2.py3-none-any.whl (129 kB) Using cached ddsp-0.12.0-py2.py3-none-any.whl (127 kB) Using cached ddsp-0.10.0-py2.py3-none-any.whl (109 kB) Using cached ddsp-0.9.0-py2.py3-none-any.whl (109 kB) Using cached ddsp-0.8.0-py2.py3-none-any.whl (108 kB) Using cached ddsp-0.7.0-py2.py3-none-any.whl (107 kB) Using cached ddsp-0.5.1-py2.py3-none-any.whl (101 kB) Using cached ddsp-0.5.0-py2.py3-none-any.whl (101 kB) Using cached ddsp-0.4.0-py2.py3-none-any.whl (97 kB) Using cached ddsp-0.2.4-py2.py3-none-any.whl (89 kB) Using cached ddsp-0.2.3-py2.py3-none-any.whl (89 kB) Using cached ddsp-0.2.2-py2.py3-none-any.whl (89 kB) Using cached ddsp-0.2.0-py2.py3-none-any.whl (88 kB) Using cached ddsp-0.1.0-py3-none-any.whl (88 kB) Using cached ddsp-0.0.10-py3-none-any.whl (88 kB) Using cached ddsp-0.0.9-py3-none-any.whl (86 kB) Using cached ddsp-0.0.8-py3-none-any.whl (86 kB) Using cached ddsp-0.0.7-py3-none-any.whl (85 kB) Using cached ddsp-0.0.6-py2.py3-none-any.whl (91 kB) Using cached ddsp-0.0.5-py2.py3-none-any.whl (91 kB) Using cached ddsp-0.0.4-py2.py3-none-any.whl (83 kB) Using cached 
ddsp-0.0.3-py2.py3-none-any.whl (81 kB) Using cached ddsp-0.0.1-py2.py3-none-any.whl (75 kB) Using cached ddsp-0.0.0-py2.py3-none-any.whl (75 kB) INFO: pip is looking at multiple versions of midi-ddsp to determine which version is compatible with other requirements. This could take a while. Collecting midi-ddsp Using cached midi_ddsp-0.1.3-py3-none-any.whl (56 kB) Using cached midi_ddsp-0.1.1-py3-none-any.whl (56 kB) Using cached midi_ddsp-0.1.0-py3-none-any.whl (53 kB) ERROR: Cannot install midi-ddsp because these package versions have conflicting dependencies.

The conflict is caused by: ddsp 3.4.4 depends on tensorflow ddsp 3.4.3 depends on tensorflow ddsp 3.4.1 depends on tensorflow ddsp 3.4.0 depends on tensorflow ddsp 3.3.6 depends on tensorflow ddsp 3.3.4 depends on tensorflow ddsp 3.3.2 depends on tensorflow ddsp 3.3.0 depends on tensorflow ddsp 3.2.1 depends on tensorflow ddsp 3.2.0 depends on tensorflow ddsp 3.1.0 depends on tensorflow ddsp 1.9.0 depends on tensorflow ddsp 1.7.1 depends on tensorflow ddsp 1.7.0 depends on tensorflow ddsp 1.6.5 depends on tensorflow ddsp 1.6.3 depends on tensorflow ddsp 1.6.2 depends on tensorflow ddsp 1.6.0 depends on tensorflow ddsp 1.4.0 depends on tensorflow ddsp 1.3.1 depends on tensorflow ddsp 1.3.0 depends on tensorflow ddsp 1.2.0 depends on tensorflow ddsp 1.1.0 depends on tensorflow ddsp 1.0.1 depends on tensorflow ddsp 1.0.0 depends on tensorflow ddsp 0.14.0 depends on tensorflow ddsp 0.13.1 depends on tensorflow ddsp 0.13.0 depends on tensorflow ddsp 0.12.0 depends on tensorflow ddsp 0.10.0 depends on tensorflow ddsp 0.9.0 depends on tensorflow ddsp 0.8.0 depends on tensorflow ddsp 0.7.0 depends on tensorflow ddsp 0.5.1 depends on tensorflow ddsp 0.5.0 depends on tensorflow ddsp 0.4.0 depends on tensorflow ddsp 0.2.4 depends on tensorflow ddsp 0.2.3 depends on tensorflow ddsp 0.2.2 depends on tensorflow ddsp 0.2.0 depends on tensorflow ddsp 0.1.0 depends on tensorflow ddsp 0.0.10 depends on tensorflow ddsp 0.0.9 depends on tensorflow ddsp 0.0.8 depends on tensorflow ddsp 0.0.7 depends on tensorflow ddsp 0.0.6 depends on tensorflow ddsp 0.0.5 depends on tensorflow ddsp 0.0.4 depends on tensorflow ddsp 0.0.3 depends on tensorflow ddsp 0.0.1 depends on tensorflow ddsp 0.0.0 depends on tensorflow>=2.1.0

To fix this you could try to:

1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: Resolution Impossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

I am currently working on an M1 Mac, so I would be grateful if you could help me reach a solution for this problem.

Thank you in advance, Juan Carlos
opened by JuanCarlosMartinezSevilla 2
Why is vibrato_rate not used?

In my opinion, vibrato_rate (peak frequency) is more plausible than vibrato_extend (peak amplitude) to represent pitch pulsating. Why is vibrato_rate not used?

opened by bfs18 2
Docker image with all the necessary packages

Hi again!

I'm still trying to execute your code with the "pip install midi-ddsp".

I'm using docker from a tensorflow/tensorflow:latest image and running the pip command. I get this error:

"E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered"

I ask you guys in order to know if you work with docker... if I could be able to have access to the image you use so all versions are the same and to avoid these type of errors.

Thank you, Best regards, Juan Carlos

opened by JuanCarlosMartinezSevilla 0
How to distinguish whether it is cresc or decresc in fluctuation?

Hi! I am interested in your project. Currently, I am doing similar stuff, but i have some questions. (1) I know that Fluctuation stands for how much does that note cresc or decresc. Peak means which position got the maximum energy. However, i am wondering how to know if that note is cresc or decresc. Let's assume our peak position is 0.4, and fluc is 0.3. Then, is it decrease 0.3 or increase 0.3? How to distinguish it? (2) I know that all the expressive control values are normalized between 0 and 1, but what's the unit measure in these expressive controls?

Thanks for answering

opened by pillow8781 2
What is ``input`` in the def call()?
Hi, I am looking inside the code. I've seen a lot of methods about def call(self, inputs) in your code, especially looking at this one.

def call(self, inputs):
  synth_params = self.get_synth_params(inputs)

However, I couldn't find out what's the calculation of inputs, there are some clues I've found. In those codes, inputs is respond to the data in get_fake_data_synthesis_generator, then what are the data and units you input to get_fake_data_synthesis_generator? Frames? Amplitude or anything else?

Thanks!
opened by Megan8821 4
How to solve the error when installing midi-ddsp?

Hi, I am new to training models, and I ran into some problems here. Does anyone know how to solve the errors "error: subprocess-exited-with-error" and "error: metadata-generation-failed" in the terminal?

opened by Megan8821 4
tfrecord features
Hello, I'm very interested in your amazing work. Took a deep look at the urmp tfrecord datasets, I found something a bit confusing for me. Could you be so kind to help?

I checked some urmp_tfrecords data, I found that some of them contain the following features: {"audio", "f0_confidence", "f0_hz", "f0_time", "id", "instrument_id", "loudness_db", "note_active_frame_indices", "note_active_velocities", "note_offsets", "note_onsets", "orig_f0_hz", "orig_f0_time", "power_db", "recording_id", "sequence"}. However, some of them don't include {"orig_f0_hz", "orig_f0_time"} in their tfrecord data. Why is this so and does such an inconsistency influence the model training?

I want to include piano music when I train my own model. To this end, I think I need to generate tfrecords that have the same content as the urmp ones you used in your model. I plan to use maestro dataset. Could you be so kind to indicate if there's a tfrecord data generation code that we can take as a reference? Like the one you used to generate tfrecords for the midi-ddsp model?

What is the difference between "batched" and "unbatched" dataset?

Thank you very much for your help in advance.
opened by gladys0313 1
Can i test with custom data?
Hi, i am interested in this exciting project and i am trying to test this with our custom dataset and reproduce the format of original data. But there are some difficulties and questions below.

Is there no way to use custom datasets at all?

I saw a previous issue about arbitrary dataset.

Is there any code to calculate elements of dataset below?

I want to know how to get "note_active_velocities", "note_active_frame_indices", "power_db", "note_onsets", "note_offsets" but there is no any code on repository.

Thank you for reading!
opened by 589hero 6
Question on Figure

quick question on this figure in the blog post: i know coconet is its own model that will generate subsequent melodies given the input midi file. however, should i decide to train midi ddsp, will the training of coconet also be a part of this? or should i expect a monophonic midi melody as input and the generated audio as output.

thanks for all the help and this awesome project

opened by theadamsabra 7

Releases (v0.2.5)

v0.2.5 (Jun 12, 2022)

Update note expression name in the code, minor bug fix

v0.2.4 (Apr 13, 2022)

Update to clip note expression output between 0 and 1.

v0.2.3 (Apr 4, 2022)

Minor bug fix in command-line MIDI synthesis

v0.2.2 (Mar 30, 2022)

Minor bug fix

Add save metadata option to command-line synthesis

v0.2.1 (Mar 6, 2022)

Update bugs in command-line synthesis.

v0.2.0 (Mar 4, 2022)

Upgrade command-line to support saving synthesis parameters.

v0.1.5 (Feb 21, 2022)

Changed imports to update ddsp to 3.2.0 version

v0.1.4 (Jan 8, 2022)

Update readme

Fix bug in command-line MIDI synthesis

v0.1.3 (Dec 29, 2021)

Fix bug in command-line synthesis on skipping existing files.

v0.1.2 (Dec 27, 2021)

First GitHub release

Support command-line MIDI synthesis

Support Colab MIDI Synthesis

Support MIDI synthesis using API

Owner

Magenta

An open source research project exploring the role of machine learning as a tool in the creative process.
