AIcrowd | Music Demixing Track - MDX'23

Warm-Up Round: Completed

Phase I: Completed

Phase II: Completed

AIcrowd, Sony Group Corporation, Moises.AI & Mitsubishi Electric Research Laboratories


🏆 Winner's Solutions

🔍 Discover released models and source code in the "Notes" section of our MDX track and CDX track papers.

🗣️ Explore teams' model announcements on the discussion forum for additional insights.

📕 There are 2 new baselines for the Music Demixing Track: KUIELab-MDX-Net Edition & Demucs Edition

🏛️ Watch the SDX23 Townhall & Presentations

🛠 How to debug your submissions

🕵️ Introduction

The previous edition of the MDX task focused on the basic formulation of music source separation into four instruments: the submitted systems were required to separate a song into vocals, bass, drums, and other. This year, we extend the original formulation by requiring four-instrument separation systems that are robust to specific, realistic issues in the training data.

We propose a challenge that tackles two such issues, with one leaderboard for each. Additionally, we set up a third leaderboard, which is free from any constraints and follows the standard source separation formulation for four instruments. This last leaderboard is similar to leaderboard B of the previous edition of the MDX challenge.

📜 The Track

The SDX23 Music Demixing Track features three leaderboards:

Label Noise
Bleeding
Standard Music Separation

Below we introduce them in detail.

Leaderboard A: Label Noise

When a song is produced, the performances of the different instruments are recorded in a Digital Audio Workstation (DAW), and the overall song is mixed and then exported as a single file. To use a song as training data for supervised music source separation, recordings of instruments belonging to the same class (e.g., acoustic guitar and electric guitar, or drum set and percussion) are grouped together and mixed into a single track. This yields a common set of “stems”, each corresponding to one separation target.

For large amounts of data, this grouping is performed automatically based on metadata such as the file names (written by the music producer when the data was exported from the DAW). This metadata is sometimes incorrect and can lead to groupings of instruments that are inconsistent across the songs in the dataset. As a consequence, some stems in the resulting training dataset do not correctly represent the class they belong to, which can severely harm the training of a neural network.
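To make the failure mode concrete, here is a minimal sketch of filename-based grouping. The keyword table and the names `STEM_KEYWORDS`, `stem_for_filename` and `group_tracks` are hypothetical and chosen for illustration; they are not the rules used by Moises, Sony, or any real pipeline. Each track is assumed to be a NumPy array, and all tracks of a song are assumed to share one shape.

```python
import numpy as np

# Hypothetical keyword rules for illustration only; real pipelines use their own rules.
# Dict order matters: the first stem whose keywords match wins.
STEM_KEYWORDS = {
    "vocals": ("vox", "vocal", "voice"),
    "drums": ("kick", "snare", "hihat", "drum", "perc"),
    "bass": ("bass",),
}


def stem_for_filename(filename: str) -> str:
    """Map a track's filename to a stem name; unmatched files go to 'other'."""
    name = filename.lower()
    for stem, keywords in STEM_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return stem
    return "other"


def group_tracks(tracks: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Sum all tracks that map to the same stem (all tracks must share one shape)."""
    shape = next(iter(tracks.values())).shape
    stems = {stem: np.zeros(shape) for stem in (*STEM_KEYWORDS, "other")}
    for filename, audio in tracks.items():
        stems[stem_for_filename(filename)] += audio
    return stems
```

With these rules, a synthesizer part that the producer happened to export as "drums_synth.wav" matches the "drum" keyword and is summed into the drums stem, so the resulting drums stem contains both the drum recording and synthesizer sounds.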

The systems submitted to this leaderboard should be robust to occasional errors in the instrument identities in the training data (e.g., a stem file is named “drums.wav”, but it contains both the drums recording and some synthesizer sounds). We refer to this type of error as label noise.

📁 Dataset

For this leaderboard, we only allow systems that are trained exclusively on the dataset that we provide, called SDXDB23_labelnoise.

It features 203 new songs, licensed by Moises, and follows the same format as MUSDB18.

This dataset has been artificially corrupted by Sony by changing some instrument identities (aka "label errors") in the stems. For example, there may be a stem called "vocals.wav" that actually contains the recording of the singer and some drum sounds. The frequency and kind of label errors we introduce are based on statistics collected from real datasets we use every day. The participants will not have access to any detail about where and how often the corruption happens.
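Since the actual corruption procedure is not disclosed, anyone who wants to reason about or experiment with label noise has to pick a noise model of their own. The sketch below uses the simplest one: each track label is flipped, with probability `p`, to a different stem chosen uniformly at random. The function name `corrupt_labels` and the uniform-flip model are assumptions for illustration; the errors in SDXDB23_labelnoise follow real-world statistics and are very likely not uniform.

```python
import numpy as np

STEMS = ("vocals", "drums", "bass", "other")


def corrupt_labels(labels, p, rng):
    """With probability p, replace each label by a different stem chosen uniformly.

    This is a hypothetical noise model for experiments; it is not the
    (undisclosed) procedure used to build SDXDB23_labelnoise.
    """
    noisy = []
    for label in labels:
        if rng.random() < p:  # rng.random() draws from [0, 1)
            wrong = [stem for stem in STEMS if stem != label]
            noisy.append(wrong[rng.integers(len(wrong))])
        else:
            noisy.append(label)
    return noisy
```

The noisy labels can then be used to decide which stem each track is summed into, e.g. `corrupt_labels(track_labels, 0.05, np.random.default_rng(0))`.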

We do not provide a split into training part and validation part: you are free to split it the way you prefer, as long as no other data is used. No other data can be used even in the validation stage.
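One way to build your own split is to split at the song level, so that no song contributes audio to both the training and the validation part. A minimal sketch, assuming you already have the list of song names (for instance from listing the song folders of the downloaded dataset; the folder layout is an assumption here). The function `split_songs` and its default values are hypothetical.

```python
import random


def split_songs(song_names, valid_fraction=0.1, seed=42):
    """Split song names into (train, valid) lists at the song level.

    Splitting by song, not by excerpt, keeps every excerpt of a song on one
    side of the split, so validation never scores audio seen in training.
    """
    names = sorted(song_names)  # fixed order so the result depends only on the seed
    random.Random(seed).shuffle(names)
    n_valid = max(1, round(len(names) * valid_fraction))
    return names[n_valid:], names[:n_valid]
```

With 203 songs and the default fraction, this yields 183 training songs and 20 validation songs. Because the validation songs come from the same corrupted dataset, validation scores are computed against references that may themselves contain label noise.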

The dataset is available for download over here.

The systems will be evaluated on the same hidden test set as in the previous edition of the challenge (MDXDB21). For a description of the test set, please refer to our publication. The test set does not contain any artificial label noise.

Leaderboard B: Bleeding

When producing music in a studio, the priority is given to the performance. Sometimes, this causes the recorded audio to contain sounds other than the instrument being recorded. For example, when recording a vocal performance, singers wear headphones in order to hear the rest of the band: the headphones might not insulate perfectly, and this causes the microphone to pick up the sound of the other instruments. Another example is the case of the bass guitar being played through an amplifier: low frequencies are difficult to isolate, and this sometimes causes the microphones on the drum set to pick up the bass melody, even if the bass amplifier is in a different room in the studio.

These phenomena corrupt the data we use for training and can degrade the performance of the model. We refer to this type of error as bleeding.

📁 Dataset

For this leaderboard, we only allow systems that are trained exclusively on the dataset that we provide, called SDXDB23_bleeding.

The source material is the same 203 songs licensed by Moises that were used to create SDXDB23_labelnoise.

This dataset was artificially corrupted by Sony by simulating bleeding between the tracks. This means that each stem of a song might also partially contain the other stems of the same song. The bleeding components in each stem were processed to make the bleeding simulation realistic. The participants will not have access to any details about where, how often, or how the corruption happens.
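As a rough mental model of bleeding, each stem can be thought of as its clean signal plus attenuated copies of the other stems of the same song. The sketch below implements only that simplified model: every (target, source) pair gets a random broadband gain. The function name `simulate_bleeding` and the `max_gain` parameter are hypothetical, and this is not Sony's procedure, which is undisclosed and additionally processes the bleeding components (real bleeding is frequency dependent, as in the bass-into-drum-microphones example above).

```python
import numpy as np


def simulate_bleeding(stems, max_gain=0.1, rng=None):
    """Return new stems where each stem also contains attenuated copies of the others.

    stems: dict mapping stem name -> array; all arrays share one shape.
    Each (target, source) pair gets its own random gain in [0, max_gain).
    The inputs are left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    bled = {}
    for target, audio in stems.items():
        out = audio.astype(np.float64)  # astype copies, so the input is not modified
        for source, other in stems.items():
            if source != target:
                # Leak a scaled copy of the *clean* source stem into the target stem.
                out += rng.uniform(0.0, max_gain) * other
        bled[target] = out
    return bled
```

Gains are applied to the clean stems rather than to already-bled ones, so the leakage does not compound across stems.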

The dataset is available for download over here.

Leaderboard C: Standard Music Separation

This leaderboard has the same objective as the previous edition of the MDX challenge: the systems submitted here are “standard” music source separation models, with no requirement about robustness. They can be trained on any data the participants might have access to, with no limitation.

📁 Dataset

The systems submitted to this leaderboard can be trained on any data you might have. In case you're looking for something to start with, we recommend checking out the MUSDB18 dataset.

The systems will be evaluated on the same hidden test set as in the previous edition of the challenge (MDXDB21). For a description of the test set, please refer to our publication.

📕 Baselines

There are 2 new baselines for the Music Demixing Track:

📝 KUIELab-MDX-Net Edition

This baseline contains:

How to train a vanilla KUIELab-MDX-Model for the challenge!
How to submit a trained KUIELab-MDX-Model to AIcrowd's system

This repository does not cover:

How to make the training of KUIELab-MDX-Net more robust on the corrupted datasets
Training the Mixer, which was used in the previous challenge (it has been omitted)

📔 Demucs Edition

This baseline contains:

Documentation on how to submit your models to the leaderboard
Best practices and information on how we evaluate your submissions

💰 Prizes

🎻 Music Demixing Track (MDX) 32,000 USD

Leaderboard A: Label Noise: 10,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard B: Bleeding: 10,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard C: Standard Music Separation: 12,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard C also includes a Bonus Prize of 2000 USD.

More details about the leaderboards and the prizes will be announced soon.

Please refer to the Challenge Rules for more details about the Open Sourcing criteria for each of the leaderboards to be eligible for the associated prizes.

💪 Getting Started

Make your first submission to the challenge using this easy-to-follow starter kit.

🚀 Baseline System

You can find a list of the baseline models in the starter kit. Please note that during the course of the challenge we will add more baselines, so stay tuned.

🖊 Evaluation Metric

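As the evaluation metric, we use the signal-to-distortion ratio (SDR), which is defined as

$$\mathrm{SDR}_{\mathrm{stem}} = 10 \log_{10} \frac{\sum_n \lVert S_{\mathrm{stem}}(n) \rVert^2}{\sum_n \lVert S_{\mathrm{stem}}(n) - \hat{S}_{\mathrm{stem}}(n) \rVert^2},$$

where S_stem(n) is the waveform of the ground truth and Ŝ_stem(n) denotes the waveform of the estimate. The higher the SDR score, the better the output of the system.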
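The per-stem SDR is the energy of the reference divided by the energy of the error, expressed in decibels. A minimal NumPy sketch of that ratio follows; the function name `sdr` is hypothetical, and the small constant `eps` added to numerator and denominator is an assumption of this sketch to avoid division by zero for a perfect estimate. The official evaluation code may differ in such details.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-7):
    """SDR in dB between a reference waveform and its estimate.

    Arrays have shape (n_samples,) or (n_samples, n_channels); the sums run
    over all samples and channels. `eps` is an assumed stabilizing constant.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2) + eps
    error_energy = np.sum((reference - estimate) ** 2) + eps
    return 10.0 * np.log10(signal_energy / error_energy)
```

For example, an estimate equal to 0.9 times a stereo reference leaves an error with 1/100 of the reference energy, giving about 20 dB, while an all-zero estimate gives 0 dB.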

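In order to rank systems, we will use the average SDR over the four stems, computed for each song as

$$\mathrm{SDR}_{\mathrm{song}} = \frac{1}{4}\left(\mathrm{SDR}_{\mathrm{bass}} + \mathrm{SDR}_{\mathrm{drums}} + \mathrm{SDR}_{\mathrm{other}} + \mathrm{SDR}_{\mathrm{vocals}}\right).$$

Finally, the overall score SDR_total is given by the average of SDR_song over all songs in the hidden test set. There will be a separate leaderboard for each round.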
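The ranking score is a two-level average: first over the four stems of each song, then over all songs. A minimal sketch, assuming the per-stem SDRs have already been computed and are stored as one dictionary per song; the names `song_sdr` and `total_sdr` are hypothetical.

```python
STEMS = ("bass", "drums", "other", "vocals")


def song_sdr(stem_sdrs):
    """Average the four per-stem SDRs (in dB) of one song."""
    return sum(stem_sdrs[stem] for stem in STEMS) / len(STEMS)


def total_sdr(per_song):
    """Average the song-level SDRs over all songs in the test set."""
    scores = [song_sdr(stem_sdrs) for stem_sdrs in per_song]
    return sum(scores) / len(scores)
```

For two songs with per-stem SDRs (4, 6, 2, 8) and (3, 5, 1, 7), the song scores are 5.0 and 4.0 and the total is 4.5. Because every song has exactly four stems, this equals the plain mean of all per-stem scores; a song with poor separation still counts exactly as much as any other song.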

For an academic report about the challenge, the organizers will get access to the separations (i.e., the outputs) of the top-10 entries on each leaderboard in order to compute additional source separation metrics (e.g., signal-to-interference ratio).

📅 Timeline

The SDX23 challenge takes place in 2 rounds, with an additional warm-up round:

Warm-Up Round: 8th December 2022
Phase I: 23rd January 2023
Phase II: 6th March 2023
Challenge End: 1st May 2023

📖 Citing the Challenge

If you are participating in this challenge and/or using the datasets involved, please consider citing the following paper:

Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk: Music Demixing Challenge 2021, Front. Signal Process., https://doi.org/10.3389/frsip.2021.808395

📱 Challenge Organising Committee

Music Demixing Track (MDX)

Yuki Mitsufuji, Giorgio Fabbro, Chieh-Hsin Lai, Woosung Choi, Marco Martinez Ramirez, Weihsiang Liao (Sony)
Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Igor Gadelha (Moises.AI)
Fabian-Robert Stöter (Audioshake)
Alexandre Défossez (Meta)
