AIcrowd | Sound Demixing Challenge 2023

Warm-Up Round: Completed

Phase I: Completed

Phase II: Completed

AIcrowd & Sony Group Corporation & Moises.AI & Mitsubishi Electric Research Laboratories

Problem Statements

Cinematic Sound Demixing Track - CDX'23

Source separation of a cinematic audio track into dialogue, sound-effects and misc.

Music Demixing Track - MDX'23

Music source separation of an audio signal into separate tracks for vocals, bass, drums, and other

🏆 Winner's Solutions

🔍 Discover released models and source code in our MDX track and CDX track papers' "Notes" section.

🗣️ Explore teams' model announcements on the discussion forum for additional insights.

🏛️ Watch the SDX23 Townhall & Presentations

📹 Watch the baseline walkthrough video

💬 Share your feedback and suggestions with us over here.

Welcome to the Sound DemiXing (SDX) challenge! 🎼

This challenge is an opportunity for researchers and machine learning enthusiasts to test their skills on the difficult task of audio source separation: given an audio signal as input (referred to as “mixture”), the challenge is to decompose it into its individual parts.

🕵️ Introduction

Have you ever sung using a karaoke machine or made a DJ music mix of your favourite song? Have you wondered how hearing aids help people listen more clearly or how video conference software reduces background noise?

They all use the magic of audio separation.

Music source separation (MSS) attracts professional music creators as it enables remixing and revising songs in a way traditional equalisers don't. Suppressed vocals in songs can improve your karaoke night and provide a richer audio experience than conventional applications.

Audio source separation takes different forms, depending on the kind of signal the system is working on. Music source separation systems take a song as input and output one track for each of the instruments. Cinematic sound separation systems take movie audio as input and separate it into dialogue, sound effects, and music. Speech enhancement systems separate speech signals from background noise.
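To make the input/output shape of such a system concrete, here is a minimal sketch in Python for the music case. It assumes the mixture is the sample-wise sum of its stems and that waveforms are NumPy arrays shaped (samples, channels); neither assumption is stated by the challenge. The names `STEMS`, `mix`, and `naive_separate` are hypothetical, and `naive_separate` is only a placeholder with the right interface, not a real separation model.

```python
import numpy as np

# Stem names for the music case, as listed in the text.
STEMS = ("vocals", "bass", "drums", "other")


def mix(stems):
    """Sum stem waveforms, each shaped (samples, channels), into one mixture.

    Assumption: the mixture is the plain sample-wise sum of its stems.
    """
    return np.sum([stems[name] for name in STEMS], axis=0)


def naive_separate(mixture):
    """Placeholder separator: give every stem an equal share of the mixture.

    A real system would replace this function with a trained model; only the
    interface (one mixture in, one waveform per stem out) is illustrated.
    """
    return {name: mixture / len(STEMS) for name in STEMS}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_stems = {name: rng.standard_normal((44100, 2)) for name in STEMS}
    mixture = mix(true_stems)
    estimates = naive_separate(mixture)
    print({name: wave.shape for name, wave in estimates.items()})
```

The placeholder's estimates always add back up to the mixture, but each one is a poor match for its true stem; a separation model is judged on how close each estimated stem is to the corresponding ground truth.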

In 2021, Sony followed the long tradition of the SiSEC MUS challenges by organizing the Music DemiXing (MDX) challenge. Participants submitted systems that separate a song into four instruments: vocals, bass, drums, and other (the instrument class “other” contains signals of all instruments other than the first three, e.g., guitar or piano).

This year, as a follow-up to the MDX challenge, organizers from 5 companies (Sony, Moises.AI, Mitsubishi Electric Research Labs, AudioShake, and Meta) join forces to organize a larger competition that goes beyond music source separation: the Sound DemiXing (SDX) challenge.

🎸 The Sound Demixing Challenge 2023

The Sound DemiXing challenge hosts one track on music source separation (MDX) and one track on cinematic sound separation (CDX). Independent leaderboards are set for the two tracks, each featuring an independent prize pool.

The challenge is now live; you can find all the details below.

🎻 Music Demixing Track (MDX)

Audio source separation has always been applied to music: karaoke systems benefit from this technology because users can sing over any original song once the vocals have been suppressed.

Grammy Award-winning producers and artists like Jordan Rudess (Dream Theater) are using Moises to produce, master, and perform music in novel ways.

The previous edition of the MDX challenge focused on the basic formulation of music source separation into four instruments: the submitted systems were requested to separate a song into vocals, bass, drums, and other. This year, we extend the original formulation by requiring four-instrument separation systems that are robust to specific and realistic issues in the training data.

We propose a challenge that tackles two types of such issues and set up one leaderboard each. Additionally, we set up a third leaderboard, which is free from any constraints and relates to the standard source separation formulation for four instruments. This last leaderboard is similar to leaderboard B of the previous edition of the MDX challenge.

More details about the leaderboards are available on the Music Track page.

🥁 Cinematic Sound Demixing Track (CDX)

The SDX23 challenge introduces a novel formulation of audio source separation in the competition: cinematic sound separation (CDX). This is the task of separating the audio of a movie into three tracks: dialogue, sound effects and music. CDX has many applications, ranging from language dubbing to upmixing of old movies to spatial audio and user interfaces for flexible listening.

For example, the original master track of old movies contains all the material (dialogue, music and sound effects) mixed in mono or stereo: thanks to source separation, we can retrieve the individual components and allow for up-mixing to surround systems. Sony has already restored many movies with this technology in their Columbia Classics collection.

MERL has a long history in sound separation research, from pioneering work in latent variable models to deep clustering for overlapping speech separation, and recently published work on soundtrack separation (https://cocktail-fork.github.io/) along with releasing a synthetic dataset to help foster research on this topic.

The CDX track is similar to the previous edition of the Music DemiXing challenge: leaderboard A imposes some constraints on the training data (by allowing only the use of the "Divide-and-Remaster" dataset), while leaderboard B allows the use of any training data. The submitted systems will be evaluated on a new hidden test set of real audio from movies by Sony Pictures Entertainment.

More details about the leaderboards and the hidden test set are available on the Cinematic Track page.

💰 Prizes

The prize pool totals 42,000 USD, divided between the two tracks. Participating teams are eligible to win prizes on multiple leaderboards across both tracks.

🎻 Music Demixing Track (MDX) 32,000 USD

Leaderboard A: 10,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard B: 10,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard C: 12,000 USD

🥇 1st: 5000 USD
🥈 2nd: 3000 USD
🥉 3rd: 2000 USD

Leaderboard C also includes a Bonus Prize of 2000 USD.

Please refer to the Challenge Rules for more details about the Open Sourcing criteria for each of the leaderboards to be eligible for the associated prizes.

🥁 Cinematic Sound Demixing Track (CDX) 10,000 USD

Leaderboard A - Divide and Remaster (DnR): 5,000 USD

🥇 1st: 2500 USD
🥈 2nd: 1500 USD
🥉 3rd: 1000 USD

Leaderboard B - Open Track: 5,000 USD

🥇 1st: 2500 USD
🥈 2nd: 1500 USD
🥉 3rd: 1000 USD

Please refer to the Challenge Rules for more details about the Open Sourcing criteria for each of the leaderboards to be eligible for the associated prizes.

💪 Getting Started

Make your first submission for the challenge using this easy-to-follow starter kit.

🚀 Baseline System

The challenge features a set of baseline systems that you can use either as a starting point for your model, or simply as a comparison to your network. You can find all baseline systems in the starter kit.

Please note that, while the challenge is ongoing, we will share more baselines and resources, so stay tuned.

🖊 Evaluation Metric

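As the evaluation metric, we use the signal-to-distortion ratio (SDR), which is defined as

$$\mathrm{SDR}_{\mathrm{stem}} = 10 \log_{10} \frac{\sum_n \lVert S_{\mathrm{stem}}(n) \rVert^2}{\sum_n \lVert S_{\mathrm{stem}}(n) - \hat{S}_{\mathrm{stem}}(n) \rVert^2}$$

where S_stem(n) is the waveform of the ground truth and Ŝ_stem(n) denotes the waveform of the estimate. The higher the SDR score, the better the output of the system is.

In order to rank systems, we will use the average SDR computed by

$$\mathrm{SDR}_{\mathrm{song}} = \frac{1}{|\mathcal{S}|} \sum_{\mathrm{stem} \in \mathcal{S}} \mathrm{SDR}_{\mathrm{stem}}$$

for each song, where $\mathcal{S}$ is the set of stems of the track (vocals, bass, drums, and other for MDX; dialogue, sound effects, and music for CDX). Finally, the overall score SDRtotal is given by the average over all songs in the hidden test set. There will be a separate leaderboard for each round.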
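The scoring steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the official evaluator: it assumes SDR is the ratio of ground-truth energy to error energy in decibels, summed over all samples and channels, and the function names `sdr`, `song_sdr`, and `total_sdr`, as well as the small stabilising constant `eps`, are assumptions introduced here.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-7):
    """SDR in dB for one stem; both arrays are shaped (samples, channels).

    eps is an assumption added here so that silent references or perfect
    estimates do not cause a division by zero or log of zero.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


def song_sdr(references, estimates):
    """Average the per-stem SDR over all stems of one song.

    references and estimates are dicts mapping stem name -> waveform.
    """
    return float(np.mean([sdr(references[s], estimates[s]) for s in references]))


def total_sdr(song_scores):
    """Overall score: mean of the per-song averages over the test set."""
    return float(np.mean(song_scores))


if __name__ == "__main__":
    reference = np.ones((1000, 2))
    estimate = 0.9 * reference  # 10% amplitude error on every sample
    print(round(sdr(reference, estimate), 3))
```

As a sanity check, an estimate that is the ground truth scaled by 0.9 leaves an error with 1/100 of the reference energy, which corresponds to roughly 20 dB.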

Please note that the organizers will not get access to the submitted entries: everything is handled by AIcrowd, and AIcrowd guarantees the security of your submissions. Nevertheless, the organizers plan to write an academic paper and, for this purpose, will get access to the output (i.e., the separated signals) of the top-10 entries for each leaderboard. For more information, please see the challenge rules.

📅 Timeline

The SDX23 challenge takes place in two rounds, with an additional warm-up round:

Warm-Up Round: 8th December 2022
Phase I: 23rd January 2023
Phase II: 6th March 2023
Challenge End: 1st May 2023

📖 Citing the Challenge

If you are participating in this challenge and/or using the datasets involved, please consider citing the following paper:

Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk: Music Demixing Challenge 2021, Front. Signal Process., https://doi.org/10.3389/frsip.2021.808395

📱 Challenge Organising Committee

Music Demixing Track (MDX)

Yuki Mitsufuji, Giorgio Fabbro, Chieh-Hsin Lai, Woosung Choi, Marco Martinez Ramirez, Weihsiang Liao (Sony)
Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Igor Gadelha (Moises.AI)
Fabian-Robert Stöter (AudioShake)
Alexandre Défossez (Meta)

Cinematic Sound Demixing Track (CDX)