WWW 2018 Challenge: Learning to Recognize Musical Genre
Learning to Recognize Musical Genre from Audio on the Web
Like never before, the web has become a place for sharing creative work, such as music, among a global community of artists and art lovers. While music and music collections predate the web, the web enabled collections of a much larger scale. Whereas people used to own a handful of vinyl records or CDs, they nowadays have instant access to the whole of published musical content via online platforms. This dramatic increase in the size of music collections created two challenges: (i) the need to automatically organize a collection (as users and publishers cannot manage them manually anymore), and (ii) the need to automatically recommend new songs to a user given their listening habits. An underlying task in both challenges is the ability to group songs into semantic categories.
Music genres are categories that have arisen through a complex interplay of cultures, artists, and market forces to characterize similarities between compositions and organize music collections. Yet, the boundaries between genres still remain fuzzy, making the problem of music genre recognition (MGR) a nontrivial task (Scaringella 2006). While its utility has been debated, mostly because of its ambiguity and cultural definition, it is widely used and understood by end-users who find it useful to discuss musical categories (McKay 2006). As such, it is one of the most researched areas in the Music Information Retrieval (MIR) field (Sturm 2012).
The task of this challenge, one of the four official challenges of the Web Conference (WWW2018) challenges track, is to recognize the musical genre of a piece of music of which only a recording is available. Genres are broad, e.g. pop or rock, and each song only has one target genre. The data for this challenge comes from the recently published FMA dataset (Defferrard 2017), which is a dump of the Free Music Archive (FMA), an interactive library of high-quality and curated audio which is freely and openly available to the public.
Results
You can find the final results and the ranking on the repository and in the slides used to announce them.
In the interest of reproducibility and transparency for interested researchers, you’ll find below links to the source code repositories of all systems submitted by the participants for the second round of the challenge.
- Transfer Learning of Artist Group Factors to Musical Genre Classification
  - Jaehun Kim (@jaehun), TU Delft and Minz Won (@minzwon), Universitat Pompeu Fabra
  - Code: https://gitlab.crowdai.org/minzwon/WWWMusicalGenreRecognitionChallenge
  - Paper: https://doi.org/10.1145/3184558.3191823
- Ensemble of CNN-based Models using various Short-Term Input
  - Hyungui Lim (@hglim), http://cochlear.ai
  - Code: https://gitlab.crowdai.org/hglim/WWWMusicalGenreRecognitionChallenge
- Detecting Music Genre Using Extreme Gradient Boosting
  - Benjamin Murauer (@benjamin_murauer), Universität Innsbruck
  - Code: https://gitlab.crowdai.org/Benjamin_Murauer/WWWMusicalGenreRecognitionChallenge
  - Paper: https://doi.org/10.1145/3184558.3191822
- ConvNet on STFT spectrograms
  - Daniyar Chumbalov (@check), EPFL and Philipp Pushnyakov (@gg12), Moscow Institute of Physics and Technology (MIPT)
  - Code: https://gitlab.crowdai.org/gg12/WWWMusicalGenreRecognitionChallenge
- Xception on mel-scaled spectrograms
- Audio Dual Path Networks on mel-scaled spectrograms
  - Sungkyun Chang (@mimbres), Seoul National University
  - Code: https://gitlab.crowdai.org/mimbres/WWWMusicalGenreRecognitionChallenge
The repositories should be self-contained and easily executable. You can run any of the systems on your own mp3s by following these steps (a minimal command sketch follows the list):
- Clone the git repository.
- Build a Docker image with repo2docker.
- Execute the Docker image.
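Below is a minimal sketch of these steps driven from Python, assuming repo2docker is installed (pip install jupyter-repo2docker) and the Docker daemon is running; the repository URL, image name, and mount path are placeholders to adapt to your setup and to how each system expects its input.

```python
import subprocess

# Placeholder repository URL: substitute any of the submission repositories listed above.
repo = "https://gitlab.crowdai.org/hglim/WWWMusicalGenreRecognitionChallenge"

# Build a Docker image from the repository with repo2docker.
# --no-run builds the image without starting a container; --image-name tags it for later use.
subprocess.run(
    ["jupyter-repo2docker", "--no-run", "--image-name", "genre-submission", repo],
    check=True,
)

# Run the built image, mounting a local folder of mp3s into the container
# (the mount path is an assumption to adapt to the repository's instructions).
subprocess.run(
    ["docker", "run", "--rm", "-v", "/path/to/mp3s:/data", "genre-submission"],
    check=True,
)
```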
You can find more details in the slides used to announce the results and in the overview paper. The overview paper summarizes our experience running a challenge with open data for musical genre recognition: it motivates the task and the challenge design, shows some statistics about the submissions, and presents the results. Please cite the paper in your scholarly work if you want to reference this challenge.
@inproceedings{fma_crowdai_challenge,
title = {Learning to Recognize Musical Genre from Audio},
author = {Defferrard, Micha\"el and Mohanty, Sharada P. and Carroll, Sean F. and Salath\'e, Marcel},
booktitle = {WWW '18 Companion: The 2018 Web Conference Companion},
year = {2018},
url = {https://arxiv.org/abs/1803.05337},
}
Evaluation criteria
To avoid overfitting and cheating, the challenge will happen in two rounds. The final ranking will be based on results from the second round. In the first round, participants are provided a test set of 35,000 clips of 30 seconds each, and they have to submit their predictions for all 35,000 clips. The platform evaluates the predictions and ranks the participants upon submission. In the second round, all participants will have to wrap their models in a Docker container. We will evaluate those containers against a new, unseen test set whose 30-second clips will be sampled (at least in part) from new contributions to the Free Music Archive.
For details on how to package your code as a Binder-compatible repository, please read the documentation here: https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit/blob/master/Round2_Packaging_Guidelines.md
The primary metric for evaluation will be the Mean Log Loss, and the secondary metric will be the Mean F1-Score.
The Mean Log Loss is defined by

$$\mathrm{LogLoss} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} y_{nc} \ln(p_{nc}),$$

where $N$ is the number of examples in the test set, $C$ is the number of class labels, i.e. genres, $y_{nc}$ is a binary value indicating if the $n$-th instance belongs to the $c$-th label, $p_{nc}$ is the probability according to your submission that the $n$-th instance belongs to the $c$-th label, and $\ln$ is the natural logarithm.

The $F_1$-score of class $c$ is defined by

$$F_{1,c} = 2 \cdot \frac{P_c \cdot R_c}{P_c + R_c},$$

where $P_c = \frac{TP_c}{TP_c + FP_c}$ is the precision for class $c$, $R_c = \frac{TP_c}{TP_c + FN_c}$ is the recall for class $c$, $TP_c$ is the number of True Positives for class $c$, $FP_c$ is the number of False Positives for class $c$, and $FN_c$ is the number of False Negatives for class $c$.

The final Mean $F_1$-Score is the average of the per-class scores:

$$F_1 = \frac{1}{C} \sum_{c=1}^{C} F_{1,c}.$$
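As an illustration, here is a minimal sketch of how these two metrics can be computed with scikit-learn on toy data; the class count is reduced to three for readability, and the exact evaluation code used by the platform may differ.

```python
import numpy as np
from sklearn.metrics import log_loss, f1_score

# Toy example with C = 3 genres and N = 4 clips; real submissions have 16 genres.
y_true = np.array([0, 2, 1, 0])            # true genre index for each clip
y_prob = np.array([[0.7, 0.2, 0.1],        # predicted probabilities for each clip
                   [0.1, 0.3, 0.6],
                   [0.2, 0.6, 0.2],
                   [0.5, 0.3, 0.2]])

# Mean Log Loss: average over clips of -sum_c y_nc * ln(p_nc).
print("mean log loss:", log_loss(y_true, y_prob, labels=[0, 1, 2]))

# Mean F1-Score: per-class F1 averaged over classes ("macro" average),
# computed here from the most probable genre of each clip.
y_pred = y_prob.argmax(axis=1)
print("mean F1-score:", f1_score(y_true, y_pred, average="macro"))
```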
The participants have to submit a CSV file with the following header:
file_id,Blues,Classical,Country,Easy Listening,Electronic,Experimental,Folk,Hip-Hop,Instrumental,International,Jazz,Old-Time / Historic,Pop,Rock,Soul-RnB,Spoken
Each subsequent row is an entry for one file in the test set (in the sorted order of the file_ids). The first column in every row is the file_id (the name of the test file without its .mp3 extension), and the remaining sixteen columns contain the predicted probability that the file belongs to each of the sixteen genres.
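A minimal sketch of writing such a file with pandas follows; the file_ids and the uniform probabilities are placeholders, and the column order matches the required header.

```python
import pandas as pd

# Genre columns in the order of the required header.
genres = ["Blues", "Classical", "Country", "Easy Listening", "Electronic", "Experimental",
          "Folk", "Hip-Hop", "Instrumental", "International", "Jazz", "Old-Time / Historic",
          "Pop", "Rock", "Soul-RnB", "Spoken"]

# Placeholder predictions: a dict mapping each file_id to its 16 genre probabilities.
probabilities = {
    "000002": [1.0 / 16] * 16,
    "000005": [1.0 / 16] * 16,
}

submission = pd.DataFrame.from_dict(probabilities, orient="index", columns=genres)
submission.index.name = "file_id"
submission.sort_index().to_csv("submission.csv")
```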
Resources
Please refer to the dataset page for more information about the training and test data, as well as download links.
The starter kit includes code to handle the data and make a submission. Moreover, it features some examples and baselines.
You are encouraged to check out the FMA dataset GitHub repository for Jupyter notebooks showing how to use the data, explore it, and train baseline models. This challenge uses the rc1 version of the data; make sure to check out that version of the code. The associated paper describes the data.
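As a pointer, here is a minimal sketch of loading the track metadata with pandas, assuming the fma_metadata archive has been extracted into fma_metadata/; the notebooks and the utils.py module in the FMA repository handle this (and the audio) more thoroughly.

```python
import pandas as pd

# tracks.csv ships with a two-level column header (category, field).
tracks = pd.read_csv("fma_metadata/tracks.csv", index_col=0, header=[0, 1])

# The target for this challenge is the top-level genre of each track.
genre_top = tracks[("track", "genre_top")]
print(genre_top.value_counts())
```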
Additional resources:
- A list of scientific articles about deep learning applied to music
- A Tutorial on Deep Learning for Music Information Retrieval
- A list of python software/tools for scientific research in audio/music
- Deep learning tutorial for genre recognition with Keras
- Music Genre Classification with Deep Learning in TensorFlow
Public contact channels:
We strongly encourage you to use the public channels listed above for communication between the participants and the organizers. In extreme cases, if there is a query or comment that you would like to make through a private communication channel, you can send us an email at:
Prizes
The winner will be invited to present their solution to the 3rd Applied Machine Learning Days at EPFL in Switzerland in January 2019, with travel and accommodation covered (up to $2000).
Moreover, all participants are invited to submit a paper to the Web Conference (WWW2018) challenges track. The paper should describe the proposed solution and a self-assessment of its performance. Papers must be submitted in PDF on EasyChair for peer review, using the ACM template with the “sigconf” format (as for the main conference). Submissions should not exceed five pages including any diagrams or appendices, plus unlimited pages of references. As the challenge is run publicly, reviews are not double-blind and papers should not be anonymized. Accepted papers will be published in the official satellite proceedings of the conference. As the challenge will continue after the submission deadline, authors of accepted papers will have the opportunity to submit a camera-ready version incorporating their latest tweaks. The event at the conference will be run like a workshop, where participants present their solutions and we announce the winners.
Timeline
Below is the timeline of the challenge:
- 2017-12-07 Challenge start.
- 2018-02-09 Paper submission deadline.
- 2018-02-14 Paper acceptance notification.
- 2018-03-01 End of the first round. No new participants can enroll.
- 2018-04-08 Participants have to submit a docker container for the second round.
- 2018-04-27 Announcement of winners and presentation of accepted papers at the conference.
Datasets License
Participants
Leaderboard
| Rank | Participant | Score |
|------|-------------|-------|
| 01 |  | 1.310 |
| 02 |  | 1.340 |
| 03 |  | 1.440 |
| 04 |  | 1.500 |
| 05 |  | 1.520 |