Official Round: Completed

LifeCLEF 2018 Bird - Soundscape


Bird sound recognition in soundscape recordings


Note: This challenge is one of the two subtasks of the LifeCLEF 2018 Bird identification challenge. For more information about the other subtask, see its dedicated challenge page. Both subtasks share the same training dataset.

Challenge description

The goal of the task is to localize and identify all audible birds within the provided soundscape recordings. Each soundscape is divided into 5-second segments, and a list of species with associated probability scores must be returned for each segment. Each prediction item (i.e. each line of the run file) must respect the following format: <MediaId;TC1-TC2;ClassId;probability> where probability is a real value in [0;1] that increases with the confidence in the prediction, and TC1-TC2 is a 5-second timecode interval in hh:mm:ss format (e.g. 00:00:00-00:00:05, then 00:00:05-00:00:10).

A short fake run respecting this format, covering 3 segments of 5 seconds from two MediaIds, is provided as an example: soundscape_fake_run
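As an illustration of the line format above, here is a minimal sketch that turns per-segment scores into run-file lines. It assumes the recording is split into consecutive 5-second segments starting at 00:00:00; the function names `timecode` and `run_lines`, and the MediaId and ClassId values in the usage example, are hypothetical and not part of any official tooling.

```python
def timecode(seconds):
    """Format a whole number of seconds as hh:mm:ss."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def run_lines(media_id, segment_scores, segment_seconds=5):
    """Build run-file lines of the form MediaId;TC1-TC2;ClassId;probability.

    segment_scores: one dict per consecutive segment, mapping a (hypothetical)
    class_id string to a probability in [0, 1].
    """
    lines = []
    for index, scores in enumerate(segment_scores):
        start = index * segment_seconds
        interval = f"{timecode(start)}-{timecode(start + segment_seconds)}"
        # List the most confident species first within each segment.
        for class_id, probability in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
            if not 0.0 <= probability <= 1.0:
                raise ValueError(f"probability out of [0, 1]: {probability}")
            lines.append(f"{media_id};{interval};{class_id};{probability}")
    return lines


if __name__ == "__main__":
    # Hypothetical MediaId and ClassIds, for illustration only.
    for line in run_lines("media_1", [{"class_a": 0.9}, {"class_b": 0.2, "class_a": 0.5}]):
        print(line)
```

The ordering of lines within a segment is cosmetic: since the metric ranks all predictions of a class by probability across the whole run, only the probability values matter.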

Each participating group may submit up to 4 runs built from different methods. Semi-supervised, interactive or crowdsourced approaches are allowed but will be compared separately from fully automatic methods. Any human assistance in processing the test queries must therefore be declared in the submitted runs.

Participants may use any of the provided metadata in addition to the audio content (.wav files sampled at 44.1 kHz, 48 kHz or 96 kHz). They may also use any external training data, on the condition that (i) the experiment is entirely reproducible, i.e. the external resource used is clearly referenced and accessible to any other research group in the world, (ii) participants submit at least one run without external training data so that the contribution of such resources can be studied, and (iii) the additional resource does not contain any of the test observations. In particular, it is strictly forbidden to crawl training data from www.xeno-canto.org.


The training set contains 36,496 monophone recordings from the Xeno-Canto network covering 1,500 species of Central and South America (the largest bioacoustic dataset in the literature). The classes are heavily imbalanced, ranging from a minimum of four recordings for Laniocera rufescens to a maximum of 160 recordings for Henicorhina leucophrys. Recordings are associated with various metadata such as the type of sound (call, song, alarm, flight, etc.), the date, the location, textual comments from the authors, multilingual common names and collaborative quality ratings. In addition, a validation set of soundscapes with time-coded labels will be provided as training data. It contains about 20 minutes of soundscapes (240 segments of 5 seconds) with a total of 385 bird species annotations.

The test set will contain about 6 hours of soundscapes split into 4,382 segments of 5 seconds, each to be processed as a separate query. Some of the recordings are stereophonic, which may allow source separation to be used to improve recognition.
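Since every soundscape is handled as a sequence of 5-second queries, a first processing step is usually to cut each .wav file into fixed-length chunks. The sketch below does this with Python's standard `wave` module; the function name `iter_segments` is hypothetical, and yielding a shorter trailing chunk when the recording length is not a multiple of 5 seconds is an assumption, since the exact handling of leftover audio is not specified here. For stereo files the returned bytes are interleaved samples, so any source separation would have to de-interleave them first.

```python
import io
import wave


def iter_segments(wav_file, segment_seconds=5):
    """Yield (start_second, raw_frames) for consecutive fixed-length segments.

    wav_file: a path or a binary file-like object containing a WAV file.
    Assumption: a trailing partial segment is still yielded (it is shorter).
    """
    with wave.open(wav_file, "rb") as wav:
        frames_per_segment = wav.getframerate() * segment_seconds
        start = 0
        while True:
            frames = wav.readframes(frames_per_segment)
            if not frames:  # end of file reached
                break
            yield start, frames
            start += segment_seconds


if __name__ == "__main__":
    # Build a 12-second silent mono WAV in memory (8 kHz, 16-bit) to try it out.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(b"\x00\x00" * 8000 * 12)
    buffer.seek(0)
    for start, frames in iter_segments(buffer):
        print(start, len(frames))
```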

Submission instructions

As soon as submissions open, a “Create Submission” button will appear on this page (next to the tabs).

Results (tables and figures)

(Official round during the LifeCLEF 2018 campaign)

Evaluation criteria

The evaluation metric is the classification mean Average Precision (c-mAP), where each class c of the ground truth is treated as a query. This means that for each class c, we extract from the run file all predictions with ClassId=c, rank them by decreasing probability and compute the average precision for that class. We then take the mean across all classes. More formally:

$$\text{c-mAP} = \frac{1}{C}\sum_{c=1}^{C}\operatorname{AveP}(c)$$

where C is the number of species in the ground truth and AveP(c) is the average precision for a given species c computed as:

$$\operatorname{AveP}(c) = \frac{\sum_{k=1}^{n} P(k)\times\operatorname{rel}(k)}{n_{\text{rel}}}$$

where k is the rank of an item in the list of predicted segments containing c, n is the total number of predicted segments containing c, P(k) is the precision at cut-off k in the list, rel(k) is an indicator function equal to 1 if the segment at rank k is relevant (i.e. labeled as containing c in the ground truth) and 0 otherwise, and n_rel is the total number of relevant segments for c.
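To make the metric concrete, here is a minimal sketch of c-mAP as defined above. It is not the official evaluation script. It assumes predictions are already parsed into (MediaId, timecode, ClassId, probability) tuples and that the ground truth is a hypothetical dict mapping each ClassId to the set of (MediaId, timecode) segments labeled with it. Ties between equal probabilities keep run-file order, and duplicate predictions for the same segment and class are assumed absent; both are assumptions, as the official tie-breaking rule is not stated here.

```python
from collections import defaultdict


def average_precision(ranked_segments, relevant):
    """AveP(c) = sum_k P(k) * rel(k) / n_rel for one class c.

    ranked_segments: segment keys predicted for c, sorted by decreasing probability.
    relevant: set of segment keys labeled as containing c in the ground truth.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, segment in enumerate(ranked_segments, start=1):
        if segment in relevant:  # rel(k) == 1
            hits += 1
            total += hits / k  # P(k): precision at cut-off k
    return total / len(relevant)  # divide by n_rel


def c_map(predictions, ground_truth):
    """Mean of AveP(c) over every class c present in the ground truth.

    predictions: iterable of (media_id, timecode, class_id, probability) tuples.
    ground_truth: dict mapping class_id -> set of (media_id, timecode) keys.
    """
    by_class = defaultdict(list)
    for media_id, timecode, class_id, probability in predictions:
        by_class[class_id].append((probability, (media_id, timecode)))
    if not ground_truth:
        return 0.0
    scores = []
    for class_id, relevant in ground_truth.items():
        # Rank this class's predictions by decreasing probability (stable on ties).
        ranked = sorted(by_class.get(class_id, []), key=lambda item: item[0], reverse=True)
        scores.append(average_precision([seg for _, seg in ranked], relevant))
    return sum(scores) / len(scores)
```

For example, if class A is predicted on three segments with probabilities 0.9, 0.8 and 0.7 and only the first and third are relevant, AveP(A) = (1/1 + 2/3) / 2 = 5/6. A class of the ground truth that never appears in the run contributes an AveP of 0, while predicted classes absent from the ground truth are ignored.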


Contact us

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. In exceptional cases, if you have queries or comments that require a private channel, you can email us at:

  • Sharada Prasanna Mohanty: sharada.mohanty@epfl.ch
  • Hervé Glotin: glotin[AT]univ-tln[DOT]fr
  • Hervé Goëau: herve[DOT]goeau[AT]cirad[DOT]fr
  • Alexis Joly: alexis[DOT]joly[AT]inria[DOT]fr
  • Ivan Eggel: ivan[DOT]eggel[AT]hevs[DOT]ch

More information

You can find additional information on the challenge here: http://imageclef.org/node/230

Baseline Repository

A baseline system and an accompanying tutorial can be found here: https://github.com/kahst/BirdCLEF-Baseline

We encourage all participants to build upon the provided code base and to share their results for future reference.

Datasets License