Official round: Completed

LifeCLEF 2020 Plant

Prize Money: USD 5K as part of Microsoft's AI for Earth program

1 Authorship/Co-Authorship

LifeCLEF


Note: Do not forget to read the Rules section on this page. Pressing the red Participate button leads you to a page where you have to agree with those rules. You will not be able to submit any results before agreeing with the rules.

Note: Before trying to submit results, read the Submission instructions section on this page.

News

14/04: Due to the Covid-19 pandemic, the schedule of LifeCLEF will be shifted by about one month. The new deadline for the submission of runs by participants is 5 June 2020.

15/04: The test dataset is available in the Resources tab (once you have joined the challenge).

Challenge description

The goal of the challenge is to identify plants in field pictures based on a training set of digitized herbarium specimens. Concretely, this is a cross-domain classification task with a training set composed of digitized herbarium sheets and a test set composed of field pictures. To enable learning a mapping between the herbarium sheet domain and the field picture domain, we will provide both herbarium sheets and field pictures for a subset of species.

Motivation

Despite recent progress in automated plant identification, the vast majority of the 300K+ plant species on earth still cannot be recognized easily because training data for those species is lacking. On the other hand, for several centuries, botanists have collected, catalogued and systematically stored plant specimens in herbaria. These physical specimens are used to study the variability of species, their phylogenetic relationships, their evolution, or phenological trends. Millions of such specimens are now digitized and publicly available. Using them to train deep learning models is thus a very promising approach to help identify data-deficient species. However, their visual appearance is very different from that of field pictures, which makes this a challenging cross-domain classification task.

The following papers may provide inspiration:

Adapting Visual Category Models to New Domains

VisDA: The Visual Domain Adaptation Challenge

CyCADA: Cycle-Consistent Adversarial Domain Adaptation

Few-Shot Adversarial Domain Adaptation

d-SNE: Domain Adaptation using Stochastic Neighborhood Embedding

Data

The challenge will rely on a large collection of more than 300K herbarium sheets coming from two sources: the "Herbier IRD de Guyane" (CAY), digitized in the context of the e-ReColNat project, and iDigBio, a large international platform hosting millions of images of herbarium specimens. A valuable asset of this collection is that a few hundred herbarium sheets are accompanied by a few pictures of the same specimen in the field. The test set is composed of about 3K in-the-field pictures collected by two botanists specializing in the Amazonian flora.

A link to the training dataset is available under the “Resources” tab.

Submission instructions

As soon as the submission is open, you will find a “Create Submission” button on this page (next to the tabs).

Before being allowed to submit your results, you first have to press the red Participate button, which leads you to a page where you have to accept the challenge's rules.

More practically, the run file to be submitted is a CSV file (with semicolon separators) and has to contain as many lines as the number of predictions, each prediction being composed of an ObservationId (the identifier of a specimen, which can itself be composed of several images), a ClassId, a Probability and a Rank (used in case of equal probabilities). Each line should have the following format: <ObservationId;ClassId;Probability;Rank>

Here is a short fake run example respecting this format for only 3 observations: fake_run
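As an illustration of the format described above, here is a minimal Python sketch that writes such a run file. The function name `write_run`, the shape of the `predictions` dictionary and all identifiers in the example are hypothetical. The sketch also assumes that the file has no header line and that ranks start at 1, neither of which is stated on this page.

```python
import csv
import io


def write_run(predictions, out):
    """Write predictions as ObservationId;ClassId;Probability;Rank lines.

    predictions: dict mapping an observation id to a list of
    (class_id, probability) pairs (hypothetical input shape).
    out: any writable text stream, e.g. an open file.
    """
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for obs_id, scored in predictions.items():
        # Sort by decreasing probability. sorted() is stable, so classes
        # with equal probabilities keep their input order, and the Rank
        # column then makes that order explicit.
        ranked = sorted(scored, key=lambda pair: -pair[1])
        for rank, (class_id, prob) in enumerate(ranked, start=1):
            writer.writerow([obs_id, class_id, prob, rank])


if __name__ == "__main__":
    # Made-up observation and class ids, for illustration only.
    example = {101: [(7, 0.2), (3, 0.7)], 102: [(5, 0.5), (9, 0.5)]}
    buffer = io.StringIO()
    write_run(example, buffer)
    print(buffer.getvalue(), end="")
```

To produce an actual file, the same function can be called with a stream opened via `open("run.csv", "w", newline="")`, where the file name is again only an example.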

Participants will be allowed to submit a maximum of 10 run files.

Evaluation criteria:

The primary metric used for the evaluation of the task will be the Mean Reciprocal Rank (MRR). The MRR is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks for the whole test set:

MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}

where |Q| is the total number of query occurrences in the test set and rank_i is the rank of the first correct answer for the i-th query.

A second metric will again be the MRR, but computed on a subset of observations related to the species with the fewest "in the field" photographs, based on the most comprehensive estimates possible from different data sources (iDigBio, GBIF, Encyclopedia of Life, Bing and Google Image search engines, previous datasets related to PlantCLEF and ExpertCLEF challenges).
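The two metrics described above can be sketched in Python as follows. This is not the official evaluation script: the function name `mean_reciprocal_rank` and its input shapes are hypothetical, and the sketch assumes that an observation whose correct class does not appear in the submitted list contributes a reciprocal rank of 0, a common convention that this page does not confirm.

```python
def mean_reciprocal_rank(ranked_predictions, ground_truth, subset=None):
    """Average of 1/rank of the correct class over the evaluated observations.

    ranked_predictions: dict mapping an observation id to a list of class
    ids ordered from most to least probable (hypothetical input shape).
    ground_truth: dict mapping an observation id to its correct class id.
    subset: optional set of observation ids to restrict the evaluation to,
    e.g. observations of species with few field photographs.
    """
    obs_ids = [o for o in ground_truth if subset is None or o in subset]
    if not obs_ids:
        raise ValueError("no observations to evaluate")

    total = 0.0
    for obs_id in obs_ids:
        ranked = ranked_predictions.get(obs_id, [])
        true_class = ground_truth[obs_id]
        if true_class in ranked:
            # list.index is 0-based, ranks are 1-based.
            total += 1.0 / (ranked.index(true_class) + 1)
        # Assumption: a missing correct class adds 0 to the sum.
    return total / len(obs_ids)


if __name__ == "__main__":
    # Made-up ids, for illustration only.
    ranked = {1: [3, 7], 2: [5, 9, 4], 3: [8]}
    truth = {1: 3, 2: 4, 3: 2}
    print(mean_reciprocal_rank(ranked, truth))
    print(mean_reciprocal_rank(ranked, truth, subset={2, 3}))
```

Passing the full ground truth gives the primary metric, while passing a `subset` of observation ids mirrors the second metric restricted to the less populated species.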

External training data:

As a general comment, we can assume that classical ConvNet-based approaches using complementary training sets containing photos in the field such as ExpertCLEF2019 or GBIF, in addition to the PlantCLEF2020 training set, will perform well on the primary metric. However, we can assume that cross-domain approaches will get better results on the second metric where there is a lack of in-the-field training photos.

Given the dominance of deep learning and transfer learning techniques, it is conceptually difficult to prohibit the use of external training data, notably the training data used during last year's ExpertCLEF2019 challenge, or other pictures that can be found through GBIF, for example (please have a look at the pre-trained models and datasets generously shared by the CMP team http://ptak.felk.cvut.cz/personal/sulcmila/models/LifeCLEF2019/ - please cite the bibtex reference at the end of the related link if you plan to use it in your experiments).

However, despite all these comments about the use of external training data, we ask participants to provide at least one submission that uses only the training data provided this year.

Rules

The LifeCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2020. CLEF 2020 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of mono- and cross-language information retrieval systems. More details about the conference can be found here.

Submitting a working note with the full description of the methods used in each run is mandatory. Any run that cannot be reproduced from its description in the working notes might be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in the assignment of an individual DOI (URN) and indexing by many bibliography systems including DBLP. According to the CEUR-WS policies, a light review of the working notes will be conducted by the LifeCLEF organizing committee to ensure quality. As an illustration, LifeCLEF 2019 working notes (task overviews and participant working notes) can be found within CLEF 2019 CEUR-WS proceedings.

Important

Participants of this challenge will automatically be registered at CLEF 2020. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:

First name

Last name

Affiliation

Address

City

Country

Regarding the username, please choose a name that represents your team.

This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.

Citations

Information will be posted after the challenge ends.

Prizes

Cloud credit

The winner of each challenge will be offered a cloud credit grant of 5k USD as part of Microsoft’s AI for Earth program.

Publication

LifeCLEF 2020 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions from the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS), together with the annual lab overviews.

Resources

Contact us

Discussion Forum

You can ask questions related to this challenge on the Discussion Forum. Before asking a new question, please make sure that it has not been asked before.
Click on the Discussion tab above or use the direct link: https://discourse.aicrowd.com/c/lifeclef-2020-plant

Alternative channels

We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, then you can send us an email at :