LifeCLEF 2021 Plant
Note: Do not forget to read the Rules section on this page. Pressing the red Participate button leads you to a page where you have to agree with those rules. You will not be able to submit any results before agreeing with the rules.
Note: Before trying to submit results, read the Submission instructions section on this page.
08/04: The test dataset is available in the Resources tab (once you joined the challenge)
The goal of the challenge is to identify plants in field pictures based on a training set of digitized herbarium specimens. Concretely, this will consist in a cross-domain classification task with a training set composed of digitized herbarium sheets and a test set composed of field pictures. To enable learning a mapping between the herbarium sheets domain and the field pictures domain, we will provide both herbarium sheets and field pictures for a subset of species.
Despite recent progress in automated plant identification, a vast majority of the 300K+ plant species on earth can still not be recognized easily because of the lack of training data for that species. On the other side, for several centuries, botanists have collected, catalogued and systematically stored plant specimens in herbaria. These physical specimens are used to study the variability of species, their phylogenetic relationship, their evolution, or phenological trends. Millions of such specimens are now digitized and publicly available. Using them for training deep learning models is thus a very promising approach to help identifying data deficient species. However, their visual appearance is very different from field pictures which makes it a challenging cross-domain classification task.
We give below some papers that can be inspiring including the working notes written by the participants of the last edition of PlantCLEF:
The challenge relies on a large collection of more than 300K herbarium sheets coming from two sources: the Herbier IRD de Guyane”, CAY) digitized in the context of the e-ReColNat project, and iDigBio, a large international platform hosting millions of images of herbarium specimens. A valuable asset of this collection is that a few hundreds of herbarium sheets are accompanied by a few pictures of the same specimen in the field.
The training dataset of the LifeCLEF2021 Plant Identification challenge is based on the same visual data used during the previous LifeCLEF 2020 Plant Identification challenge but also introduces new data related to 5 "traits" covering exhaustively all the 1000 species of the challenge. Traits are a very valuable information that can potentially help improve prediction models. Indeed, it can be assumed that species which share the same traits also share partially similar visual appearances. This information can then potentially be used to guide the training of a model through auxiliary loss functions for instance. The traits were collected through the Encyclopedia of Life API. The 5 most exhaustive traits ("plant growth form", "habitat", "plant lifeform", "trophic guild" and "woodiness") were verified and completed by experts of the Guyanese flora, so that each of the 1000 species have a value for each trait.
Finally, the test set is composed exclusively of in-the-field pictures (about 3K) collected by two botanists and experts of the Amazonian flora.
A link to the the training dataset is now available under the “Resources” tab.
As soon as the submission is open, you will find a “Create Submission” button on this page (next to the tabs).
Before being allowed to submit your results, you have to first press the red participate button, which leads you to a page where you have to accept the challenge's rules.
More practically, the run file to be submitted is a csv file (with semicolon separators) and has to contain as much lines as the number of predictions, each prediction being composed of an ObservationId (the identifier of a specimen that can be itself composed of several images), a ClassId, a Probability and a Rank (used in case of equal probabilities). Each line should have the following format: <ObservationId;ClassId;Probability;Rank>
Here is a short fake run example respecting this format for only 3 observations: fake_run
Participants will be allowed to submit a maximum of 10 run files.
The primary metrics used for the evaluation of the task will be the Mean Reciprocal Rank. The MRR is a statistic measure for evaluating any process that produces a list of possible responses to a sample of queries ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks for the whole test set:
where |Q| is the total number of query occurrences in the test set.
A second metric will be again the MRR but computed on a subset of observations related to the less populated species in terms of photographies "in the field" based on the most comprehensive estimates possible from different data sources (IdigBio, GBIF, Encyclopedia of Life, Bing and Google Image search engines, previous datasets related to PlantCLEF and ExpertCLEF challenges).
External training data:
As a general comment, we can assume that classical ConvNet-based approaches using complementary training sets containing photos in the field such as ExpertCLEF2019 or GBIF, in addition to the PlantCLEF2020 training set, will perform well on the primary metric. However, we can assume that cross-domain approaches will get better results on the second metric where there is a lack of in-the-field training photos.
Since the supremacy of deep learning and transfer learning techniques, it is conceptually difficult to prohibit the use of external training data, notably the training data used during last year's ExperCLEF2019 challenge, or other pictures that can be met through the GBIF for example (please have a look to the pre-trained models and datasets generously shared by the CMP team http://ptak.felk.cvut.cz/personal/sulcmila/models/LifeCLEF2019/ - please cite the bibtex reference at the end of the related link if you plan to use it in your experiments).
However, despite all these comments about the use of external training data, we ask participants to provide at least one submission that uses only the training data provided this year.
LifeCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2021. CLEF 2021 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried in various labs designed to test different aspects of mono and cross-language Information retrieval systems. More details about the conference can be found here.
Submitting a working note with the full description of the methods used in each run is mandatory. Any run that could not be reproduced thanks to its description in the working notes might be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in an assignment of an individual DOI (URN) and an indexing by many bibliography systems including DBLP. According to the CEUR-WS policies, a light review of the working notes will be conducted by LifeCLEF organizing committee to ensure quality. As an illustration, LifeCLEF 2020 working notes (task overviews and participant working notes) can be found within CLEF 2020 CEUR-WS proceedings.
Participants of this challenge will automatically be registered at CLEF 2021. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:
Regarding the username, please choose a name that represents your team.
This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs
Information will be posted after the challenge ends.
The winner of each of the challenge will be offered a cloud credit grant of 5k USD as part of Microsoft’s AI for earth program.
LifeCLEF 2021 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions among the participants, will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS) together with the annual lab overviews.
- You can ask questions related to this challenge on the Discussion Forum. Before asking a new question please make sure that question has not been asked before.
- Click on Discussion tab above or direct link: https://www.aicrowd.com/challenges/lifeclef-2021-plant/discussion
We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, then you can send us an email at :
You can find additional information on the challenge here: https://www.imageclef.org/PlantCLEF2021