AIcrowd | ImageCLEF 2022 Caption - Caption Prediction

Official Round: Completed

ImageCLEF

Note: ImageCLEF Caption 2022 is divided into 2 subtasks (challenges). This is the Caption Prediction challenge. For information on the Concept Detection challenge click here. Both challenges' datasets are shared together, so registering for one of these challenges will automatically give you access to the other one.

Note: Do not forget to read the Rules section on this page. Pressing the red Participate button leads you to a page where you have to agree with those rules. You will not be able to submit any results before agreeing with the rules.

Note: Before trying to submit results, read the Submission instructions section on this page.

Challenge description

Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines.

Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. The more image characteristics are known, the more structured the radiology scans are and, hence, the more efficiently radiologists can interpret them. We work on the basis of a large-scale collection of figures from open access biomedical journal articles (PubMed Central), as well as radiology images from original medical cases. All images in the training data are accompanied by UMLS concepts extracted from the original image caption.

Lessons learned:

In the first and second editions of this task, held at ImageCLEF 2017 and ImageCLEF 2018, participants noted a broad variety of content and situations among the training images. In 2019, the training data was restricted to radiology images, and ImageCLEF 2020 added imaging modality information for pre-processing purposes and multi-modal approaches.
The focus in ImageCLEF 2021 lay in using real radiology images annotated by medical doctors. This step aimed to increase the medical context relevance of the UMLS concepts.
For ImageCLEF 2022, an extended version of the ImageCLEF 2020 dataset is used.
To reduce the scope and size of concepts, several concept extraction tools are analyzed prior to caption pre-processing methods.
Concepts with few occurrences will be removed.
As uncertainty regarding additional sources was noted, we will clearly separate systems using exclusively the official training data from those that incorporate additional sources of evidence.

On the basis of the concept vocabulary detected in the first subtask as well as the visual information of their interaction in the image, participating systems are tasked with composing coherent captions for the entirety of an image. In this step, rather than the mere coverage of visual concepts, detecting the interplay of visible elements is crucial for strong performance.

Evaluation of this second step is based on metrics such as BLEU that have been designed to be robust to variability in style and wording. This year we will evaluate other potential metrics like METEOR, ROUGE, and CIDEr.

Data

As soon as the data are released they will be available under the "Resources" tab.

A subset of the extended Radiology Objects in COntext (ROCO) dataset, for this edition without imaging modality information, is used for both subtasks. As in previous editions, the dataset originates from biomedical articles of the PMC OpenAccess subset.

Training Set: Consists of 83,275 radiology images
Validation Set: Consists of 7,645 radiology images
Test Set: Consists of 7,601 radiology images

For this task each caption is pre-processed in the following way:

Numbers and words containing numbers were removed.
All punctuation was removed.
Lemmatization was applied using spaCy.
Captions were converted to lower-case.
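
The four steps above can be sketched in Python as follows. This is only an illustration and not the organizers' script: the function name preprocess_caption and the lemmatize hook are hypothetical, punctuation is assumed to be deleted rather than replaced by spaces, and the spaCy lemmatizer is left as an optional callable so that the sketch runs without a language model.

```python
import string


def preprocess_caption(caption, lemmatize=None):
    """Approximate the caption pre-processing described above.

    lemmatize is an optional callable mapping one token to its lemma
    (an assumption standing in for spaCy); None skips lemmatization.
    """
    # 1. Remove numbers and words containing numbers.
    tokens = [t for t in caption.split() if not any(ch.isdigit() for ch in t)]
    # 2. Remove all punctuation (assumption: characters are simply deleted).
    text = " ".join(tokens).translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    # 3. Lemmatize each token if a lemmatizer was supplied.
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    # 4. Convert to lower-case.
    return " ".join(tokens).lower()


example = preprocess_caption("CT scan of the Chest, 3 days after T2-weighted MRI.")
print(example)
```

With a real spaCy pipeline, the lemmatize argument could wrap a call that returns the lemma of a single token; the exact spaCy model used by the organizers is not stated here.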

Evaluation methodology

This year, in addition to the BLEU scores, ROUGE is used as a secondary metric. Other metrics like METEOR and CIDEr will be reported after the challenge concludes. For evaluation, each caption will also be pre-processed (similar to the pre-processing steps mentioned above). The BLEU scores are calculated using the following methodology and parameters:

The default implementation of the Python NLTK (v3.2.2) (Natural Language ToolKit) BLEU scoring method is used. It is documented here and based on the original article describing the BLEU evaluation method
A Python (3.6) script loads the candidate run file, as well as the ground truth (GT) file, and processes each candidate-GT caption pair
Each caption is pre-processed in the following way:
- The caption is converted to lower-case
- All punctuation is removed and the caption is tokenized into its individual words
- Stopwords are removed using NLTK's "english" stopword list
- Lemmatization is applied using spaCy's Lemmatizer
The BLEU score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences. No smoothing function is used.
All BLEU scores are summed and averaged over the number of captions, giving the final score.
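
As a rough illustration of the procedure above, the following Python sketch pre-processes each candidate-GT pair, computes a sentence-level BLEU score without smoothing, and averages the scores. It is a simplified pure-Python reimplementation, not the official evaluate-bleu.py script or the NLTK code: the names tokenize_for_eval, sentence_bleu and mean_bleu are hypothetical, STOPWORDS is a tiny stand-in for NLTK's "english" list, lemmatization is omitted, and returning 0 when any n-gram precision is zero is an assumption that may differ from how NLTK v3.2.2 handles that case.

```python
import math
import string
from collections import Counter

# Assumption: a tiny stand-in for NLTK's "english" stopword list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "with"}


def tokenize_for_eval(caption):
    """Lower-case, delete punctuation, split into words, drop stopwords."""
    text = caption.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOPWORDS]


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference, candidate, max_n=4):
    """BLEU with uniform weights up to max_n-grams and no smoothing."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(cand_counts.values())
        # Clipped counts: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # assumption: no smoothing, so the score collapses to 0
        log_sum += math.log(clipped / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_sum / max_n)


def mean_bleu(pairs):
    """Average BLEU over (ground_truth_caption, candidate_caption) pairs."""
    scores = [
        sentence_bleu(tokenize_for_eval(gt), tokenize_for_eval(cand))
        for gt, cand in pairs
    ]
    return sum(scores) / len(scores) if scores else 0.0


pairs = [
    ("Chest X-ray showing left pleural effusion.",
     "chest x-ray showing left pleural effusion"),
    ("Axial CT of the abdomen", "sagittal MRI brain"),
]
print(mean_bleu(pairs))
```

Because no smoothing is applied, short captions that share no 4-gram with the ground truth contribute a score of 0 in this sketch, which is one reason the averaged BLEU values in this task tend to be low.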

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.6.x, on a system where the NLTK (v3.2.2) Python library is installed. The script should be run like this:

/path/to/python3.6 evaluate-bleu.py /path/to/candidate/file /path/to/ground-truth/file

The ROUGE scores are calculated using the following methodology and parameters:

The native Python implementation of the ROUGE scoring method is used. It is designed to replicate results from the original Perl package that was introduced in the original article describing the ROUGE evaluation method.
Specifically, we calculate the ROUGE-1 (F-measure) score, which measures the number of matching unigrams between the model-generated text and a reference.
A Python (3.7) script loads the candidate run file, as well as the ground truth (GT) file, and processes each candidate-GT caption pair
Each caption is pre-processed in the following way:
- The caption is converted to lower-case
- Stopwords are removed using NLTK's "english" stopword list
- Lemmatization is applied using spaCy's Lemmatizer
The ROUGE score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences.
All ROUGE scores are summed and averaged over the number of captions, giving the final score.
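
The ROUGE-1 F-measure described above can be illustrated with a short Python sketch. It is a simplified stand-in, not the native ROUGE implementation used by the organizers: the function names rouge1_f and mean_rouge1 are hypothetical, and the inputs are assumed to be token lists that have already been pre-processed as listed above.

```python
from collections import Counter


def rouge1_f(reference, candidate):
    """ROUGE-1 F-measure between two lists of tokens."""
    if not reference or not candidate:
        return 0.0
    # Multiset intersection counts each matching unigram at most as often
    # as it appears in both the reference and the candidate.
    overlap = sum((Counter(reference) & Counter(candidate)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(pairs):
    """Average ROUGE-1 F over (reference_tokens, candidate_tokens) pairs."""
    scores = [rouge1_f(ref, cand) for ref, cand in pairs]
    return sum(scores) / len(scores) if scores else 0.0


reference = ["chest", "xray", "show", "left", "pneumonia"]
candidate = ["chest", "xray", "show", "effusion"]
# 3 matching unigrams: precision 3/4, recall 3/5, F-measure 2/3
print(rouge1_f(reference, candidate))
```

In this example three unigrams match, so precision is 3/4 and recall is 3/5, which gives an F-measure of 2/3.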

Submission instructions

As soon as submissions are open, you will find a “Create Submission” button on this page (next to the tabs).

Before being allowed to submit your results, you have to first press the red participate button, which leads you to a page where you have to accept the challenge's rules.

Rules

Note: In order to participate in this challenge you have to sign an End User Agreement (EUA). You will find more information on the 'Resources' tab.

The ImageCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2022. CLEF 2022 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of mono and cross-language information retrieval systems. More details about the conference can be found here.

Submitting a working note with the full description of the methods used in each run is mandatory. Any run that cannot be reproduced from its description in the working notes may be removed from the official publication of the results. Working notes are published within CEUR-WS proceedings, resulting in an assignment of an individual DOI (URN) and an indexing by many bibliography systems including DBLP. According to the CEUR-WS policies, a light review of the working notes will be conducted by the ImageCLEF organizing committee to ensure quality. As an illustration, ImageCLEF 2021 working notes (task overviews and participant working notes) can be found within CLEF 2021 CEUR-WS proceedings.

Important

Participants of this challenge will automatically be registered at CLEF 2022. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:

First name
Last name
Affiliation
Address
City
Country
Regarding the username, please choose a name that represents your team.

This information will not be publicly visible and will be exclusively used to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.

Participating as an individual (non affiliated) researcher

We welcome individual researchers, i.e. those not affiliated with any institution, to participate. We kindly ask you to provide us with a motivation letter containing the following information:

the presentation of your most relevant research activities related to the task/tasks
your motivation for participating in the task/tasks and how you want to exploit the results
a list of your 5 most relevant publications (if applicable)
the link to your personal webpage

The motivation letter should be directly concatenated to the End User Agreement document or sent as a PDF file to bionescu at imag dot pub dot ro. The request will be analyzed by the ImageCLEF organizing committee. We reserve the right to refuse any applicant whose experience in the field is too narrow and would therefore most likely prevent them from finishing the task/tasks.

Citations

Information will be posted on https://www.imageclef.org/2022/medical/caption after the challenge ends.

Prizes

Publication

ImageCLEF 2022 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions among the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS) together with the annual lab overviews.

Resources

Contact us

Discussion Forum

You can ask questions related to this challenge on the Discussion Forum. Before asking a new question please make sure that question has not been asked before.
Click on Discussion tab above or direct link: https://www.aicrowd.com/challenges/imageclef-2022-caption-caption-prediction/discussion

Alternative channels

We strongly encourage you to use the public channels mentioned above for communications between the participants and the organizers. In extreme cases, if there are any queries or comments that you would like to make using a private communication channel, you can send us an email at:

johannes [dot] rueckert [at] fh-dortmund [dot] de
abenabacha [at] microsoft [dot] com
alba [dot] garcia [at] essex [dot] ac [dot] uk

More information

You can find additional information on the challenge here: https://www.imageclef.org/2022/medical/caption