Official Round: Completed

ImageCLEF 2018 Caption - Caption prediction

Composing coherent captions for the entirety of an image


Important note:

The ImageCLEF Caption - Caption Prediction challenge has officially ended and we would like to thank everybody for their participation. You can find the official results at http://imageclef.org/2018/caption.

Post-challenge submissions and the leaderboard will remain enabled for a few weeks, so you can still submit result files and have them evaluated continuously during this limited period. To see the version of the leaderboard that includes post-challenge submissions, turn on the “Show post-challenge submission” switch directly below the leaderboard.

We would also like to encourage you to submit a CLEF Working Notes paper by the end of May.

Please also note that participants who register from now on will no longer be registered with CLEF automatically.

Note: ImageCLEF Caption 2018 is divided into two subtasks (challenges). This challenge is about Caption Prediction; for information on the Concept Detection challenge, click here. Both challenges share the same dataset, so registering for one of them automatically gives you access to the other.

Note: Do not forget to read the Rules section on this page.


Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines.

Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. In this task, we cast the problem of image understanding as a cross-modality matching scenario in which visual content and textual descriptors need to be aligned and concise textual interpretations of medical images are generated. We work on the basis of a large-scale collection of figures from open access biomedical journal articles (PubMed Central). Each image is accompanied by its original caption, constituting a natural testbed for this image captioning task.

Lessons learned: In the first edition of this task, held at CLEF 2017, participants noted a broad topical variability among training images. This year, we will further group the training data into image types (e.g., radiology vs. biopsy), and participants will be able to build either cross-category models or category-specific ones. An additional source of uncertainty was noted in the use of external material. In this second edition of the task, we will clearly separate systems using exclusively the official training data from those that incorporate additional sources of evidence.

Challenge description

On the basis of the concept vocabulary detected in the first subtask as well as the visual information of their interaction in the image, participating systems are tasked with composing coherent captions for the entirety of an image. In this step, rather than the mere coverage of visual concepts, detecting the interplay of visible elements is crucial for strong performance. Evaluation of this second step is based on metrics such as BLEU that have been designed to be robust to variability in style and wording.
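For reference, the standard sentence-level BLEU score can be written as follows. This sketch assumes NLTK's default setting of uniform weights over 1- to 4-grams ($N = 4$, $w_n = 1/4$) and a single reference caption; the symbols ($\hat{y}$ for the candidate caption, $y$ for the reference, $G_n(\hat{y})$ for the distinct n-grams of the candidate, $c$ and $r$ for the candidate and reference lengths in tokens) are introduced here for illustration only.

```latex
p_n = \frac{\sum_{g \in G_n(\hat{y})} \min\big(\mathrm{count}_{\hat{y}}(g),\ \mathrm{count}_{y}(g)\big)}
           {\sum_{g \in G_n(\hat{y})} \mathrm{count}_{\hat{y}}(g)}

\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big( \sum_{n=1}^{N} w_n \log p_n \Big)
```

The clipping in $p_n$ prevents a caption from being rewarded for repeating a matching word, and the brevity penalty $\mathrm{BP}$ discourages very short captions. Because the score is a geometric mean of the $p_n$, a caption that shares no n-gram of some order with the reference drives the unsmoothed score to zero.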


The collection comprises a total of 4 million image-caption pairs, all of which could potentially be used for training, with a small subset held out for testing. Focusing on useful radiology/clinical images and non-compound figures is likely beneficial for this task and reduces the number of image-caption pairs to around 400,000, which is still significantly larger than in 2017.

Submission instructions

As soon as submission opens, you will find a “Create Submission” button on this page (next to the tabs).

For the submission we expect the following format:

[Figure-ID] [TAB] [description]


1743-422X-4-12-1-4   description of the first image in one single line
1743-422X-4-12-1-3   description of the second image....
1743-422X-4-12-1-2   description of the third image...

You need to respect the following constraints:

  • The separator between the figure ID and the description has to be a tab character
  • Each figure ID of the test set must be included in the run file exactly once
  • You should not include special characters in the description
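The format constraints above can be checked locally before uploading a run. The sketch below is a hypothetical helper, not part of the official submission system: `format_run_line` and `validate_run_lines` are illustrative names, and reading “special characters” as anything outside printable ASCII is an assumption, since the exact definition is not given here.

```python
import re
from typing import Iterable, List

# Assumption: "special characters" is read here as anything outside printable
# ASCII (space through tilde); the organizers' exact definition may differ.
PRINTABLE_ASCII = re.compile(r"[\x20-\x7E]*")


def format_run_line(figure_id: str, description: str) -> str:
    """Build one run-file line: figure ID, a TAB, then a one-line description."""
    # Collapse newlines, tabs and repeated spaces so the description stays on one line.
    one_line = " ".join(description.split())
    return f"{figure_id}\t{one_line}"


def validate_run_lines(lines: Iterable[str], test_ids: Iterable[str]) -> List[str]:
    """Check run-file lines against the format constraints; return any problems found."""
    problems: List[str] = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            problems.append(f"line {lineno}: expected exactly one TAB separator")
            continue
        figure_id, description = parts
        if figure_id in seen:
            problems.append(f"line {lineno}: duplicate figure ID {figure_id}")
        seen.add(figure_id)
        if not PRINTABLE_ASCII.fullmatch(description):
            problems.append(f"line {lineno}: description contains special characters")
    # Every test-set figure ID must appear exactly once, and no others.
    expected = set(test_ids)
    problems += [f"missing figure ID {fid}" for fid in sorted(expected - seen)]
    problems += [f"unknown figure ID {fid}" for fid in sorted(seen - expected)]
    return problems
```

Because iterating over an open text file yields its lines, a run file can be checked by passing the file object from `open(path, encoding="utf-8")` as `lines`; an empty result means none of these checks failed.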



Evaluation criteria

Evaluation is based on BLEU scores, using the following methodology and parameters:

  • The default implementation of the BLEU scoring method in the Python NLTK (Natural Language Toolkit, v3.2.2) is used. It is documented here and is based on the original article describing the BLEU evaluation method

  • A Python (3.6) script loads the candidate run file, as well as the ground truth (GT) file, and processes each candidate-GT caption pair

  • Each caption is pre-processed in the following way:

    • The caption is converted to lower-case

    • All punctuation is removed and the caption is tokenized into its individual words

    • Stopwords are removed using NLTK’s “english” stopword list

    • Stemming is applied using NLTK’s Snowball stemmer

  • The BLEU score is then calculated. Note that the caption is always considered as a single sentence, even if it actually contains several sentences. No smoothing function is used.

  • All BLEU scores are summed and averaged over the number of captions (10’000), giving the final score.
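The preprocessing and scoring steps above can be sketched in plain Python. This is a simplified, dependency-free re-implementation written for illustration, not the official evaluate-bleu.py script. Assumptions: `DEMO_STOPWORDS` is a tiny hypothetical stand-in for NLTK's “english” list, stemming defaults to a no-op instead of the Snowball stemmer, tokenization is a whitespace split after removing `string.punctuation`, and edge cases (for example captions shorter than four tokens, or missing figure IDs) may be handled differently from NLTK v3.2.2 and the official tool.

```python
import math
import string
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical stand-in for NLTK's "english" stopword list (illustration only).
DEMO_STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "with", "is", "are"}


def preprocess(caption: str,
               stopwords: Iterable[str] = DEMO_STOPWORDS,
               stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Lower-case, remove punctuation, tokenize, drop stopwords, then stem."""
    text = caption.lower().translate(str.maketrans("", "", string.punctuation))
    stop = set(stopwords)
    return [stem(tok) for tok in text.split() if tok not in stop]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference: Sequence[str], candidate: Sequence[str],
                  max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference.

    Uses uniform weights 1/max_n, clipped n-gram precision and the standard
    brevity penalty; the whole caption is treated as a single sentence.
    """
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        ref = ngram_counts(reference, n)
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        if clipped == 0:
            # Without smoothing, one zero precision makes the geometric mean zero.
            return 0.0
        log_sum += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum)


def average_bleu(candidates: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Mean per-caption BLEU over all ground-truth figure IDs.

    Simplification: a figure ID missing from the candidates scores as empty.
    """
    scores = [sentence_bleu(preprocess(gt), preprocess(candidates.get(fid, "")))
              for fid, gt in ground_truth.items()]
    return sum(scores) / len(scores)
```

To get closer to the official pipeline, `stopwords.words("english")` from `nltk.corpus` and `SnowballStemmer("english").stem` from `nltk.stem.snowball` could be passed into `preprocess`; the official evaluation tool remains the reference for any reported score.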

NOTE: The source code of the evaluation tool is available here. It must be executed using Python 3.6.x on a system where the NLTK (v3.2.2) Python library is installed. The script should be run like this:

/path/to/python3.6 evaluate-bleu.py /path/to/candidate/file /path/to/ground-truth/file

The leaderboard will be visible from 01.05.2018 (the official deadline) onward. The submission system will remain open for a few more days, but results submitted after the deadline will not be part of the official results.


Contact us

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. If, in exceptional cases, you have queries or comments that you would like to raise through a private channel, you can send us an email at:

  • Sharada Prasanna Mohanty: sharada.mohanty@epfl.ch
  • Alba Garcia Seco de Herrera: alba[DOT]garcia[AT]essex[DOT]ac[DOT]uk
  • Henning Müller: henning[DOT]mueller[AT]hevs[DOT]ch
  • Vincent Andrearczyk: vincent[DOT]andrearczyk[AT]hevs[DOT]ch
  • Ivan Eggel: ivan[DOT]eggel[AT]hevs[DOT]ch

More information

You can find additional information on the challenge here: http://imageclef.org/2018/caption


ImageCLEF 2018 is an evaluation campaign organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions from among the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS), together with the annual lab overviews.

Datasets License



Rank  Participant    BLEU score
01    Unknown User   0.250
02    Unknown User   0.234
03    Unknown User   0.228
04    Unknown User   0.227
05    Unknown User   0.224