Official Round: Completed

ImageCLEF 2022 Fusion - Result Diversification


Note: ImageCLEF Fusion includes 2 subtasks. This page is about the Result Diversification subtask. For information about the Media Interestingness subtask, click here. The datasets of both subtasks are distributed together, so registering for one of these challenges automatically gives you access to the other one.

Note: Do not forget to read the Rules section on this page. Pressing the red Participate button leads you to a page where you have to agree to those rules. You will not be able to submit any results before agreeing to the rules.

Note: Before trying to submit results, read the Submission instructions section on this page.

Challenge description

While deep neural network methods have proven their predictive power in many tasks, there are still several domains where a single deep learning network is not enough to attain high precision, e.g., the prediction of subjective concepts such as violence or memorability. Late fusion, also called ensembling or decision-level fusion, is one of the approaches that machine learning researchers employ to improve on the performance of single-system approaches. It uses a series of weak learners, called inducers, whose prediction outputs are combined in a final step by a fusion mechanism to create a new, improved super predictor. Such systems have a long history and have been shown to be particularly useful in scenarios where the performance of single-system approaches is not considered satisfactory [Ştefan2020, Constantin2021a, Constantin2022].

The ImageCLEFfusion 2022 task challenges participants to develop and benchmark late fusion schemes. Participants will receive a dataset of real inducers and are expected to provide a fusion mechanism that combines them into a super system outperforming the best individual inducer. The provided inducers were developed to solve two real tasks: (i) prediction of visual interestingness (for more information see [Constantin2021b]), and (ii) diversification of image search results (for more information see [Ionescu2020]).

This task allows participants to explore various aspects of late fusion mechanisms, such as the performance of different fusion methods, methods for selecting inducers from a larger set, and the exploitation of positive and negative correlations between inducers.


As soon as the data are released, they will be available under the "Resources" tab.

The devset is listed there as "2022-ImageCLEFfusion-ResultDiversification-devset".

ImageCLEFfusion-div. The data for this task are extracted from the Retrieving Diverse Social Images Task dataset [Ionescu2020]. We will provide outputs from 56 inducers for a total of 123 queries (also called topics), split into:

  • devset - the training data, composed of outputs from the 56 inducers for 60 topics
  • testset - the testing data, composed of outputs from the 56 inducers for 63 topics

This fusion task corresponds to a retrieval task: image search result diversification. Participants in this task must use the inducer outputs we provide to create better, stronger search results. Results will be judged for both relevance and diversity, using two metrics: F1@20 and ClusterRecall@20.
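To make the task more concrete, here is a minimal sketch of one simple late fusion baseline, CombSUM, applied to the inducer outputs for a single query. It assumes each inducer's output for that query has already been reduced to a hypothetical `{photo_id: sim}` dictionary. Scores are min-max normalized per inducer and summed per photo, and the 50 best photos are kept (ranks 0 to 49). CombSUM is a generic, relevance-oriented technique chosen here only for illustration. It is not the organizers' reference method, and it does nothing explicit to promote diversity.

```python
from collections import defaultdict


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one inducer's sim values to [0, 1]; a constant list maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def combsum(inducer_runs: list[dict[str, float]], depth: int = 50) -> list[tuple[str, float]]:
    """CombSUM fusion for ONE query.

    inducer_runs: one {photo_id: sim} dict per inducer (hypothetical layout).
    Returns at most `depth` (photo_id, fused_score) pairs, best first.
    """
    fused: dict[str, float] = defaultdict(float)
    for run in inducer_runs:
        if not run:
            continue  # an inducer with no output for this query adds nothing
        for pid, score in min_max_normalize(run).items():
            fused[pid] += score  # photos absent from a run get no contribution from it
    # Sort by descending fused score; ties are broken by photo_id for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:depth]


if __name__ == "__main__":
    # Toy outputs of two inducers for a single query (sim values are made up).
    runs = [
        {"3338743092": 0.94, "3661411441": 0.90, "7112511985": 0.20},
        {"3661411441": 0.80, "3338743092": 0.50, "711353192": 0.10},
    ]
    query_id = 91  # hypothetical test-set query
    for rank, (photo_id, score) in enumerate(combsum(runs)):
        # Six-field layout: query_id iter photo_id rank sim run_name
        print(f"{query_id} 0 {photo_id} {rank} {score:.4f} run_combsum")
```

Because the fused score is used directly as `sim`, the similarity values decrease (or stay equal on ties) as the rank increases, as the run format requires.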


We will provide the following folders for each data split:

  • inducers - folder containing outputs from the 56 inducers for all the queries in the devset or testset in .txt format
  • scripts - folder containing useful scripts for calculating the metrics associated with this task, along with instructions on how to use them
  • topics - contains the topics file associated with the queries
  • gt - folder containing ground truth values for both diversity (dGT) and relevance (rGT) - only for devset
  • performance - the performance of the provided inducers given the ground truth data and the performance metrics - only for devset


The inducers folder contains outputs from the 56 inducers for all the queries in the devset or testset in .txt format. Filenames are created using the data split name (devset or testset) followed by the ID of the inducer (a number from 1 to 56). Each entry in this file contains the following fields:

  • query_id - represents the unique id of the query, which can be associated with the topics file
  • iter - ignored value
  • photo_id - the unique id of the photo represented by the entry
  • rank - the photo rank in the refined list provided by the inducer (or by your method). Rank is expected to be an integer value ranging from 0 (the highest rank) up to 49.
  • sim - the similarity score of the photo to the query. Similarity values are higher for the photos ranked first and follow the refined ranking (e.g., the photo with rank 0 has the highest sim value, the photo with rank 1 the second highest, and so on). If an inducer (or your own fusion approach) does not explicitly provide similarity scores, dummy similarity scores that decrease as the rank increases are used instead (e.g., you may use the inverse rank values).
  • run_name - a general name for the inducer 

An example is presented below:

1 0 3338743092 0 0.94 run_inducer1
1 0 3661411441 1 0.9 run_inducer1
1 0 7112511985 48 0.2 run_inducer1
1 0 711353192 49 0.12 run_inducer1
2 0 233474104 0 0.84 run_inducer1
2 0 3621431440 1 0.7 run_inducer1

This is also the format in which your own fusion runs will be evaluated.
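Since every inducer file follows this six-field, space-separated layout, a small reader is enough to load one. The sketch below is our own: `RunEntry` and `parse_run` are hypothetical names and are not part of the provided scripts. It groups entries by query_id, sorts each group by rank, and keeps photo ids as strings because they are only used as identifiers.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunEntry(NamedTuple):
    photo_id: str  # kept as a string: it is an identifier, not a quantity
    rank: int
    sim: float


def parse_run(lines: Iterable[str]) -> dict[int, list[RunEntry]]:
    """Group the entries of one run file by query_id, each list sorted by rank."""
    by_query: dict[int, list[RunEntry]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        query_id, _iter, photo_id, rank, sim, _run_name = fields
        by_query[int(query_id)].append(RunEntry(photo_id, int(rank), float(sim)))
    for entries in by_query.values():
        entries.sort(key=lambda entry: entry.rank)
    return dict(by_query)


if __name__ == "__main__":
    sample = """\
1 0 3338743092 0 0.94 run_inducer1
1 0 3661411441 1 0.9 run_inducer1
2 0 233474104 0 0.84 run_inducer1
"""
    runs = parse_run(sample.splitlines())
    print(runs[1][0])
```

For a real inducer file, an open file handle can be passed directly, e.g. `with open(path) as f: runs = parse_run(f)`, where `path` is a placeholder for one of the files in the inducers folder.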


The scripts folder contains div_eval.jar, a Java tool used for computing the metrics. To run it, you need Java installed on your machine; to check, run "java -version" in a command window. To run the tool, use the following syntax (make sure div_eval.jar is in your current folder):

java -jar div_eval.jar -r <runfilepath> -rgt <rGT directory path> -dgt <dGT directory path> -t <topic file path> -o <output file directory> [optional: -f <output file name>]
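When evaluating many runs (for example, all 56 inducers or several fusion variants), it can help to drive div_eval.jar from a script. The sketch below only assembles the command line documented above and runs it with Python's `subprocess` module. The helper names and the example paths are hypothetical, `java` is assumed to be on the PATH, and the tool's own behaviour and output remain as described in this section.

```python
import subprocess
from typing import Optional


def build_div_eval_command(
    run_file: str,
    rgt_dir: str,
    dgt_dir: str,
    topic_file: str,
    out_dir: str,
    out_name: Optional[str] = None,
    jar: str = "div_eval.jar",
) -> list[str]:
    """Assemble the div_eval.jar command line using only the documented flags."""
    cmd = [
        "java", "-jar", jar,
        "-r", run_file,
        "-rgt", rgt_dir,
        "-dgt", dgt_dir,
        "-t", topic_file,
        "-o", out_dir,
    ]
    if out_name is not None:
        cmd += ["-f", out_name]  # optional output file name
    return cmd


def run_div_eval(*args, **kwargs) -> str:
    """Run div_eval.jar and return whatever it prints to standard output (possibly nothing).

    Raises subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    cmd = build_div_eval_command(*args, **kwargs)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Example call (all paths are placeholders for your local copy of the devset):
# run_div_eval("my_fusion_run.txt", "devset/gt/rGT", "devset/gt/dGT",
#              "devset/topics/topics.xml", "results", out_name="my_fusion_eval")
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with paths that contain spaces.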

You can use this tool to evaluate the performance of the inducers or of your own fusion method. While it reports many metrics at different cutoff values, only two metrics are taken into account in this competition:

  • F1@20 - the main metric
  • ClusterRecall@20 (CR@20 in the tool's output) - the secondary metric

The output of running the tool is presented below; the two competition metrics appear as "Average F1@20" and "Average CR@20".

"Run name","RUNd2.txt"

"Average P@20 = ",.784
"Average CR@20 = ",.4278
"Average F1@20 = ",.5432

"Query Id ","Location name",P@5,P@10,P@20,P@30,P@40,P@50,CR@5,CR@10,CR@20,CR@30,CR@40,CR@50,F1@5,F1@10,F1@20,F1@30,F1@40,F1@50
1,"Aachen Cathedral",.8,.9,.95,.9667,.95,.94,.1333,.4,.5333,.7333,.8667,.9333,.2286,.5538,.6831,.834,.9064,.9367
2,"Angel of the North",1.0,.9,.95,.9333,.925,.94,.2667,.5333,.8,.8667,.8667,.9333,.4211,.6698,.8686,.8988,.8949,.9367
...
25,"Ernest Hemingway House",.8,.7,.5,.5667,.55,.6,.2353,.4118,.5294,.6471,.7647,.8824,.3636,.5185,.5143,.6042,.6398,.7143

"--","Avg.",P@5,P@10,P@20,P@30,P@40,P@50,CR@5,CR@10,CR@20,CR@30,CR@40,CR@50,F1@5,F1@10,F1@20,F1@30,F1@40,F1@50
,,.76,.784,.792,.784,.789,.7944,.2577,.4278,.6343,.7443,.8504,.8919,.376,.5432,.696,.757,.813,.834
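The per-query rows of this output are consistent with F1@20 being the harmonic mean of P@20 and CR@20. For query 1, 2 * 0.95 * 0.5333 / (0.95 + 0.5333) ≈ 0.6831. The sketch below computes the three quantities for a single query under our reading of the definitions. P@k counts the relevant photos among the top k and divides by k. CR@k is the fraction of ground-truth clusters represented among the top k photos, where only relevant photos carry a cluster id in dGT. The function names are hypothetical, and div_eval.jar remains the reference implementation for official scores.

```python
from typing import Optional


def precision_at(ranked: list[str], relevant: set[str], k: int = 20) -> float:
    """Share of the top-k photos that are relevant (divides by k even for shorter lists)."""
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


def cluster_recall_at(
    ranked: list[str],
    photo_cluster: dict[str, int],
    k: int = 20,
    n_clusters: Optional[int] = None,
) -> float:
    """Fraction of ground-truth clusters that appear among the top-k photos."""
    if n_clusters is None:
        n_clusters = len(set(photo_cluster.values()))
    found = {photo_cluster[pid] for pid in ranked[:k] if pid in photo_cluster}
    return len(found) / n_clusters if n_clusters else 0.0


def f1(precision: float, cluster_recall: float) -> float:
    """Harmonic mean of precision and cluster recall (0 when both are 0)."""
    total = precision + cluster_recall
    return 2 * precision * cluster_recall / total if total else 0.0


if __name__ == "__main__":
    # Toy query: 4 returned photos, 3 of them relevant, covering 2 of 3 clusters.
    ranked = ["a", "b", "c", "d"]
    relevant = {"a", "c", "d", "e"}
    photo_cluster = {"a": 1, "c": 1, "d": 2, "e": 3}
    p = precision_at(ranked, relevant, k=4)
    cr = cluster_recall_at(ranked, photo_cluster, k=4)
    print(p, cr, round(f1(p, cr), 4))
```

Note that the "Average" lines of the tool are means over queries, so the average F1 is generally not the harmonic mean of the average P and average CR.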


The topics folder contains an .xml file that describes the topics. Each topic is delimited by a <topic> </topic> statement and includes the query_id code (delimited by a <number> </number> statement, to be used for interpreting inducer outputs and for preparing the official runs), the query title (delimited by a <title> </title> statement), the GPS coordinates (latitude and longitude in degrees), and the URL of the Wikipedia webpage (delimited by a <wiki> </wiki> statement). An example is presented below:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
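The topics file can be read with Python's standard `xml.etree.ElementTree` module. The sketch below maps each query_id (the <number> element) to its title and Wikipedia URL. It ignores the GPS coordinates because their element names are not given above, and it does not assume any particular root element name. The function name is hypothetical, and the sample XML in the usage code is made up, not an excerpt from the real file.

```python
import xml.etree.ElementTree as ET


def topics_from_root(root: ET.Element) -> dict[int, dict[str, str]]:
    """Map query_id -> {"title": ..., "wiki": ...} for every <topic> element."""
    topics: dict[int, dict[str, str]] = {}
    for topic in root.iter("topic"):
        number = topic.findtext("number")
        if number is None:
            continue  # skip malformed entries without a query id
        topics[int(number.strip())] = {
            "title": (topic.findtext("title") or "").strip(),
            "wiki": (topic.findtext("wiki") or "").strip(),
        }
    return topics


if __name__ == "__main__":
    # For the real file: topics_from_root(ET.parse(path_to_topics_xml).getroot())
    sample = """
    <topics>
      <topic>
        <number>91</number>
        <title>example_location</title>
        <wiki>https://en.wikipedia.org/wiki/Example</wiki>
      </topic>
    </topics>"""
    print(topics_from_root(ET.fromstring(sample.strip())))
```

Using `root.iter("topic")` finds <topic> elements at any depth, so the sketch keeps working whatever the enclosing element is called.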


The gt folder contains two folders: dGT, with the diversity ground truth, and rGT, with the relevance ground truth.

Relevance ground truth was annotated using a dedicated tool that showed the annotators one photo at a time. A reference photo of the location could also be displayed during the process. Annotators were asked to classify each photo as relevant (score 1), non-relevant (score 0) or “don’t know” (score -1). The definition of relevance was available to the annotators in the interface throughout the process, and the annotation was not time restricted. Annotators were encouraged to consult additional sources of information about the characteristics of the location (e.g., the Internet) when they were unsure about an annotation. Ground truth was collected from several annotators, and the final ground truth was determined using a lenient majority voting scheme.


Diversity ground truth was also annotated with a dedicated tool. Diversity was annotated only for the photos judged as relevant in the previous step. For each query, annotators were provided with a thumbnail list of all the relevant photos. First, annotators familiarized themselves with the photos by analyzing them for about 5 minutes. Next, they regrouped the photos into clusters of similar visual appearance; full-size versions of the photos were available by clicking on them. The definition of diversity was available to the annotators in the interface throughout the process. For each cluster, annotators provided keyword tags reflecting their reasons for forming that cluster. As with the relevance annotation, the diversity annotation process was not time restricted. In this case, ground truth was collected from several annotators who annotated distinct parts of the data set. Ground truth is provided on a per-query basis, as two individual txt files per query: one for the cluster ground truth and one for the photo diversity ground truth. Files are named using the query title followed by the ground truth code, e.g., “abbey_of_saint_gall dclusterGT.txt” and “abbey_of_saint_gall dGT.txt” contain the cluster ground truth (dclusterGT) and the photo diversity ground truth (dGT) for the query location abbey_of_saint_gall.

In the dclusterGT file, each line corresponds to a cluster: the first value is the cluster id number, followed by a comma and the cluster user tag. Lines are separated by an end-of-line character (carriage return). An example is presented below:

1,outside statue
2,inside views
3,partial frontal view

In the dGT file, each line corresponds to the ground truth of one image: the first value is the unique photo id, followed by a comma and the cluster id number (matching the values in the dclusterGT file). Lines are separated by an end-of-line character (carriage return).
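Both ground-truth files are plain comma-separated text, so a few lines of code are enough to load them. In the sketch below (hypothetical function names), `parse_cluster_gt` maps cluster id to user tag, and `parse_diversity_gt` maps photo id to cluster id. Each line is stripped so that carriage-return line endings are handled. In dclusterGT lines, the tag is split off at the first comma only, in case a tag itself contains commas. The dGT lines in the usage code are made up for illustration.

```python
from typing import Iterable


def parse_cluster_gt(lines: Iterable[str]) -> dict[int, str]:
    """dclusterGT: 'cluster_id,user tag' per line -> {cluster_id: tag}."""
    clusters: dict[int, str] = {}
    for line in lines:
        line = line.strip()  # also drops '\r' from CRLF line endings
        if not line:
            continue
        cluster_id, tag = line.split(",", 1)  # split once: tags may contain commas
        clusters[int(cluster_id)] = tag.strip()
    return clusters


def parse_diversity_gt(lines: Iterable[str]) -> dict[str, int]:
    """dGT: 'photo_id,cluster_id' per line -> {photo_id: cluster_id}."""
    photo_cluster: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        photo_id, cluster_id = line.split(",")[:2]
        photo_cluster[photo_id.strip()] = int(cluster_id)
    return photo_cluster


if __name__ == "__main__":
    clusters = parse_cluster_gt(["1,outside statue\r\n", "2,inside views\r\n", "3,partial frontal view\r\n"])
    photo_cluster = parse_diversity_gt(["3338743092,1", "3661411441,3"])  # hypothetical dGT lines
    for photo_id, cluster_id in photo_cluster.items():
        print(photo_id, "->", clusters[cluster_id])
```

Real files can be passed as open file handles, since iterating over a file yields its lines.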



The performance folder contains the performance metrics for all the 56 inducers, in the same format as the output of the div_eval.jar script.

Submission instructions

As soon as the submission is open, you will find a “Create Submission” button on this page (next to the tabs).

Before being allowed to submit your results, you must first press the red Participate button, which leads you to a page where you have to accept the challenge's rules.

The submission file has the following columns, all separated by a space character (the file extension does not matter):

  • query_id - integer value between 91 and 153 - represents the unique id of the query, which can be associated with the topics file
  • iter - integer value - not used; always 0
  • photo_id - large integer - the unique id of the photo represented by the entry (example: 3534859020)
  • rank - integer starting with 0 - is the photo rank in the refined list (example: 15)
  • sim – floating value - is the similarity score of the photo to the query (example: 0.15)
  • run_name - a general name for the submission (example: submission_run1.txt)

A short example of this would be:

91 0 109209656 0 1 run1_test
91 0 110113565 1 0.33 run1_test
91 0 110113677 2 0.11 run1_test

When submitting your files, the following rules must be respected:

  • all 6 tokens previously mentioned (query_id, iter, photo_id, ...) must be present for every entry in the submission file
  • query_id must only take values associated with the testset (91, 92, ..., 153), and photo_id values must also be those associated with the testset
  • please respect the limits and requested values indicated above (iter must always be 0, rank must start at 0, etc.)
  • pairs of (query_id, photo_id) combinations must be unique (e.g., no two entries for (91 109209656))
  • keep the same string for run_name throughout a submission file
  • do not leave out any query_id associated with the testset from your submission

Finally, please note that the scores (metrics) are not calculated automatically when you submit your runs, so the system will display a score of 0.00. We will download the submissions manually, calculate your scores, update the website, and communicate the results to you.

Rules

Note: In order to participate in this challenge you have to sign an End User Agreement (EUA). You will find more information on the 'Resources' tab.

The ImageCLEF lab is part of the Conference and Labs of the Evaluation Forum: CLEF 2022. CLEF 2022 consists of independent peer-reviewed workshops on a broad range of challenges in the fields of multilingual and multimodal information access evaluation, and a set of benchmarking activities carried out in various labs designed to test different aspects of monolingual and cross-language information retrieval systems. More details about the conference can be found here.

Submitting a working note with the full description of the methods used in each run is mandatory. Any run that cannot be reproduced from its description in the working notes may be removed from the official publication of the results. Working notes are published in the CEUR-WS proceedings, where each is assigned an individual DOI (URN) and indexed by many bibliography systems, including DBLP. In line with the CEUR-WS policies, the ImageCLEF organizing committee will conduct a light review of the working notes to ensure quality. As an illustration, the ImageCLEF 2021 working notes (task overviews and participant working notes) can be found in the CLEF 2021 CEUR-WS proceedings.


Participants of this challenge will automatically be registered at CLEF 2022. In order to be compliant with the CLEF registration requirements, please edit your profile by providing the following additional information:

  • First name

  • Last name

  • Affiliation

  • Address

  • City

  • Country

  • Regarding the username, please choose a name that represents your team.

This information will not be publicly visible and will be used exclusively to contact you and to send the registration data to CLEF, which is the main organizer of all CLEF labs.

Participating as an individual (non affiliated) researcher

We welcome individual researchers, i.e., researchers not affiliated with any institution, to participate. We kindly ask you to provide us with a motivation letter containing the following information:

  • the presentation of your most relevant research activities related to the task/tasks

  • your motivation for participating in the task/tasks and how you want to exploit the results

  • a list of your 5 most relevant publications (if applicable)

  • the link to your personal webpage

The motivation letter should be directly concatenated to the End User Agreement document or sent as a PDF file to bionescu at imag dot pub dot ro. The request will be analyzed by the ImageCLEF organizing committee. We reserve the right to refuse any applicant whose experience in the field is too narrow and would therefore most likely prevent them from finishing the task/tasks.


[Ionescu2020] Ionescu, B., Rohm, M., Boteanu, B., Gînscă, A. L., Lupu, M., & Müller, H. (2020). Benchmarking Image Retrieval Diversification Techniques for Social Media. IEEE Transactions on Multimedia, 23, 677-691.

[Ştefan2020] Ştefan, L. D., Constantin, M. G., & Ionescu, B. (2020, June). System Fusion with Deep Ensembles. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 256-260).

[Constantin2021a] Constantin, M. G., Ştefan, L. D., & Ionescu, B. (2021, June). DeepFusion: Deep Ensembles for Domain Independent System Fusion. In the International Conference on Multimedia Modeling (pp. 240-252). Springer, Cham.

[Constantin2021b] Constantin, M. G., Ştefan, L. D., Ionescu, B., Duong, N. Q., Demarty, C. H., & Sjöberg, M. (2021). Visual Interestingness Prediction: A Benchmark Framework and Literature Review. International Journal of Computer Vision, 1-25.

[Constantin2022] Constantin, M. G., Ştefan, L. D., & Ionescu, B. (2022). Exploring Deep Fusion Ensembling for Automatic Visual Interestingness Prediction. In Human Perception of Visual Information (pp. 33-58). Springer, Cham.



ImageCLEF 2022 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. The results of the campaign appear in the working notes proceedings, published by CEUR Workshop Proceedings (CEUR-WS.org). Selected contributions among the participants will be invited for publication in the following year in the Springer Lecture Notes in Computer Science (LNCS) together with the annual lab overviews.


Contact us

Discussion Forum

Alternative channels

We strongly encourage you to use the public channels mentioned above for communication between participants and organizers. In extreme cases, if you have queries or comments that you would like to send through a private channel, you can email us at:

  • liviu_daniel [dot] stefan [at] upb [dot] ro
  • mihai [dot] constantin84 [at] upb [dot] ro
  • dogariu [dot] mihai8 [at] gmail [dot] com

More information

You can find additional information on the challenge here: https://www.imageclef.org/2022/fusion