
ECCV 2020 Commands 4 Autonomous Vehicles

Prize Money: 1 x € 1500, 1 x € 500, 1 x € 250
Authorship/Co-Authorship: 3



The Commands For Autonomous Vehicles (C4AV) workshop aims to develop state-of-the-art models that can tackle the joint understanding of vision and language in a real-world task setting. To stimulate research in this area, we are hosting a challenge that considers a visual grounding task in a self-driving car setting. More specifically, a passenger gives a natural language command that expresses an action to be taken by the autonomous vehicle. Every command refers to an object visible from the front of the vehicle. The model is tasked with predicting the coordinates of a 2D bounding box around the referred object.

Some examples can be found below. For every command, the referred object is indicated in bold. For every image, the referred object from the command is indicated with a red bounding box. Participants are required to predict the 2D projected coordinates of the red bounding boxes.


Turn around and park in front of that vehicle in the shade.


You can park up ahead behind the silver car, next to that lamp post with the orange sign on it.



  • The challenge is based on the recent Talk2Car dataset (EMNLP 2019), which builds on top of the nuScenes dataset for urban scene understanding. Participants are allowed to use the extra sensor modalities provided by nuScenes (e.g. LIDAR, RADAR, MAPS). Note that some of our images are part of the nuScenes train set; participants should therefore not directly use models that were pre-trained on the regular train-val split from nuScenes.
  • (Simplified) We provide a simplified version of our dataset here. This version contains only the RGB images, the commands, and region proposals obtained with a CenterNet model. Participants who want to tackle the problem based on RGB input alone can use this simplified version of the dataset. A dataloader is provided in PyTorch.


Baseline / Quick-Start Model

We provide an easy-to-follow code repository to help participants get started in the challenge. A model is trained for the object referral task by matching an embedding of the command with object region proposals. The code was designed to be easily understandable and adaptable.
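As a rough illustration of the matching idea described above, the sketch below scores each region proposal against the command and picks the best one. It is not the code from the baseline repository: the function names, the use of cosine similarity as the matching score, and the toy embedding vectors are all assumptions made for this example. In practice the embeddings would come from a text encoder and an image encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_region(command_embedding, region_embeddings):
    """Return the index of the best-matching region proposal and all scores."""
    if not region_embeddings:
        raise ValueError("at least one region proposal is required")
    scores = [cosine_similarity(command_embedding, r) for r in region_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: one command embedding and three region-proposal embeddings.
command = [1.0, 0.0]
regions = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
best_index, all_scores = select_region(command, regions)
print(best_index)  # index of the proposal whose box would be predicted
```

The bounding box of the selected proposal would then be submitted as the prediction for that command.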

Other baselines

There are also two other baseline models included in the leaderboard.

The first baseline, MAC, is a multi-step reasoning model that reasons over both text and image. More information about this model can be found at https://arxiv.org/abs/1803.03067 and https://arxiv.org/abs/1909.10838.

The second baseline, MSRR, is a multi-step reasoning model that reasons over both text and image, and additionally over objects. More information about this model can be found at https://arxiv.org/abs/2003.08717.


Submissions are evaluated on the fraction of commands for which the IoU (intersection over union) between the predicted and ground-truth bounding boxes exceeds 50% (AP50). Note that the evaluation is performed using the 2D projected bounding boxes. To provide additional insight into the failure cases of a model, participants can also see its score on carefully selected subsets. These are characterized by the command length, the number of instances of the referred object's class in the scene, and the distance of the referred object to the vehicle. These scores are provided only to help participants and are not taken into account when determining the winner of the challenge. A detailed description of the subsets is provided in the Talk2Car paper.
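The metric can be sketched in a few lines. This is not the official evaluation script: the function names are hypothetical, and the box format (x1, y1, x2, y2) with corner coordinates is an assumption for this example, so the format required for submissions may differ. A prediction counts as correct when its IoU with the ground-truth box is strictly greater than 0.5.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(predictions, ground_truths, threshold=0.5):
    """Fraction of commands whose predicted box has IoU above the threshold."""
    if not ground_truths or len(predictions) != len(ground_truths):
        raise ValueError("need one prediction per ground-truth box")
    hits = sum(1 for p, g in zip(predictions, ground_truths) if iou(p, g) > threshold)
    return hits / len(ground_truths)


# Toy example: the first prediction matches exactly, the second overlaps by a third.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
truth = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(ap50(preds, truth))  # 0.5
```

In the second pair the intersection is 50 and the union is 150, giving an IoU of 1/3, which falls below the threshold and so counts as a miss.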


Important Dates

  • Freeze Leaderboard: July 18 2020 (Participants can still submit, but leaderboard is no longer updated).
  • End of Challenge: August 1 2020 (Participants ranking above the MSRR baseline will be contacted to write a paper).
  • Submit paper and video: August 11 2020 (Participants need to submit a paper and video through ECCV 2020).
  • Winners Presentation: August 23 2020 (Winners can present their solution at the C4AV workshop).



  • Teams that are placed 1st, 2nd, and 3rd at the end of the challenge on August 1 2020 will be considered the winners of the challenge. Winners of the challenge are expected to submit a 6-14 page paper (ECCV format) to the workshop that details their solution. We will personally contact the winning teams and provide them with all necessary details. Winning teams that fail to submit an explanatory paper in time for the workshop event won't be eligible for the challenge prizes. The paper will be published as a workshop paper.
  • The 1st, 2nd and 3rd team will be awarded a cash prize of 1500, 500, and 250 euro respectively, and will be invited to present their solution during the workshop (virtually).


The organizers would like to thank Huawei for sponsoring the Commands for Autonomous Vehicles Workshop at ECCV 2020.


Please consider citing the following works if you use the data provided by this challenge.

- Deruyttere, Thierry, et al. "Talk2Car: Taking Control of Your Self-Driving Car." Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). ACL, 2019.

- Deruyttere, Thierry, et al. "Giving Commands to a Self-Driving Car: A Multimodal Reasoner for Visual Grounding." AAAI 2020 - Workshop RCQA.

- Vandenhende, Simon, Thierry Deruyttere, and Dusan Grujicic. "A Baseline for the Commands For Autonomous Vehicles Challenge." arXiv preprint arXiv:2004.13822 (2020).

- Caesar, Holger, et al. "nuScenes: A multimodal dataset for autonomous driving." arXiv preprint arXiv:1903.11027 (2019).






Getting Started