Round 1: Completed

TartanAir Visual SLAM - Stereo Track

Visual SLAM in challenging environments

$2500 USD Prize Money

TartanAir Visual SLAM Challenge - Stereo Track

Welcome to the TartanAir Visual SLAM (Simultaneous Localization and Mapping) Challenge, one of the official challenges of the CVPR 2020 SLAM workshop. This benchmark focuses on the SLAM problem in challenging environments featuring changing light conditions, low illumination, adverse weather, and dynamic objects.

The CVPR Visual SLAM challenge consists of a monocular track and a stereo track. Each track contains 16 trajectories, which are divided into easy and hard categories. We also provide a large set of training data collected in 18 different environments, with multiple ground truth labels including camera pose, disparity, segmentation, and optical flow.

TartanAir Dataset

This benchmark is based on the TartanAir dataset, which was collected in photo-realistic simulation environments built on the AirSim project. A particular goal of this dataset is to cover challenging environments with changing light conditions, adverse weather, and dynamic objects. The four most important features of the dataset are:

  • Large-scale, diverse, realistic data. We collect the data in diverse environments with different styles, covering indoor and outdoor scenes, different weather conditions, different seasons, and urban and rural settings.
  • Multimodal ground truth labels. We provide RGB stereo, depth, optical flow, and semantic segmentation images, which facilitate the training and evaluation of various visual SLAM methods.
  • Diverse motion patterns. Popular existing datasets such as KITTI and Cityscapes cover only very limited motion patterns, mostly moving straight forward with small left or right turns. Such regular motion is too simple to sufficiently test a visual SLAM algorithm. Our dataset covers much more diverse motion combinations in 3D space, which makes it significantly more difficult than existing datasets.
  • Challenging scenes. We include challenging scenes with difficult lighting conditions, alternating day and night, low illumination, weather effects (rain, snow, wind, and fog), and seasonal changes.

Please refer to the TartanAir Dataset and the paper for more information. 


Get Started

1. Download the testing data. 

   Download the testing data for the stereo track (size: 17.51 GB).
   MD5 hash: 8a3363ff2013f147c9495d5bb161c48e
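Before using the archive, you can check it against the MD5 hash above. The sketch below uses Python's standard hashlib and reads the file in chunks, so a ~17.5 GB download never has to fit in memory. The function names are illustrative and are not part of tartanair_tools.

```python
import hashlib

# Published MD5 hash of the testing-data archive, copied from the text above.
EXPECTED_MD5 = "8a3363ff2013f147c9495d5bb161c48e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Usage: `verify_download("path/to/downloaded_archive")`. The path is a placeholder. A `False` result means the download is incomplete or corrupted and should be repeated.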

   File structure: 

+-- SE000                              # stereo easy trajectory 0
|   |
|   +-- image_left                     # left image folder
|   |   |
|   |   +-- 000000_left.png            # RGB left image 000000
|   |   +-- 000001_left.png            # RGB left image 000001
|   |   .
|   |   .
|   |   +-- 000xxx_left.png            # RGB left image 000xxx
|   |
|   +-- image_right                    # right image folder
|       |
|       +-- 000000_right.png           # RGB right image 000000
|       +-- 000001_right.png           # RGB right image 000001
|       .
|       .
|       +-- 000xxx_right.png           # RGB right image 000xxx
|
+-- SE001                              # stereo easy trajectory 1
|   ..
+-- SE007                              # stereo easy trajectory 7
+-- SH000                              # stereo hard trajectory 0
|   ..
+-- SH007                              # stereo hard trajectory 7
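The layout above can be traversed by pairing each left image with the right image of the same frame index. This is a sketch that assumes exactly the folder and file names shown in the tree. The helpers list_trajectories and list_stereo_pairs are hypothetical, and the root path is whatever directory you extracted the archive to.

```python
from pathlib import Path


def list_trajectories(test_root):
    """Return the trajectory folders (SE000 ... SH007) under the test root, sorted."""
    root = Path(test_root)
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and p.name[:2] in ("SE", "SH"))


def list_stereo_pairs(trajectory_dir):
    """Return (left, right) image paths for one trajectory, in frame order.

    Assumes image_left/NNNNNN_left.png and image_right/NNNNNN_right.png with
    matching zero-padded indices, so lexicographic order equals frame order.
    """
    traj = Path(trajectory_dir)
    pairs = []
    for left in sorted((traj / "image_left").glob("*_left.png")):
        frame_id = left.name.split("_")[0]  # e.g. "000000"
        right = traj / "image_right" / f"{frame_id}_right.png"
        if not right.exists():
            raise FileNotFoundError(f"missing right image for frame {frame_id}")
        pairs.append((left, right))
    return pairs
```

Checking for the matching right image up front fails early on an incomplete extraction, before a SLAM run gets partway through a trajectory.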


2. Download the evaluation tools. 

    Download the tartanair_tools repository and follow the instructions provided there.

3. (Optional) Training data. 

    There are two ways to access the training data. 

    * Download data to your local machine

    * Access the data using Azure virtual machine

4. Submit the results. 

Results are expected for the trajectory of the left camera. For each of the 16 trajectories (SE00X or SH00X) in the testing data, compute the camera poses and save them in a text file named SE00X.txt or SH00X.txt. Put all 16 files into a zip file with the following structure:

--- SE000.txt                             # result file for the trajectory SE000 
--- SE001.txt                             # result file for the trajectory SE001
|          ..
|          ..
--- SE007.txt                             # result file for the trajectory SE007
--- SH000.txt                             # result file for the trajectory SH000
--- SH001.txt                             # result file for the trajectory SH001
|          ..
|          ..
--- SH007.txt                             # result file for the trajectory SH007 
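A minimal packaging sketch using Python's standard zipfile module. It stores the 16 result files at the top level of the archive, as in the structure above, and refuses to build an incomplete submission. The function name and directory arguments are hypothetical.

```python
import zipfile
from pathlib import Path

# The 16 expected result files: SE000.txt ... SE007.txt and SH000.txt ... SH007.txt.
EXPECTED_NAMES = [f"{prefix}{i:03d}.txt" for prefix in ("SE", "SH") for i in range(8)]


def package_results(result_dir, zip_path):
    """Zip the 16 result files without any enclosing folder.

    Raises FileNotFoundError listing any missing files, so an incomplete
    submission is caught before upload.
    """
    result_dir = Path(result_dir)
    missing = [n for n in EXPECTED_NAMES if not (result_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing result files: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in EXPECTED_NAMES:
            # arcname=name places each file at the root of the archive.
            zf.write(result_dir / name, arcname=name)
    return zip_path
```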

The camera pose file should have the same format as the ground truth file in the training data. It is a text file containing the translation and orientation of the camera in a fixed coordinate frame. Note that our automatic evaluation tool expects the estimated trajectory to be in this format. 

  • Each line in the text file contains a single pose.

  • The number of lines/poses must be the same as the number of image frames in that trajectory. 

  • The format of each line is 'tx ty tz qx qy qz qw'. 

  • tx ty tz (3 floats) give the position of the optical center of the color camera with respect to the world origin in the world frame.

  • qx qy qz qw (4 floats) give the orientation of the color camera as a unit quaternion with respect to the world frame.

  • The trajectory can have an arbitrary initial position and orientation. However, the camera motion is defined in the NED frame: the x-axis points in the camera's forward direction, the y-axis points to the camera's right, and the z-axis points downward.
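The format rules above can be enforced with a small writer and reader. This is a sketch, not the official tartanair_tools code. It checks that the pose count equals the frame count, normalizes each quaternion, and writes 'tx ty tz qx qy qz qw' per line. The function names are hypothetical.

The optional optical_to_ned helper rests on a labeled assumption: your SLAM system outputs poses in the common optical convention (x right, y down, z forward) for both the world and camera frames. Under that assumption, the change to the NED axes described above (x forward, y right, z down) is a pure axis permutation.

```python
import math


def optical_to_ned(pose):
    """Convert a pose (tx, ty, tz, qx, qy, qz, qw) from optical axes to NED axes.

    Assumption: world and camera frames both use x right, y down, z forward.
    NED here is x forward, y right, z down, so x_ned = z_opt, y_ned = x_opt,
    z_ned = y_opt. This permutation is a proper rotation, so conjugating the
    pose by it permutes the translation and the quaternion's vector part the
    same way and leaves qw unchanged.
    """
    tx, ty, tz, qx, qy, qz, qw = pose
    return (tz, tx, ty, qz, qx, qy, qw)


def write_pose_file(path, poses, num_frames):
    """Write one 'tx ty tz qx qy qz qw' line per image frame."""
    if len(poses) != num_frames:
        # The number of poses must equal the number of image frames.
        raise ValueError(f"expected {num_frames} poses, got {len(poses)}")
    with open(path, "w") as f:
        for pose in poses:
            tx, ty, tz, qx, qy, qz, qw = (float(v) for v in pose)
            norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
            if norm == 0.0:
                raise ValueError("quaternion must be non-zero")
            # Normalize so that every stored quaternion is a unit quaternion.
            values = (tx, ty, tz, qx / norm, qy / norm, qz / norm, qw / norm)
            f.write(" ".join(repr(v) for v in values) + "\n")


def read_pose_file(path):
    """Parse a pose file (estimate or ground truth) into 7-tuples of floats."""
    poses = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            values = line.split()
            if len(values) != 7:
                raise ValueError(f"line {lineno}: expected 7 values, got {len(values)}")
            poses.append(tuple(float(v) for v in values))
    return poses
```

Because the initial pose is arbitrary, conjugating the whole trajectory, rather than only re-labeling the camera axes, is one valid choice. It also keeps an identity first pose as the identity.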

Evaluation Criteria

For a ground truth trajectory SE000_gt.txt and an estimated trajectory SE000_est.txt, we calculate the translational and rotational errors based on the normalized Relative Pose Error, similar to the KITTI dataset. Unlike KITTI, we compute translational and rotational errors for all possible subsequences of lengths 5, 10, 15, ..., 40 meters. The translational and rotational errors are then combined into the final score \(E = E_{rot} + \beta E_{trans}\), where we use \(\beta = 7\) to balance the two errors, because on our dataset the average rotation speed (in degrees) is 7 times larger than the average translation speed.
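The metric can be approximated with a short KITTI-style implementation. This is a sketch under stated assumptions and is not the official tartanair_tools evaluation, whose details may differ:

  • every frame is used as a start frame;
  • a subsequence ends at the first frame whose ground-truth path length from the start reaches the target length;
  • errors are normalized by that length, with rotation in degrees per meter and translation in meters per meter.

Only \(\beta = 7\) and the lengths 5 to 40 m come from the text above. Because only relative poses are compared, the result does not depend on the estimate's arbitrary initial position and orientation.

```python
import bisect
import math


# Quaternion helpers; quaternions are (x, y, z, w), matching the pose files.
def q_mul(a, b):
    """Hamilton product a * b."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)


def q_conj(q):
    """Conjugate, which is the inverse of a unit quaternion."""
    x, y, z, w = q
    return (-x, -y, -z, w)


def q_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q (computes q * v * q^-1)."""
    x, y, z, _ = q_mul(q_mul(q, (v[0], v[1], v[2], 0.0)), q_conj(q))
    return (x, y, z)


def relative_pose(p_i, p_j):
    """Pose j expressed in the frame of pose i, i.e. T_i^-1 * T_j."""
    qi_inv = q_conj(p_i[3:])
    dt = tuple(b - a for a, b in zip(p_i[:3], p_j[:3]))
    return q_rotate(qi_inv, dt), q_mul(qi_inv, p_j[3:])


def relative_pose_error(gt, est, lengths=range(5, 45, 5), beta=7.0):
    """Return (E_rot, E_trans, E), each averaged over all subsequences."""
    if len(gt) != len(est):
        raise ValueError("trajectories must have the same number of poses")
    # Cumulative distance traveled along the ground-truth path.
    dist = [0.0]
    for a, b in zip(gt, gt[1:]):
        dist.append(dist[-1] + math.dist(a[:3], b[:3]))
    rot_errs, trans_errs = [], []
    for first in range(len(gt)):
        for length in lengths:
            # dist is non-decreasing, so bisect finds the first frame at
            # least `length` meters further along the path.
            last = bisect.bisect_left(dist, dist[first] + length, lo=first)
            if last == len(gt):
                continue  # not enough trajectory left for this length
            t_gt, q_gt = relative_pose(gt[first], gt[last])
            t_est, q_est = relative_pose(est[first], est[last])
            # Error pose = (relative ground truth)^-1 * (relative estimate).
            q_gt_inv = q_conj(q_gt)
            t_err = q_rotate(q_gt_inv, tuple(e - g for e, g in zip(t_est, t_gt)))
            ex, ey, ez, ew = q_mul(q_gt_inv, q_est)
            angle = 2.0 * math.atan2(math.sqrt(ex * ex + ey * ey + ez * ez), abs(ew))
            trans_errs.append(math.hypot(*t_err) / length)
            rot_errs.append(math.degrees(angle) / length)
    if not rot_errs:
        raise ValueError("trajectory is shorter than the shortest subsequence")
    e_rot = sum(rot_errs) / len(rot_errs)
    e_trans = sum(trans_errs) / len(trans_errs)
    return e_rot, e_trans, e_rot + beta * e_trans
```

As a sanity check, an estimate that is 10% too long on a straight trajectory gives a translational error of about 0.1 and a zero rotational error, so \(E \approx 0.7\) with \(\beta = 7\).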

Competition Tracks and Prizes

Monocular Track

  • The prize for the winner: $1500 USD
  • The prize for the runner-up: $1000 USD

Stereo Track

  • The prize for the winner: $1500 USD
  • The prize for the runner-up: $1000 USD


  • Submission deadline: Aug 15th, 2020.
  • Winners announcement: TBD



Leaderboard

Rank  Team              Score
01    MEGVII_SLAM       0.119
02    VOLDOR_SLAM       0.177
03    OV2SLAM           0.182
04    SAIT_China_Lab    0.224
05    akshatcx          1.644