AIcrowd | Mono Depth Perception

Round 1: Completed

Round 2: Completed Weight: 1.0

AIcrowd & Amazon Prime Air

📕 Make your first submission for Mono Depth Perception using the Starter Kit!

📝 Have you explored the baseline for Mono Depth Perception?

👥 Challenges are more fun with friends. Find teammates for SUADD'23 💬

📝 The Task

Unmanned Aircraft Systems (UAS) have various applications, such as environmental studies, emergency responses or package delivery. The safe operation of fully autonomous UAS requires robust perception systems.

For this challenge, we will focus on images from a single downward-facing camera to estimate the scene's depth and perform semantic segmentation. The results of these two tasks can help the development of safe and reliable autonomous control systems for aircraft.

This challenge includes the release of a new dataset of drone images that will benchmark semantic segmentation and mono-depth perception. The images in this dataset comprise realistic backyard scenarios of variable content and were taken at various Above Ground Level (AGL) altitudes.

This challenge aims to foster the development of fully autonomous Unmanned Aircraft Systems (UAS).

Achieving full autonomy requires overcoming a multitude of challenges. To navigate fully autonomously, a drone needs to understand both the objects in a scene and their scale and distance.

This project's two key computer vision components are semantic segmentation and depth perception.

With this challenge, we aim to inspire the Computer Vision community to develop new insights and advance the state of the art in perception tasks involving drone images.

In this task, we focus on mono-depth estimation.

Mono-Depth Estimation

Depth estimation measures the distance between the camera and the objects in the scene. It is an important perception task for an autonomous aerial drone. With two cameras, the task can be solved using stereo vision methods. This challenge instead aims to create a model that uses the information from a single camera to predict the depth of every pixel.
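For background, stereo vision methods typically rest on the standard pinhole relation Z = f * B / d, where d is the disparity between the two views, f the focal length in pixels and B the distance between the cameras. The following is a minimal sketch of that relation only. The function name and the camera parameters are hypothetical and are not taken from the challenge or its data pipeline.

```python
import numpy as np


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a stereo disparity map (pixels) to depth (meters).

    Uses the pinhole stereo relation Z = f * B / d.
    Pixels with non-positive disparity are treated as invalid (NaN).
    All parameter names and values are illustrative assumptions.
    """
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
example = disparity_to_depth([[10.0, 0.0]], focal_px=1000.0, baseline_m=0.5)
```

A monocular model has no disparity to work from, which is why it must infer depth from image content alone.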

The output of this task must be an image of equal size to the input image, in which every pixel contains a depth value.
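The output requirement above can be sanity-checked before submitting. The sketch below is one possible check, not part of the official starter kit. The function name is hypothetical, and reading "every pixel contains a depth value" as "every value is finite" is an assumption.

```python
import numpy as np


def check_depth_prediction(image, depth):
    """Check that a predicted depth map matches the input image size.

    image: array of shape (H, W) or (H, W, C)
    depth: array expected to have shape (H, W)
    Raises ValueError if the sizes differ or a pixel has no finite value.
    """
    image = np.asarray(image)
    depth = np.asarray(depth)
    if depth.shape != image.shape[:2]:
        raise ValueError(
            f"depth shape {depth.shape} does not match image size {image.shape[:2]}"
        )
    # Assumption: a missing prediction would show up as NaN or infinity.
    if not np.all(np.isfinite(depth)):
        raise ValueError("depth map contains non-finite values")
    return True


# Usage with a dummy greyscale image and a constant prediction.
dummy_image = np.zeros((4, 6), dtype=np.uint8)
dummy_depth = np.ones((4, 6), dtype=np.float32)
is_ok = check_depth_prediction(dummy_image, dummy_depth)
```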

💾 Dataset

The dataset consists of a collection of flight frames at given timestamps taken from one of the downward cameras of our drones during dedicated data collection operations, not during customer delivery operations.

The dataset contains 412 flights, 2056 total frames (5 frames per flight at different AGLs), full semantic segmentation annotations of all frames, and depth estimations. The dataset has been split into training and (public) test datasets. Although the challenge will be scored using a private test dataset, we considered this split useful because it allows teams to share their results even after the challenge ends.

This dataset contains bird's-eye-view greyscale images taken between 5 m and 25 m AGL. Annotations for the semantic segmentation task are fully labelled images across 19 distinct classes, while annotations for the mono-depth estimation task have been computed with geometric stereo-depth algorithms. To the best of our knowledge, this is the largest dataset with full semantic annotations and mono-depth estimation ground truth over a wide range of AGLs and different scenes.

Depth annotations contain relative depth maps (the absolute depth value, i.e. in meters, cannot be determined from them alone). They are encoded as uint16 images, but need to be converted into float values that represent the depth. We represent invalid values with 0. In particular, you can decode these images with the following snippet:
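The original snippet is not reproduced in this copy of the page. As a stand-in, here is a minimal sketch of the decoding described above, assuming the annotation has already been loaded as a uint16 array by an image reader that preserves 16-bit values. The function name `decode_depth` and the `scale` divisor are hypothetical. The official conversion factor is not stated here, so consult the starter kit for the exact decoding.

```python
import numpy as np


def decode_depth(raw, scale=1.0):
    """Decode a uint16 depth annotation into float32 relative depth.

    raw:   2-D uint16 array read from the annotation image
    scale: hypothetical divisor mapping stored integers to depth values;
           the real factor is an assumption and must come from the starter kit
    Pixels stored as 0 are invalid and are returned as NaN.
    """
    raw = np.asarray(raw)
    if raw.dtype != np.uint16:
        raise TypeError(f"expected uint16 input, got {raw.dtype}")
    depth = raw.astype(np.float32) / np.float32(scale)
    depth[raw == 0] = np.nan  # 0 marks an invalid measurement
    return depth


# Usage with a tiny synthetic annotation and an illustrative scale of 256.
sample = np.array([[0, 256], [512, 1024]], dtype=np.uint16)
decoded = decode_depth(sample, scale=256.0)
valid_mask = ~np.isnan(decoded)
```

Marking invalid pixels as NaN rather than leaving them at 0 makes it harder to average them into a loss or metric by accident. The mask can then be used to restrict training and evaluation to valid pixels.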

Ethical Considerations About The Data

The challenge dataset contains images of realistic flight footage taken as part of our research and development programs, not from real customer deliveries. Furthermore, all personal identifiers have been removed.

💪 Starter Kit and Baselines

To make your first submission easy, we have put together a starter kit and a baseline for you. These will guide you through the documentation, submission flow and dataset, and will help you make your first submission.

📅 Timeline

Challenge Launch: 22nd December 2022
Challenge End: 28th April 2023
Winner Announcement: 30th June 2023

Depth Perception

🥇 The Top scoring submission will receive $15,000 USD
🥈 The Second best submission will receive $7,500 USD
🥉 The Third place submission will receive $1,250 USD

🏅 The Most “Creative” solution submitted to the whole competition, as determined at the Sponsor’s sole discretion, will receive $2,500 USD.

🔗 Links

🏆 Discussion Forum

💪 Leaderboard

📝 Notebooks

📱 Contact
For questions, feedback and suggestions, contact: suadd23-challenge@amazon.com.