Air traffic control (ATC) is the backbone of what is arguably the key means of personal transport in the modern world. The key issue in controlling the airspace is to know where an aircraft is at any given time. Recent technological developments and stricter separation needs have given rise to new methods of aircraft localization, most notably the automatic dependent surveillance–broadcast protocol (ADS-B).
In ADS-B, aircraft simply report their exact location (determined with onboard GPS sensors) to ground stations periodically. While this approach has many advantages, transferring control over the reported location to the aircraft brings a number of safety and security issues. To mitigate these issues, complementary or redundant localization methods are needed that are independent of the aircraft.
At the same time, crowdsourced air traffic communication networks have gained importance over the past decade. Companies such as FlightRadar24 and FlightAware, research networks such as the OpenSky Network, and increasingly flight authorities themselves rely less and less on planned deployments of ATC receivers. Instead, they use distributed networks that are randomly deployed or even highly mobile such as satellite receivers. Contrary to traditional, carefully planned receiver networks, this crowdsourced use of mostly cheap sensors provides a number of new challenges to existing localization algorithms.
This competition is about finding the best methods to localize aircraft based on crowdsourced air traffic control communication data. The data is collected by the OpenSky Network, a large-scale ADS-B sensor network for research; the competition is organised by the Swiss Cyber-Defence Campus of armasuisse Science and Technology. OpenSky was first presented at the IEEE/ACM IPSN conference in Berlin in 2014 in this paper. As of today, the OpenSky Network continuously collects air traffic control data from thousands of aircraft. This data is received and streamed to OpenSky over the Internet by a crowd that has registered more than 3000 sensors. The goal of the competition is to determine the positions of all aircraft that do not have position reporting capabilities or may report wrong locations. To do so, competitors will rely on time-of-arrival and signal strength measurements reported by many different sensors. Although methods such as multilateration are well established, this data poses new challenges because most of the low-cost sensors are neither time-synchronized nor calibrated. Competitors will therefore have to deal with several kinds of noise, ranging from clock drift and inaccurate sensor locations to timestamps broken by software bugs.
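To make the classical baseline concrete, the sketch below solves a toy multilateration problem from time-of-arrival measurements under ideal conditions (perfectly synchronized sensors, exact timestamps). It is only an illustration of the general technique, not the organizers' reference method; the local Cartesian frame and all names are assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def multilaterate(sensors, toas, guess, iters=50):
    """Estimate an emitter position from time-of-arrival measurements
    via a Gauss-Newton least-squares fit.

    sensors: (n, 3) sensor positions in a local Cartesian frame (metres)
    toas:    (n,) arrival times in seconds from synchronized clocks
    guess:   initial (x, y, z, range_bias); the unknown transmit time
             enters as a common range bias b = C * t_transmit
    """
    p = np.asarray(guess, dtype=float)
    for _ in range(iters):
        diff = p[:3] - sensors                    # (n, 3)
        dists = np.linalg.norm(diff, axis=1)      # (n,)
        resid = C * toas - (p[3] + dists)         # residuals in metres
        J = np.empty((len(toas), 4))              # Jacobian of the residuals
        J[:, :3] = -diff / dists[:, None]
        J[:, 3] = -1.0
        step, *_ = np.linalg.lstsq(J, -resid, rcond=None)
        p = p + step
    return p[:3]
```

With real competition data, the residuals are far from exact, so robust losses, outlier rejection, and good initialization (e.g., from the barometric altitude) matter much more than this bare solver suggests.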
We provide labelled training datasets which include the locations of all aircraft. These labelled data sets can be used by the participating teams to train their models. For the competition, we further provide access to non-labelled evaluation data sets. The task is to find all locations of aircraft that are missing location information in the data sets. The teams submit their results (as a CSV file) to AICrowd, where an indicator of the accuracy of their solution is calculated and an intermediate ranking is provided. When the competition time ends, the final ranking is determined.
We encourage individuals and teams of up to 5 persons from all backgrounds to register and participate. We strongly emphasize our openness towards novel approaches (such as machine learning) but also allow competitors to adapt their "traditional" localization models (e.g., multilateration algorithms) to the peculiarities of the crowdsourced measurement data. Members affiliated with the OpenSky Network, armasuisse, or the Cyber-Defence Campus are excluded from the competition.
Participants will get labelled training data to prepare for the competition period. For both training phase and competition, we have different data sets resulting in two competition rounds with increasing level of difficulty.
Round 1: Synchronized Receivers
In this category, the competitors do not have to put any effort into sensor time synchronization as all provided data is from GPS-equipped sensors, which simplifies things significantly. Furthermore, the rough geometric height of the aircraft can be estimated based on the barometric altitude provided.
Round 2: Unsynchronized Receivers
In this round, only some receivers provide GPS-synchronized timestamps; the others lack tight time synchronization and thus exhibit clock drift or even completely broken timestamps. Competitors will have to put effort into synchronizing the timestamps of all unsynchronized sensors and filtering out broken timestamps to get decent results. However, they can still improve their localization algorithms by incorporating the barometric altitude information. This round is considerably harder than round 1.
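One common way to approach this synchronization problem is to estimate each unsynchronized sensor's clock offset and drift against a GPS-synchronized reference sensor, using messages from aircraft whose positions (and hence propagation delays) are known. The sketch below fits a linear clock model; the model and all names are illustrative assumptions, not a prescribed method.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def fit_clock_model(t_ref, t_unsync, d_ref, d_unsync):
    """Fit offset and drift of an unsynchronized sensor's clock.

    Inputs are per-message arrays for messages with *known* aircraft
    positions that were received by both sensors:
      t_ref, t_unsync: reported arrival times (s)
      d_ref, d_unsync: aircraft-to-sensor distances (m)
    Returns (offset, drift) of the linear model
      t_unsync ~= t_true + offset + drift * t_true
    """
    # Propagation-corrected timestamp difference per message: with a
    # perfect clock this would be zero for every message.
    obs = (t_unsync - t_ref) - (d_unsync - d_ref) / C
    drift, offset = np.polyfit(t_ref, obs, 1)
    return offset, drift
```

Real receiver clocks are only piecewise linear (and may jump or break entirely), so in practice the fit is usually done over short windows with outlier rejection rather than over a whole hour at once.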
Round 1: June 15 - July 31, 2020
Round 2: September 14 - October 30, 2020
1. The award breakdown for the second round is as follows:
1st place: 4000 CHF
2nd place: 3000 CHF
3rd place: 2000 CHF
4th place: 1000 CHF
5th place: 500 CHF
2. We invite the winners of the second round to present their solutions at the 8th OpenSky Symposium at EUROCONTROL in Brussels in November.
Free registration and travel grants for one individual of each winning team will be provided. The Symposium will be held virtually due to the ongoing COVID-19-related travel restrictions.
3. Besides the invitation to submit a paper/presentation to the OpenSky Symposium, the organizers may invite the top teams to co-author any publications that we are planning to write in respect to this competition.
After the terrific results of the first round, we expect that the best teams will achieve an RMSE below 1000m in the second round as well, despite the harder challenge.
Thus, the following conditions apply:
a. The full cash prizes will only be awarded to a top 5 team if their RMSE score is below 1000m.
b. 50% of the full cash prizes will be awarded to a top 5 team if their RMSE score is below 5000m.
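The official score is computed by the evaluation platform; for intuition, a generic RMSE over per-aircraft position errors looks like the sketch below. The exact leaderboard metric may differ in detail (e.g., 2D vs. 3D error, outlier truncation), so treat this only as the generic form of the quantity the thresholds above refer to.

```python
import numpy as np

def rmse(predicted, truth):
    """Root-mean-square error over per-position Euclidean errors.

    predicted, truth: (n, k) arrays of coordinates in metres
    (e.g., in a local Cartesian frame).
    """
    errors = np.linalg.norm(predicted - truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```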
💾 Data Sets
Training and competition datasets for round 1 can be downloaded here.
Both training and competition data sets are provided as CSV files. Each CSV file contains data recorded by the OpenSky Network over a duration of one hour and has an uncompressed size of about 300 MB (round 1) or 1.3 GB (round 2). The round 1 datasets were extracted over the course of about 48 hours; the round 2 datasets over the course of four days.
Each row in the CSV file represents the reception of one aircraft position report and contains the following information:
1. A unique aircraft identifier
2. A Unix timestamp indicating when the message was received by OpenSky
3. Unique identifiers of all sensors which received this signal
4. Nanosecond timestamps from each of the sensors
5. Signal strength measurements from each of the sensors
6. The position of the aircraft (latitude, longitude, height)
7. The barometric altitude of the aircraft
The position of the aircraft (6.) will be empty (NaN) for those aircraft that need to be localized. However, it will be present for many other aircraft and can be used to synchronize the receiver clocks for round 2. In the training data sets, all columns will be present.
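Loading and splitting the data along these lines is straightforward with pandas. The miniature in-memory CSV below only mimics the schema described above; the actual column names are assumptions, so check the header of the downloaded files.

```python
import io
import pandas as pd

# Miniature stand-in for a competition CSV; column names are illustrative.
sample = io.StringIO(
    "aircraft,timeAtServer,latitude,longitude,geoAltitude,baroAltitude\n"
    "42,1592000000.1,47.45,8.56,1200.0,1150.0\n"
    "43,1592000000.2,,,,9500.0\n"
)
df = pd.read_csv(sample)

# Rows with an empty (NaN) position are the aircraft to localize;
# the rest can serve as ground truth, e.g., for clock synchronization.
to_localize = df[df["latitude"].isna()]
reference = df[df["latitude"].notna()]
```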
In addition to this set of measurement data, we will provide meta data for all sensors (roundX_sensors.csv). The meta data includes:
1. A unique sensor identifier
2. The position of the sensor (latitude, longitude, height)
3. The type of hard- and software
It is worth noting here that the positions of the sensors (2.) are of varying accuracy. Each sensor position was entered by the user when the sensor was added to the network, and there is no guarantee of correctness or accuracy. While some users report accurate positions for their antennas (e.g., measured with their smartphone), others just provide a rough estimate based on services like Google Maps. Some might even report wrong locations intentionally for privacy reasons.
More detailed background information on the data provided for the competition, including an introduction to the theory of the problem, can be found here.
For round 2, we expect a CSV submission of the following format:
Your file should have exactly 632,932 rows plus the header row and address all missing values in the competition data set. Any aircraft position that is not predicted should be filled with NaN. At least 90% of all rows/positions must be predicted to be eligible for scoring and ranking. An empty example submission file can be found with the competition datasets.
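Building such a file with pandas might look like the sketch below. The column layout here is hypothetical; consult the empty example submission file shipped with the competition data for the authoritative header.

```python
import pandas as pd

# Hypothetical submission layout: one row per position to predict.
# Row 1 is left unpredicted (NaN), which is allowed for up to 10% of rows.
sub = pd.DataFrame({
    "id": [0, 1, 2],
    "latitude": [47.45, None, 48.10],
    "longitude": [8.56, None, 11.25],
    "geoAltitude": [11200.0, None, 10400.0],
})
csv_text = sub.to_csv(index=False)  # or sub.to_csv("submission.csv", index=False)
```

pandas writes NaN values as empty fields by default, which matches the "fill with NaN" convention above.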