
TrajNet++ (A Trajectory Forecasting Challenge)


Trajectory forecasting in crowded scenes has become an important topic in recent times because of the increasing demands of emerging artificial intelligence applications such as autonomous cars and service robots. One important challenge in trajectory forecasting is to effectively model the interaction among agents. In the past few years, several novel methods have been proposed to tackle agent-agent interactions. However, these methods have been evaluated on different subsets of the available data without proper indexing of trajectories, making it difficult to objectively compare the forecasting techniques.

We introduce TrajNet++, a new, large-scale trajectory-based benchmark that requires researchers to study how their methods perform in explicit agent-agent scenarios. Our challenge provides not only proper indexing of trajectories but also a unified, extensive evaluation system that enables a fair comparison of the submitted methods.

What do we provide?

We present a framework for the fair evaluation of trajectory forecasting algorithms, explicitly in agent-agent scenarios. We provide:

  • A large collection of agent-agent centric datasets
  • A defined categorisation of trajectories
  • A common evaluation tool providing several performance measures
  • An easy way to compare the performance of state-of-the-art methods.

Data Description

The dataset files contain two different data representations:

1. Scene

{"scene": {"id": 266, "p": 254, "s": 10238, "e": 10358, "fps": 2.5, "tag": 2}}

  • id: scene id
  • p: pedestrian ID
  • s, e: starting and ending frame IDs of pedestrian “p”
  • fps: frame rate.
  • tag: trajectory type. Discussed in detail below.

Note: Corresponding to each scene, there exists a primary pedestrian denoted by the pedestrian ID of the scene. The scene is categorised (tag) with respect to this primary pedestrian.

2. Track

{"track": {"f": 10238, "p": 248, "x": 13.2, "y": 5.85, "pred_number": 0, "scene_id": 123}}

  • f: frame id
  • p: pedestrian ID
  • x, y: x and y coordinates in meters of pedestrian “p” in frame “f”.
  • pred_number: prediction number. This is useful when you provide multiple predictions instead of a single one. A maximum of 3 predictions is allowed.
  • scene_id: scene ID. This is useful when you provide predictions for the other agents in the scene in addition to the primary pedestrian.
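As an illustration, the two record types above can be read with a few lines of Python. This is a minimal sketch under the field names shown above; the `parse_ndjson` helper name is ours, not part of the official tools.

```python
import json

def parse_ndjson(lines):
    """Split newline-delimited JSON records into scene and track dicts."""
    scenes, tracks = [], []
    for line in lines:
        record = json.loads(line)
        if "scene" in record:
            scenes.append(record["scene"])
        elif "track" in record:
            tracks.append(record["track"])
    return scenes, tracks

sample = [
    '{"scene": {"id": 266, "p": 254, "s": 10238, "e": 10358, "fps": 2.5, "tag": 2}}',
    '{"track": {"f": 10238, "p": 254, "x": 13.2, "y": 5.85}}',
]
scenes, tracks = parse_ndjson(sample)
print(scenes[0]["p"], tracks[0]["x"])  # 254 13.2
```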

For a more detailed description, we provide the following helper code: Tools for TrajNet++

Trajectory Categorization

We explicitly categorise the primary pedestrian trajectory of the scene into different types. The definition of each type is provided below:

  • Static (Type I): If the Euclidean displacement of the primary pedestrian in the scene is less than 1 meter.

  • Linear (Type II): If the trajectory of the primary pedestrian can be correctly predicted with the help of an Extended Kalman Filter (EKF). A trajectory is said to be correctly predicted by EKF if the FDE between the ground truth trajectory and predicted trajectory is less than 0.5 meter.

  • Non-Linear: The rest of the scenes are classified as ‘Non-Linear’. We further divide non-linear scenes into Interacting (Type III) and Non-Interacting (Type IV).

We further sub-categorize the Interacting (Type III) trajectories as follows:

  • Leader Follower: Leader follower phenomenon refers to the tendency to follow pedestrians going in relatively the same direction. The follower tends to regulate his/her speed and direction according to the leader. If the primary pedestrian is a follower, we categorize the scene as Leader Follower.

  • Collision Avoidance: The collision avoidance phenomenon refers to the tendency to avoid pedestrians coming from the opposite direction. We categorize the scene as Collision Avoidance if the primary pedestrian is involved in collision avoidance during prediction.

  • Group: The primary pedestrian is said to be a part of a group if he/she maintains a close and roughly constant distance to at least one neighbour at his/her side during prediction.

  • Others: Trajectories where the primary pedestrian undergoes social interactions other than Leader Follower, Collision Avoidance and Group. We define social interaction as follows: we look at an angular region in front of the primary pedestrian. If any neighbouring pedestrian is present in this region at any time-instant during prediction, the scene is classified as containing social interactions.

If the trajectory of the primary pedestrian is non-linear and undergoes no social interactions during prediction, the trajectory is classified as Non-Interacting (Type IV).
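As an illustration of the first two checks, here is a minimal Python sketch. The official Linear test uses an Extended Kalman Filter; a constant-velocity extrapolation stands in here as a simplified proxy, and the helper names are ours.

```python
import math

def is_static(traj, threshold=1.0):
    """Type I check: total Euclidean displacement below `threshold` meters."""
    (x0, y0), (xn, yn) = traj[0], traj[-1]
    return math.hypot(xn - x0, yn - y0) < threshold

def is_linear(obs, gt_future, fde_threshold=0.5):
    """Type II proxy: constant-velocity extrapolation within `fde_threshold`
    FDE of the ground truth. The official check uses an EKF instead."""
    (x1, y1), (x2, y2) = obs[-2], obs[-1]
    vx, vy = x2 - x1, y2 - y1          # per-frame velocity from the last step
    n = len(gt_future)                 # number of frames to predict
    gx, gy = gt_future[-1]             # ground truth final position
    fde = math.hypot(x2 + n * vx - gx, y2 + n * vy - gy)
    return fde < fde_threshold

print(is_static([(0.0, 0.0), (0.3, 0.4)]))                            # True: moved 0.5 m
print(is_linear([(0.0, 0.0), (0.0, 1.0)], [(0.0, 2.0), (0.0, 3.0)]))  # True
```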

Figure 1

During evaluation, we report the performance of the submitted model on each of the above categories to provide insight into model behaviour in different scenarios.

We rely on the spirit of crowdsourcing and encourage researchers to submit their sequences to our benchmark, so that trajectory forecasting models keep improving at tackling more challenging scenarios.

Metrics

A good benchmark requires not only a standard dataset but also meaningful evaluation metrics that provide insights into model performance from different perspectives. We describe the evaluation metrics for this challenge below:

  • Final Displacement Error (FDE): The L2 distance between the final ground truth coordinates and the final predicted coordinates of the primary pedestrian. Lower is better.

  • Average Displacement Error (ADE): Average L2 distance between the ground truth and prediction of the primary pedestrian over all predicted time steps. Lower is better.

  • Ground Truth Collision (Col I): The percentage of collisions of the primary pedestrian with neighbouring pedestrians in the scene. The ground truth tracks of the neighbouring pedestrians are used to check the occurrence of collisions. Lower is better.

  • Prediction Collision (Col II): The percentage of collisions of the primary pedestrian with neighbouring pedestrians in the scene. The model's predicted tracks of the neighbouring pedestrians are used to check the occurrence of collisions. Lower is better.
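A minimal sketch of how these metrics can be computed for one scene. ADE and FDE follow the definitions above; the collision check and its 0.1 m radius are an assumption for illustration (the official evaluation tool defines the exact procedure).

```python
import math

def ade(gt, pred):
    """Average L2 distance over all predicted time steps (lower is better)."""
    return sum(math.hypot(gx - px, gy - py)
               for (gx, gy), (px, py) in zip(gt, pred)) / len(gt)

def fde(gt, pred):
    """L2 distance at the final predicted time step (lower is better)."""
    (gx, gy), (px, py) = gt[-1], pred[-1]
    return math.hypot(gx - px, gy - py)

def collides(primary, neighbour, radius=0.1):
    """True if the two tracks ever come within `radius` meters (assumed value)."""
    return any(math.hypot(px - nx, py - ny) < radius
               for (px, py), (nx, ny) in zip(primary, neighbour))

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(gt, pred), fde(gt, pred))  # 1.0 2.0
```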

Data

The training and test datasets can be found here

Submission

We strongly encourage all participants to use only the sequences from the training set for finding parameters and report results on the provided test scenarios to enable a meaningful comparison of forecasting methods.

File Format

To have your predictions evaluated, you need to submit a single .zip file containing the exact same directory structure and file names as the test file. Specifically, you will be given a single .zip with folders ‘real_data’ and ‘synth_data’ within the parent folder ‘test’. Each of these folders will contain one or more .ndjson files. In every file, corresponding to each “scene” (length = 21 frames), you are supposed to predict the coordinates of the primary pedestrian and the corresponding neighbours in the last 12 frames (Tpred = 12), given the observations for the first 9 frames (Tobs = 9) only.

Please note: You are supposed to append your predicted tracks to the test scenes and tracks. The observed test tracks do not have the pred_number and scene_id attributes (set to None). The predicted tracks (last 12 frames) MUST have the pred_number (numbering starts from 0) and scene_id (corresponding to the id of scene being predicted) attributes, even when outputting a single prediction corresponding to each scene.

Your submitted file may contain multiple predictions for each scene if your model outputs multiple predictions. The number of predictions is limited to a maximum of 3. For every scene, the prediction with the minimum FDE with respect to the primary pedestrian is considered.
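A minimal sketch of building predicted track records in the required format. The field names follow the track schema described earlier; the helper name and the coordinate values are made up for illustration.

```python
import json

def prediction_records(scene_id, ped_id, frames, coords, pred_number=0):
    """One ndjson line per predicted frame, with the mandatory
    pred_number and scene_id attributes attached."""
    lines = []
    for f, (x, y) in zip(frames, coords):
        track = {"f": f, "p": ped_id, "x": round(x, 2), "y": round(y, 2),
                 "pred_number": pred_number, "scene_id": scene_id}
        lines.append(json.dumps({"track": track}))
    return lines

records = prediction_records(scene_id=123, ped_id=254,
                             frames=[10350, 10352],
                             coords=[(13.4, 6.0), (13.6, 6.2)])
print(records[0])
```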

An example of input test file and output prediction file is provided below.

Input Test

Output Predictions

As mentioned above, please submit a single .zip file that matches exactly the format given to you for testing.

Evaluation

Once your files are correctly submitted, they will be graded on multiple criteria. The primary and secondary grades correspond to the final displacement error on the real test dataset and the synthetic test dataset, respectively. A figure comparing the submitted model to the baseline Vanilla LSTM model and a table containing a detailed model evaluation are also provided. An example of the output is shown below:

Figure 2

The result of the baseline: Vanilla LSTM Baseline Score

License

The datasets provided in this challenge are published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.

Resources

In this section, participants can find useful resources for the Trajnet++ challenge.

Visualisations

We provide visualisations of the provided datasets to better understand the data. The visualisations capture attributes of human motion as well as the nature of interactions in the different datasets.

Tools for TrajNet++

Baselines

We provide baseline codes of important papers in trajectory prediction.

Baseline algorithms for TrajNet++