Each year our video coaches spend hours processing video; our goal is to automate this process using computer vision. While the ideal solution would be unsupervised, we are leaving this open to semi-supervised solutions as well.
The challenge is to take the given x and y coordinates of players on an image of a hockey game in progress and translate them to the corresponding top-down view of a hockey ice surface. The solution should provide each player's location on the ice surface. The example below shows an in-game image and the players translated to their positions on the ice in the top-down view.
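A common approach to this kind of view-to-rink mapping is to estimate a homography between the camera image and the top-down rink template, then project each player's coordinates through it. As a minimal sketch, assuming a 3×3 homography matrix `H` has already been estimated (e.g. from rink landmarks; the matrix and points below are illustrative only):

```python
def project_players(H, players):
    """Project [[x, y], ...] image coordinates through a 3x3 homography H
    (nested lists) onto the top-down rink, normalising by the w component."""
    projected = []
    for x, y in players:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        projected.append([u / w, v / w])
    return projected

# With the identity homography, points pass through unchanged.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_players(identity, [[100, 200], [300, 400]]))  # [[100.0, 200.0], [300.0, 400.0]]
```

In practice the hard part of the challenge is estimating `H` per frame (camera angle and zoom vary between clips); the projection step itself is just this matrix multiply.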
- Start Time: 4:00 PM (UTC) 25th November 2020
- End Time: 4:00 PM (UTC) 15th February 2021
There will be 3 cash prizes:
- $3,000 USD
- $1,500 USD
- $500 USD
The dataset consists of videos, images, and player coordinates. The videos vary in viewpoint, camera type, and lighting. The dataset folder is structured in the following way:
- video - the video path relative to the dataset folder
- image - the frame image path relative to the dataset folder
- players - the array of player coordinates on the image, in [[player1x, player1y], [player2x, player2y], etc...] format
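For illustration, the fields above might be read like this (the sample row and file paths are made up; only the column names come from the description above):

```python
import ast
import csv
import io

# Hypothetical sample row in the documented layout (video, image, players).
sample = io.StringIO(
    "video,image,players\n"
    'clips/game1.mp4,frames/game1_001.png,"[[412, 310], [655, 298]]"\n'
)

rows = []
for row in csv.DictReader(sample):
    # The players field is a string like "[[x, y], ...]"; parse it safely.
    row["players"] = ast.literal_eval(row["players"])
    rows.append(row)

print(rows[0]["players"])  # [[412, 310], [655, 298]]
```

`ast.literal_eval` is preferable to `eval` here, since it only accepts Python literals.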
🚀 Training Data
Under the resources tab there is a training data zip which contains an additional 2 clips with 75 sample frames, complete with ground truth coordinates and inverted ground truth coordinates.
- Prepare a CSV containing the following headers:
- videoId ~ This value is provided in the player_coords.csv file
- frameId ~ This value is provided in the player_coords.csv file
- cords ~ The predicted coordinates in [[player1x, player1y], [player2x, player2y], etc..] format
- time (ms) ~ The time in milliseconds it took to run the code that processed the frame
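A minimal sketch of assembling one submission row with Python's `csv` module (the `process_frame` stub and the `videoId`/`frameId` values are placeholders, not part of the challenge code):

```python
import csv
import io
import time

def process_frame(players):
    # Placeholder for your actual model; here it returns the input unchanged.
    return players

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["videoId", "frameId", "cords", "time (ms)"])
writer.writeheader()

# Time only the frame-processing code, as the secondary scoring requires.
start = time.perf_counter()
cords = process_frame([[412, 310], [655, 298]])
elapsed_ms = (time.perf_counter() - start) * 1000

writer.writerow({
    "videoId": "video_1",      # taken from player_coords.csv in practice
    "frameId": "frame_001",    # taken from player_coords.csv in practice
    "cords": str(cords),
    "time (ms)": round(elapsed_ms, 3),
})
print(buf.getvalue())
```

Note that the cords value contains commas, so writing through the `csv` module (which quotes it automatically) is safer than joining strings by hand.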
👩‍💻 Getting Started
Some links to similar projects have been posted in the Getting Started discussion topic.
Under the resources tab there is a skeleton implementation. It provides Python boilerplate code that measures execution time and prepares the information needed for image processing and/or video processing.
🖊 Evaluation Criteria
For the primary scoring we check against both the ground truth and the inverted ground truth, as each image could potentially come from the inverted angle. The scoring algorithm takes the higher of your ground truth and inverted ground truth scores.
Primary scoring is calculated over roughly 5,000 ground truth coordinates by measuring the distance of your predicted coordinates from the ground truth. The closer your predictions are to the ground truth, the higher the score.
The secondary scoring is the sum of the time (ms) column; faster solutions win tie-breakers.
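The organisers' exact scoring formula is not published here; purely as an illustration, a distance-based evaluation that keeps the more favourable of the normal and inverted ground truth might look like:

```python
import math

def mean_distance(predicted, truth):
    """Mean Euclidean distance between matched prediction/ground-truth pairs."""
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(truth)

pred = [[10.0, 20.0], [33.0, 44.0]]
gt = [[13.0, 24.0], [33.0, 44.0]]          # hypothetical ground truth
inv_gt = [[190.0, 80.0], [167.0, 56.0]]    # hypothetical inverted ground truth

# Evaluate against both orientations and keep the better (smaller) distance,
# mirroring how the scorer picks your higher of the two scores.
best = min(mean_distance(pred, gt), mean_distance(pred, inv_gt))
print(best)  # 2.5
```

The real scorer presumably also has to match predicted players to ground-truth players before measuring distances; that matching step is omitted here.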