
⏰ Final date for making submissions to Task 1 is 30th April 18:00 UTC

🚀 Getting Started Code with Random Predictions

💪 Baseline Code

❓ Have a question? Visit the discussion forum

💻 How to claim AWS credits

Chat on Discord

🕵️ Introduction

Task 1: Classical Classification Task

You've been approached by a neuroscience lab studying social behaviors in mice and asked to help them automate their behavioral annotation process. They can provide you with several hours of example videos that have all been annotated consistently for the three behaviors the lab would like to study.

Your first task in the Challenge is to predict bouts of attack, mounting, and close investigation given the tracked poses of a pair of interacting mice. For this task, we're providing plenty of labeled training data: 70 videos, from 1-2 up to 10+ minutes in length, that have been manually annotated by a trained expert for our three behaviors of interest. Training and test sets for this task are both annotated by the same individual.

If you want to try unsupervised feature learning or clustering, you can also make use of the videos in our global test set, which is shared across challenge tasks. The global test set contains tracked poses from almost 300 additional videos of interacting mice. (Note that a portion of videos in the global test set are also used to evaluate your performance on this task.)

[Figure: Task 1 overview]

Understand with code! Here is some getting-started code for you. 😄

💾 Dataset

We provide frame-by-frame annotation data and animal pose estimates extracted from top-view videos of interacting mice recorded at 30 Hz; raw videos will not be provided. Videos for all three challenge tasks use a standard resident-intruder assay format, in which a mouse in its home cage is presented with an unfamiliar intruder mouse, and the animals are allowed to freely interact for several minutes under experimenter supervision.

Pose keypoints

Animal poses are characterized by the tracked locations of body parts on each animal, termed "keypoints." Keypoint locations were estimated using the Mouse Action Recognition System (MARS), which uses a Stacked Hourglass network trained on 15,000 hand-labeled images of mice.

Keypoints are stored in an ndarray with the following properties:

  • Dimensions: (# frames) x (mouse ID) x (x, y coordinate) x (body part).
  • Units: pixels; coordinates are relative to the entire image. Original image dimensions are 1024 x 570.

where mouse ID is 0 for the "resident" mouse and 1 for the "intruder" mouse, and body parts are ordered: nose, left ear, right ear, back of neck, left hip, right hip, base of tail. The placement of these keypoints is illustrated below:

[Figure: diagram of keypoint locations]
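As a rough illustration of the array layout described above, the sketch below indexes a keypoint array by mouse, coordinate, and body part to compute a per-frame nose-to-nose distance. The constant names, the helper `nose_to_nose_distance`, and the synthetic array are assumptions made for this example; they are not part of the challenge's starter code.

```python
import numpy as np

# Body-part order along the last axis, as listed above.
BODY_PARTS = ["nose", "left ear", "right ear", "back of neck",
              "left hip", "right hip", "base of tail"]
RESIDENT, INTRUDER = 0, 1   # mouse ID axis
X, Y = 0, 1                 # coordinate axis


def nose_to_nose_distance(keypoints: np.ndarray) -> np.ndarray:
    """Per-frame distance (in pixels) between the two animals' noses.

    `keypoints` is expected to have shape (frames, 2, 2, 7).
    """
    nose = BODY_PARTS.index("nose")
    resident_nose = keypoints[:, RESIDENT, :, nose]  # shape (frames, 2)
    intruder_nose = keypoints[:, INTRUDER, :, nose]  # shape (frames, 2)
    return np.linalg.norm(resident_nose - intruder_nose, axis=1)


# Synthetic stand-in for one video: 5 frames, all keypoints at the origin,
# except the intruder's nose, which sits at (x=3, y=4).
keypoints = np.zeros((5, 2, 2, 7))
keypoints[:, INTRUDER, X, BODY_PARTS.index("nose")] = 3.0
keypoints[:, INTRUDER, Y, BODY_PARTS.index("nose")] = 4.0

distances = nose_to_nose_distance(keypoints)
print(distances)
```

Distances like this one are a common starting point for hand-crafted features, since the behaviors of interest depend on the relative position of the two animals.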


Behaviors

A behavior is a domain expert-defined action of one or both animals. Most of the behaviors included in this challenge are social behaviors, involving the position and movements of both animals. Unless otherwise noted, annotations refer to behaviors initiated by the "resident" (mouse ID==0) in this assay.

Behaviors were annotated on a frame-by-frame basis by a trained human expert, based on simultaneous top- and front-view video of the interacting mice. For Task 1, every video frame was labeled as either close investigation, attack, mounting, or "other" (meaning none of the above). For descriptions of each behavior, see "Mouse behavior annotation" in the Methods section of Segalin et al., 2020.

Behaviors are stored in a list annotations with the following properties:

  • Dimensions: (# frames), the number of frames in a given video
  • Values: [0, 1, 2, …], the index of the behavior in the list vocabulary (see next section)

For example, if vocabulary = ['attack', 'investigation', 'mount', 'other'] for a given dataset, then a value of 1 in annotations[i] would mean that investigation was observed on frame (i+1) of the video.
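Since the task is phrased in terms of bouts, it can help to collapse the per-frame indices into runs of consecutive identical labels. The following is a minimal sketch of that conversion; the helper name `annotations_to_bouts` and the toy `annotations` list are made up for illustration, and frame indices in the code are 0-based.

```python
from itertools import groupby

FPS = 30  # videos are recorded at 30 Hz


def annotations_to_bouts(annotations, vocabulary):
    """Collapse per-frame labels into (behavior, start, end) bouts.

    `start` is the 0-based index of the first frame of the bout and
    `end` is exclusive, so the bout lasts (end - start) frames.
    """
    bouts = []
    start = 0
    for label, run in groupby(annotations):
        length = sum(1 for _ in run)
        bouts.append((vocabulary[label], start, start + length))
        start += length
    return bouts


vocabulary = ['attack', 'investigation', 'mount', 'other']
annotations = [3, 3, 1, 1, 1, 0, 0, 3]  # toy example, not real data

bouts = annotations_to_bouts(annotations, vocabulary)
for name, start, end in bouts:
    duration_s = (end - start) / FPS
    print(f"{name}: frames {start}..{end - 1} ({duration_s:.3f} s)")
```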

πŸ“ Files

The following files are available in the resources section. Note that a "sequence" is the same thing as one video: one continuous recording of social interactions between animals, between 1-2 and 10+ minutes long, filmed at 30 frames per second.

  • train.npy - Training set for the task, with the following schema:
    {
        "vocabulary" : ['attack', 'investigation', 'mount', 'other'], the names of the behaviors annotated in this dataset.
        "sequences" : {
            "<sequence_id>" : {
                "keypoints" : an ndarray of shape (`frames`, 2, 2, 7), where `frames` is the number of frames in this sequence. More details about the individual keypoints are provided above.
                "annotations" : a list containing the behavior annotation for each frame. Each entry is the index of the corresponding entry in "vocabulary" (so in this example, 0 = 'attack', 1 = 'investigation', etc.)
                "annotator_id" : Unique ID for the annotator who annotated this video. For Task 1, this value is always 0.
            }
        }
    }
  • test.npy - Test set for the task, with the following schema (note that this is the same file for all three tasks):
    {
        "<sequence_id>" : {
            "keypoints" : an ndarray of shape (`frames`, 2, 2, 7), where `frames` is the number of frames in this sequence. More details about the individual keypoints are provided above.
            "annotator_id" : Unique ID for the annotator who annotated this sequence. We use the same test set for all tasks, so you will observe annotator_id values from 0 to 5. Only sequences with annotator_id==0 count towards your score for this task, but you can submit predictions for all sequences if you'd like.
        }
    }
  • sample_submission.npy - Template for a sample submission for this task, with the following schema:
    {
        "<sequence_id-1>" : [0, 0, 1, 2, .....],
        "<sequence_id-2>" : [0, 1, 2, 0, .....]
    }

A "sequence" in this setting is one uninterrupted video of animal interactions; videos vary from 1-2 up to 10+ minutes in length.

In sample_submission, each key in the dictionary refers to the unique sequence id of a video in the test set. The item for each key is expected to be a list of integers of length frames, representing the index of the predicted behavior name in the vocabulary field of train.npy.
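To make the file layout concrete, here is a minimal sketch that builds a submission dictionary predicting a single class for every frame. It assumes the `.npy` files are Python dictionaries pickled with `np.save` (so they are read back with `allow_pickle=True` and `.item()`), and that the test dictionary is keyed directly by sequence ID as in the schema above; if your copy nests the sequences under another key, index into that first. The helpers `load_dict` and `constant_submission` and the tiny `fake_test` dictionary are hypothetical stand-ins, not the official starter code.

```python
import os
import tempfile

import numpy as np


def load_dict(path):
    """Load a dictionary that was pickled into a .npy file with np.save."""
    return np.load(path, allow_pickle=True).item()


def constant_submission(test_sequences, class_index):
    """Predict the same vocabulary index for every frame of every sequence."""
    return {
        seq_id: [class_index] * len(seq["keypoints"])
        for seq_id, seq in test_sequences.items()
    }


vocabulary = ['attack', 'investigation', 'mount', 'other']

# Tiny synthetic stand-in for test.npy (assumption: same layout as above).
fake_test = {
    "seq_a": {"keypoints": np.zeros((4, 2, 2, 7)), "annotator_id": 0},
    "seq_b": {"keypoints": np.zeros((6, 2, 2, 7)), "annotator_id": 3},
}

with tempfile.TemporaryDirectory() as tmp:
    test_path = os.path.join(tmp, "test.npy")
    np.save(test_path, fake_test, allow_pickle=True)

    test_sequences = load_dict(test_path)
    submission = constant_submission(test_sequences, vocabulary.index("other"))

    # One list of integers per sequence, one integer per frame.
    submission_path = os.path.join(tmp, "submission.npy")
    np.save(submission_path, submission, allow_pickle=True)
    reloaded = load_dict(submission_path)

print({seq_id: len(preds) for seq_id, preds in reloaded.items()})
```

Replacing `constant_submission` with the output of a trained model keeps the rest of the pipeline unchanged, as long as each list has exactly one prediction per frame.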

🚀 Submission

  • Sample submission format is described in the Files section above.

Make your first submission here 🚀!

To test out the system, you can start by uploading the provided sample_submission.npy. When you make your own submissions, they should follow the same format.

🖊 Evaluation Criteria

Teams are ranked by F1 score, which is the Primary Score. Precision is the Secondary Score and is used to break ties when teams produce identical F1 scores. Both scores are macro-averaged over the attack, investigation, and mount classes; the "other" class is excluded from the average.
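The scoring rule can be sketched as follows. This is an illustration of macro-averaged F1 and Precision over every class except "other", not the official evaluator: the function name `macro_scores`, the convention of scoring a class as 0 when its denominator is zero, and the toy labels are all assumptions.

```python
import numpy as np


def macro_scores(y_true, y_pred, class_indices):
    """Return (macro F1, macro precision) over the given class indices."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_scores, precisions = [], []
    for c in class_indices:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Assumed convention: a zero denominator yields a score of 0.
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
        precisions.append(precision)
    return float(np.mean(f1_scores)), float(np.mean(precisions))


vocabulary = ['attack', 'investigation', 'mount', 'other']
# Score every class except "other".
scored_classes = [i for i, name in enumerate(vocabulary) if name != 'other']

y_true = [0, 0, 1, 1, 2, 3, 3, 3]  # toy ground truth
y_pred = [0, 1, 1, 1, 3, 3, 3, 0]  # toy predictions

f1, precision = macro_scores(y_true, y_pred, scored_classes)
print(f"macro F1 = {f1:.4f}, macro precision = {precision:.4f}")
```

Note that frames whose true label is "other" still affect the result: predicting attack on such a frame counts as a false positive for attack, even though "other" itself contributes no term to the average.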

πŸ† Prizes

The cash prize pool across the 3 tasks is $9000 USD total (Sponsored by Amazon and Northwestern)

For each task, the prize pool is as follows. Prizes will be awared for all the 3 tasks

  • πŸ₯‡ 1st on leaderboard: $1500 USD
  • πŸ₯ˆ 2nd on the leaderboard: $1000 USD
  • πŸ₯‰ 3rd on the leaderboard: $500 USD


Additionally, Amazon is sponsoring $10000 USD total of SageMaker credits! 😄

Please check out this post to see how to claim credits.


🔗 Links

📫 Contact

🙏 Sponsors


