
⏰ Task deadline extended to May 7th

  • πŸš€ Getting Started Code with Random Predictions
  • πŸ’ͺ Baseline Code
  • ❓ Have a question? Visit the discussion forum
  • πŸ’» How to claim AWS credits
  • Chat on Discord

πŸ•΅οΈ Introduction

Task 3: Learning New Behavior

Your collaborators have decided to branch out and start studying several other behaviors, and they've asked you to help them create classifiers for those behaviors, too. Unfortunately, since they have only just started studying these behaviors, they can provide only a small number of annotated example videos for each action, but they are confident you'll be able to help!

In Task 3, your goal is to automate the annotation of seven new behaviors of interest, given only a small number of hand-labeled training videos for each behavior. This is an important task for researchers, as it would allow them to quickly define a new behavior and then detect it in an entire dataset.

Training and test sets for this task have all been annotated by the same individual, annotator 0.

If you want to try unsupervised feature learning or clustering, you can also make use of the videos in our global test set, which is shared across challenge tasks. The global test set contains tracked poses from almost 300 additional videos of interacting mice. (Note that a portion of videos in the global test set are also used to evaluate your performance on this task.)

Task 3 overview

πŸ’Ύ Dataset

We provide frame-by-frame annotation data and animal pose estimates extracted from top-view videos of interacting mice recorded at 30Hz; raw videos will not be provided. Videos for all three challenge tasks use a standard resident-intruder assay format, in which a mouse in its home cage is presented with an unfamiliar intruder mouse, and the animals are allowed to freely interact for several minutes under experimenter supervision.

Please refer to Task 1 for an explanation of pose keypoints and annotations.

πŸ“ Files

The following files are available in the resources section. Note that a "sequence" is the same thing as one video: one continuous recording of social interactions between animals, lasting anywhere from 1-2 minutes to 10+ minutes, filmed at 30 frames per second.

⚠️⚠️⚠️ IMPORTANT ⚠️⚠️⚠️ The structure of this training file is different from that of Tasks 1 and 2: rather than providing a single list of sequences, we provide a separate list of sequences for each behavior to be classified. Furthermore, the entries of annotations do not correspond to items in vocabulary: because this is a binary classification task, annotations contains only 0/1 values, with 1 referring to the target behavior and 0 referring to "other". Your submissions for this challenge should follow the same convention (see below).

  • train.npy - Training set for the task, which follows this schema:
    "vocabulary": ['behavior-0', 'behavior-1', 'behavior-2', 'behavior-3', 'behavior-4', 'behavior-5', 'behavior-6'], # Sorted list of behavior vocabulary words. Note that the behavior names have been anonymized, and that "other" is not included in the vocabulary for this task.
    "behavior-0": {
        "<sequence_id>": {
            "keypoints": an ndarray of shape (`frames`, 2, 2, 7), where `frames` is the number of frames in the sequence. See Task 1 for more details about the individual keypoints.
            "annotations": a list containing the behavior annotation for each frame. As this is a binary classification task, the list contains only `0`/`1` values, with `1` referring to the target behavior and `0` referring to "other".
            "annotator_id": // Unique ID of the annotator who labeled this sequence; for this task it is always 0. Please note that the test set contains numerous decoy sequences, and for those the annotator_id has been chosen randomly.
        },
        ...
    },
    "behavior-1": {
        "<sequence_id>": { ... },
        ...
    },
    ...
    "behavior-6": { ... }

NOTE: Unlike Tasks 1 and 2, in this task each behavior has its own training examples, making this a set of binary classification tasks. Your submission must include predictions for every sequence in test.npy for each behavior (see sample_submission.npy below for an example).
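In code, the nested structure described above looks roughly like the following. This is a sketch: the real file would be loaded with `np.load("train.npy", allow_pickle=True).item()`, while the dict below is a tiny made-up stand-in with the same schema (the sequence ids and frame counts are invented).

```python
import numpy as np

# Made-up stand-in for train.npy, mimicking the schema above.
# Real usage: train = np.load("train.npy", allow_pickle=True).item()
train = {
    "vocabulary": ["behavior-0", "behavior-1"],
    "behavior-0": {
        "seq-a": {
            "keypoints": np.zeros((5, 2, 2, 7)),   # (frames, 2 mice, 2 coords, 7 keypoints)
            "annotations": [0, 0, 1, 1, 0],        # one 0/1 label per frame
            "annotator_id": 0,
        },
    },
    "behavior-1": {
        "seq-b": {
            "keypoints": np.zeros((3, 2, 2, 7)),
            "annotations": [1, 0, 0],
            "annotator_id": 0,
        },
    },
}

# Each behavior has its own set of annotated sequences.
for behavior in train["vocabulary"]:
    for seq_id, seq in train[behavior].items():
        frames = seq["keypoints"].shape[0]
        assert frames == len(seq["annotations"])  # one annotation per frame
```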

  • test.npy - Test set for the task, which follows this schema (note that this is the same file for all three tasks):
    "<sequence_id>": {
        "keypoints": an ndarray of shape (`frames`, 2, 2, 7), where `frames` is the number of frames in the sequence. See Task 1 for more details about the individual keypoints.
        "annotator_id": // Unique ID of the individual who annotated this video.
    }
  • sample_submission.npy - Template for a sample submission for this task, which follows this schema. Note that the submission contains a field for each behavior in the training set, and that each behavior field contains predicted sequences for all N videos in test.npy!
    "behavior-0": {
        "<sequence_id-1>": [0, 0, 1, 1, .....],
        "<sequence_id-2>": [0, 0, 0, 1, .....],
        ...
        "<sequence_id-N>": [0, 0, 0, 1, .....],
    },
    "behavior-1": {
        "<sequence_id-1>": [1, 0, 0, 1, .....],
        "<sequence_id-2>": [0, 1, 0, 0, .....],
        ...
        "<sequence_id-N>": [0, 0, 0, 0, .....],
    },
    "behavior-2": { ... },
    ...
    "behavior-6": { ... }


sample_submission here is a dictionary of behaviors, each of which is in turn a dictionary of sequences. Each key in a behavior's dictionary is the unique sequence id of a video in the test set, and its value is expected to be a list of integers of length `frames`. However, unlike previous tasks, list entries here should be 0 for frames classified as "other" and 1 for frames classified as the behavior of the parent dictionary (behavior-0, behavior-1, etc.).
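A minimal valid submission can be built by predicting "other" (0) for every frame of every sequence, for every behavior. The snippet below is a sketch: the `test` dict stands in for `np.load("test.npy", allow_pickle=True).item()`, and the sequence ids are made up.

```python
import numpy as np

# Stand-in for np.load("test.npy", allow_pickle=True).item();
# sequence ids and frame counts here are invented for illustration.
test = {
    "seq-a": {"keypoints": np.zeros((5, 2, 2, 7)), "annotator_id": 0},
    "seq-b": {"keypoints": np.zeros((3, 2, 2, 7)), "annotator_id": 0},
}
behaviors = ["behavior-%d" % i for i in range(7)]

# One 0/1 prediction per frame, for every test sequence, for every behavior.
submission = {
    behavior: {
        seq_id: [0] * seq["keypoints"].shape[0]
        for seq_id, seq in test.items()
    }
    for behavior in behaviors
}

np.save("my_submission.npy", submission)
```

Saving a dict with `np.save` stores it as a pickled object array, which is why loading it back requires `allow_pickle=True` followed by `.item()`.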

πŸš€ Submission

  • Sample submission format is described in the dataset section above.

Make your first submission here πŸš€ !!

To test out the system, you can start by uploading the provided sample_submission.npy. When you make your own submissions, they should follow the same format.

πŸ–Š Evaluation Criteria

During evaluation, the F1 score is used as the primary score by which teams are judged, with precision as a tie-breaking secondary score when teams produce identical F1 scores. Both scores are computed with macro averaging, and the "other" class is never included in the macro average.

NOTE: In this task, we compute the above scores separately for each behavior, and the final score is the mean of the per-behavior scores across all behaviors in the test set.
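The scoring scheme above can be sketched in plain Python: for each behavior, precision and F1 are computed on the positive class only (the "other" class, label 0, is excluded), and the final scores are the means across behaviors. The labels below are made up for illustration and are not from the actual evaluation data.

```python
def binary_scores(y_true, y_pred):
    """Precision and F1 for the positive class (label 1) only."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1

# Made-up per-frame labels for two behaviors.
y_true = {"behavior-0": [0, 0, 1, 1, 0], "behavior-1": [1, 0, 0, 1, 1]}
y_pred = {"behavior-0": [0, 1, 1, 1, 0], "behavior-1": [1, 0, 1, 1, 0]}

scores = [binary_scores(y_true[b], y_pred[b]) for b in y_true]
primary = sum(f1 for _, f1 in scores) / len(scores)    # mean F1 across behaviors
secondary = sum(p for p, _ in scores) / len(scores)    # mean precision (tie-breaker)
```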

πŸ† Prizes

The cash prize pool across the three tasks is $9000 USD total (sponsored by Amazon and Northwestern).

For each task, the prize pool is as follows. Prizes will be awarded for all three tasks:

  • πŸ₯‡ 1st on leaderboard: $1500 USD
  • πŸ₯ˆ 2nd on the leaderboard: $1000 USD
  • πŸ₯‰ 3rd on the leaderboard: $500 USD


Additionally, Amazon is sponsoring $10000 USD total of SageMaker credits! πŸ˜„ 

Please check out this post to see how to claim credits.

