⚔️ Problem statement
In this task, you will be given a dataset of video data consisting of a beetle (Sceptobius lativentris) interacting with another interactor (an ant, or a different species of beetle). Rather than being asked to detect a specific behavior of interest, we ask you to submit a frame-by-frame representation of the dataset, for example a low-dimensional embedding of the videos or trajectories over time. The motivation is that raw video gives participants the freedom to try ideas that would not be possible with the tracking data alone. (For inspiration, you can read about a few existing methods for embedding behavior of individual animals here, here, here, and here.)
To evaluate the quality of your learned representations, we will take a practical approach: we'll use representations as input to train linear classifiers on many different "hidden" tasks (each task will have its own classifier), such as detecting occurrence of experimenter-defined actions or distinguishing between two different interactors. The goal is therefore to create a representation that captures behavior and generalizes well in any downstream task.
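As a rough illustration of this kind of evaluation, the sketch below fits a linear probe on frame embeddings for one binary task and scores it on held-out frames. It is not the organizers' evaluation code: the ridge-regularised least-squares classifier, the synthetic embeddings, and the synthetic "hidden task" labels are all assumptions made for the example.

```python
import numpy as np

def fit_linear_probe(X, y, l2=1e-3):
    """Fit a ridge-regularised least-squares linear classifier on 0/1 labels."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    targets = 2.0 * y - 1.0                         # map {0, 1} to {-1, +1}
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])        # regularised normal equations
    return np.linalg.solve(A, Xb.T @ targets)

def predict_linear_probe(w, X):
    """Predict 0/1 labels from the sign of the linear score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)

rng = np.random.default_rng(0)
# Synthetic stand-in for per-frame embeddings: 2000 frames, 6 dimensions.
emb = rng.normal(size=(2000, 6))
# Hypothetical hidden task that depends linearly on the embedding.
labels = (emb[:, 0] + 0.5 * emb[:, 1] > 0).astype(int)

# Train on the first 1500 frames, evaluate on the remaining 500.
w = fit_linear_probe(emb[:1500], labels[:1500])
acc = (predict_linear_probe(w, emb[1500:]) == labels[1500:]).mean()
print(f"held-out accuracy: {acc:.3f}")
```

Because every hidden task gets its own linear classifier on the same embedding, a representation scores well only if many different behaviors are linearly readable from it.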
Join our Computational Behavior Slack to discuss the challenge, ask questions, find teammates, or chat with the organizers!
The following files are available in the Resources section on the Challenge Page. A "sequence" is a continuous recording of social interactions between animals: sequences are 30 seconds long (900 frames at 30 Hz) in the beetle dataset. The sequence_id is a random hash to anonymize experiment details. NaNs indicate missing data; they occur because not all videos are labelled for all tasks. Data are padded with NaNs so that all sequences are the same size.
- user_train.npy - Set of videos for which three public tasks are provided, for your local validation. It follows this schema:
- submission_keypoints.npy - Keypoints for the submission clips. It follows this schema:
- frame_number_map.npy - A map of frame numbers for each clip, to be used for the submission embeddings array
- sample_submission.npy - Template for a sample submission for this task. It follows this schema:
- userTrain_videos.zip - Videos for the userTrain sequences, all 512x512 grayscale, 30 fps, 900 frames each
- submission_videos.zip - Videos for the submission sequences, all 512x512 grayscale, 30 fps, 900 frames each
- userTrain_videos_resized_224.zip - Videos resized for convenience, 224x224 grayscale, 30 fps, 900 frames each
- submission_videos_resized_224.zip - Videos resized for convenience, 224x224 grayscale, 30 fps, 900 frames each
In sample_submission, each key in the frame_number_map dictionary refers to the unique sequence id of a video in the test set. The item for each key is expected to be the start and end index for slicing the embeddings numpy array to get the corresponding embeddings. The embeddings array is a 2D ndarray of floats of size total_frames × X, where X is the dimension of your learned embedding (6 in the above example; maximum permitted embedding dimension is 128), representing the embedded value of each frame in the sequence. total_frames is the sum of all the frames of the sequences; the array should be the concatenation of the embeddings of all the clips.
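The layout described above can be sketched as follows. This is a minimal example under stated assumptions: the sequence ids are made up (real ones are random hashes), `embed_sequence` is a hypothetical placeholder for your own model, and the dictionary keys `frame_number_map` and `embeddings` are taken from the names used in the text. For a real submission, the start and end indices should come from the provided frame_number_map.npy rather than being rebuilt by hand.

```python
import numpy as np

EMBED_DIM = 6          # X in the text; must not exceed 128
FRAMES_PER_SEQ = 900   # 30 seconds at 30 Hz

# Hypothetical sequence ids, used only for illustration.
sequence_ids = ["seq_a", "seq_b", "seq_c"]

def embed_sequence(seed, n_frames, dim):
    """Hypothetical placeholder: returns random per-frame embeddings."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_frames, dim))

frame_number_map = {}
chunks = []
start = 0
for i, seq_id in enumerate(sequence_ids):
    emb = embed_sequence(i, FRAMES_PER_SEQ, EMBED_DIM)
    # Store the (start, end) slice of this sequence in the big array.
    frame_number_map[seq_id] = (start, start + len(emb))
    chunks.append(emb)
    start += len(emb)

# Concatenate all clips into one (total_frames, X) array.
embeddings = np.concatenate(chunks, axis=0)
submission = {"frame_number_map": frame_number_map, "embeddings": embeddings}

# Recover the embeddings of one sequence by slicing.
s, e = submission["frame_number_map"]["seq_b"]
seq_b_embeddings = submission["embeddings"][s:e]
print(embeddings.shape, seq_b_embeddings.shape)
```

A dictionary like this can be written with `np.save` and read back with `np.load(path, allow_pickle=True).item()`.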
To give you a hint of how embeddings will be evaluated, we provide labels for three sample evaluation tasks. Note that not all frames/sequences will be labeled for a given task:
NaNs are used to indicate missing data that we will not evaluate on for a given task.
- Reapplied is a binary "sequence-level" task, meaning its value is the same for all frames in a sequence. Here, a value of 1 means the interactor is a reapplied ant: a chemically stripped ant with pheromone extract reapplied onto it; a value of 0 means the interactor is not a reapplied ant.
- grooming_self is a "frame-level" task, meaning each frame in a sequence will be 1 when behavior is present and 0 otherwise. The annotation is performed by a domain expert on each frame for a subset of the videos for when the beetle is grooming itself.
- exploring object is a "frame-level" task, meaning each frame in a sequence will be 1 when behavior is present and 0 otherwise. The annotation is performed by a domain expert on each frame for a subset of the videos for when the beetle is exploring the interactor.
The sample annotations for these three tasks are provided in a matrix that is (# frames) × 3. For all tasks, annotations are provided for each frame of a sequence.
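Since unlabeled frames are marked with NaN, each task has to be trained and scored only on the frames that carry a label for it. The sketch below shows one way to select those frames from a (# frames) × 3 annotation matrix. The tiny matrix, the toy embeddings, and the column order of the three tasks are assumptions made for illustration.

```python
import numpy as np

# Column order is an assumption; check it against the provided data.
TASKS = ["Reapplied", "grooming_self", "exploring object"]

# Toy annotation matrix: 8 frames x 3 tasks, NaN means "not labeled".
ann = np.array([
    [1, 0, np.nan],
    [1, 1, np.nan],
    [1, 1, np.nan],
    [1, 0, np.nan],
    [np.nan, np.nan, 0],
    [np.nan, np.nan, 1],
    [np.nan, np.nan, 1],
    [np.nan, np.nan, np.nan],
])
# Toy embeddings: one 4-dimensional vector per frame.
emb = np.arange(8 * 4, dtype=float).reshape(8, 4)

def labeled_subset(emb, ann, col):
    """Return embeddings and integer labels for frames labeled in one task."""
    mask = ~np.isnan(ann[:, col])
    return emb[mask], ann[mask, col].astype(int)

for col, name in enumerate(TASKS):
    X, y = labeled_subset(emb, ann, col)
    print(f"{name}: {X.shape[0]} labeled frames, labels={y.tolist()}")
```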
The sample submission format is described in the Files section above.
To test out the system, you can start by uploading the provided sample_submission.npy. When you make your own submissions, they should follow the same format.
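Before uploading, it can help to run a quick local sanity check on the submission object. The function below is a hypothetical helper, not part of any official tooling. It assumes the submission is a dictionary with `frame_number_map` and `embeddings` entries, as named in the Files section, and checks only the constraints stated on this page (2D float array, at most 128 dimensions, slices covering every frame). The example data are synthetic.

```python
import numpy as np

MAX_DIM = 128  # maximum permitted embedding dimension

def check_submission(submission, expected_map):
    """Return a list of human-readable problems (empty if none were found)."""
    emb = submission["embeddings"]
    fmap = submission["frame_number_map"]
    problems = []
    if emb.ndim != 2:
        problems.append(f"embeddings must be 2D, got {emb.ndim}D")
    elif emb.shape[1] > MAX_DIM:
        problems.append(f"embedding dimension {emb.shape[1]} exceeds {MAX_DIM}")
    if not np.issubdtype(emb.dtype, np.floating):
        problems.append(f"embeddings must be floats, got {emb.dtype}")
    if set(fmap) != set(expected_map):
        problems.append("sequence ids differ from the expected frame_number_map")
    covered = sum(end - start for start, end in fmap.values())
    if covered != emb.shape[0]:
        problems.append(f"slices cover {covered} frames, array has {emb.shape[0]}")
    return problems

# Synthetic example: two sequences of 900 frames each.
fmap = {"seq_a": (0, 900), "seq_b": (900, 1800)}
good = {"frame_number_map": fmap, "embeddings": np.zeros((1800, 6))}
bad = {"frame_number_map": fmap, "embeddings": np.zeros((1800, 200))}

print(check_submission(good, fmap))
print(check_submission(bad, fmap))
```

In practice, `expected_map` would be loaded from the provided frame_number_map.npy.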
Also check out the notebooks provided in the Notebooks tab for baselines provided by us. Your community contributions will also be shown in this section.
The cash prize pool for this task is $3,000 USD total:
- 🥇 1st on leaderboard: $1500 USD
- 🥈 2nd on the leaderboard: $1000 USD
- 🥉 3rd on the leaderboard: $500 USD
Additional prizes are to be announced, including a speaking opportunity at our CVPR 2022 workshop.
Unsupervised model - SimCLR - Ant-Beetles Video Data
Getting Started - Ant-Beetles Video Data