AIcrowd | MeltingPot Challenge 2023

Round 1: Completed

AIcrowd & Cooperative AI Foundation

💻 Apply for compute credits - Deadline to apply - September 22nd (Check country eligibility for applying)

📕 Make your first submission using the starter-kit.

💪 Train your own models easily with the rllib baseline setup

👥 Find teammates

💬 Share feedback & queries

We recently announced updates to logistical and evaluation aspects of the contest. Please check them out here.

🪴 Introduction

We are excited to announce the Melting Pot Contest at NeurIPS 2023, organised by researchers from the Cooperative AI Foundation, MIT, and Google DeepMind. This new competition challenges researchers to push the boundaries of multi-agent reinforcement learning (MARL) for mixed-motive cooperation. The contest leverages the cutting-edge Melting Pot environment suite to rigorously evaluate how well agents can adapt their cooperative skills to interact with novel partners in unforeseen situations. Success requires demonstrating true cooperative intelligence. This event offers participants a unique opportunity to tackle open challenges in cooperative AI and to advance the frontier of deployable real-world AI systems.

Dive deep into the realm of multi-agent reinforcement learning and explore the intricacies of mixed-motive cooperation.

📑 Tasks

Your task is to build creative MARL solutions focused on achieving goals through teamwork, teaching, bargaining, and sanctioning undesirable behaviour. The contest scenarios are designed to elicit cooperation, coordination, reciprocity, and other prosocial behaviours. Success requires agents that balance their individual interests with behaviours that benefit the group, even in the face of unfamiliar partners.

The task is a robust testbed for advancing research on mixed individual motives, generalising populations, and large-scale multi-agent interactions.

The aim of this challenge is to catalyse progress in deployable cooperative AI that complements human prosperity. Together we can build consensus on metrics for coordination, engage the broader AI community, and showcase AI’s immense potential for teamwork.

💾 Melting Pot Suite

The Melting Pot suite provides a protocol for generating test scenarios to assess an agent population's ability to generalise cooperation to new situations. Scenarios combine novel "background populations" of agents with substrates including classic social dilemmas like the Prisoner’s Dilemma and more complex mixed-motive coordination games. The contest features settings with up to sixteen simultaneous agents across four diverse environments, enabling robust evaluation of cooperative capabilities at scale.

Participants can build on provided baselines, debugging tools for local testing, and visualisations to interpret agent behaviours. The substrates are configurable 2D gridworlds based on DeepMind Lab2D that strike a balance between visual complexity and computational efficiency. Agents have only partial observability, introducing challenges for communication and conventions. Together these elements make Melting Pot an accessible yet uniquely rigorous testbed to push MARL research to new heights.

For this contest, we will specifically focus on the following four substrates:

allelopathic_harvest__open
clean_up
prisoners_dilemma_in_the_matrix__arena
territory__rooms

During the evaluation phase, submissions will be evaluated on held-out scenarios. The final winners will be decided based on performance on these held-out scenarios.

📅 Timeline (Updated!)

The contest timeline spans three months to allow participants to refine their solutions before final scoring:

(All times are in Anywhere On Earth)

August 31: Development phase begins
September 23: Compute Credits Application closes
September 25: Compute credits provided to the selected teams
November 18: Development/Generalization phase ends (i.e. participants can no longer make submissions; the previous deadline was Nov 15)
November 16-18: Submission selection by participants for Evaluation
November 22-28: Evaluation Phase
November 29-30: Reach out to participants for any clarifications and other details about their solutions. Winners announced and notified.
December 15: NeurIPS (9am-noon CST)

We are committed to an accessible and rewarding experience through partnerships, grants, over $10,000 in prizes, $60,000 in support, and publishing opportunities. We will actively support participants through forums, tutorials, office hours, and more to create an engaging AI research adventure.

🏆 Prizes

In recognition of outstanding achievements, we have earmarked a prize pool of $10,000.

🥇 1st Prize: $5,000
🥈 2nd Prize: $3,000
🥉 3rd Prize: $2,000

Additionally, in our commitment to fostering inclusivity and diversity, we're offering up to $50,000 worth of compute credits and $10,000 in travel grants.

To apply for compute funding, please fill out this form (Please note that you are eligible to receive this funding only if you are applying from one of the countries listed in eligibility requirement on this link).

Beyond the monetary rewards, top performers will receive an invitation to co-author a report for NeurIPS 2024. The details will be shared soon.

🚀 Baselines

To ensure a smooth start, we are providing participants with baselines for training and local evaluation of RLlib-based agents on Melting Pot. This will allow you to quickly train basic agents and make a submission using the submission-starter-kit to get on the leaderboard. Coupled with tools designed for debugging and visualization, you can establish a strong foundation and then build upon it to reach unprecedented heights.

Notes for using the baseline:

We recommend that you use the Melting Pot fork provided as part of the baseline implementation, so that you work on a frozen version and can better track any issues you face during the contest.
Please open issues on the baseline repository directly for any problems with the Melting Pot code or the RLlib implementations.
We plan to provide additional support for the baseline implementation over the next couple of weeks. It is therefore useful to check in and sync the fork every few days over the next two weeks, especially if you plan to build RLlib agents.

🖊 Evaluation Metric

Contestants submit populations for evaluation, one per substrate. A population is then evaluated on all the scenarios of that substrate. For each scenario we compute the focal per-capita reward.

To make the evaluation consistent across different substrates, focal per-capita rewards are normalized based on the Melting Pot 2.0 (MP2.0) baseline range, where the worst-performing baseline in a scenario is assigned a score of 0 and the best-performing one a score of 1.

A contestant's score in a scenario can fall outside this range if they perform worse than the worst baseline or better than the best baseline in MP2.0. This normalization process yields the ‘scenario score’. Because each substrate has a different number of scenarios, the population score in a substrate is computed as the average of its scenario scores.

The final score, which determines the ranking of participants, is derived by averaging the scores across all four substrates.
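
The scoring steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the organisers' evaluation code: the function names are hypothetical, the example numbers are made up, and the real baseline ranges come from MP2.0.

```python
from statistics import mean
from typing import Dict, List


def scenario_score(focal_per_capita: float,
                   worst_baseline: float,
                   best_baseline: float) -> float:
    """Min-max normalise a focal per-capita reward against the baseline range.

    The worst baseline maps to 0 and the best to 1. The result is not
    clipped, so it can be negative or greater than 1.
    """
    span = best_baseline - worst_baseline
    if span == 0:
        raise ValueError("baseline range is empty; cannot normalise")
    return (focal_per_capita - worst_baseline) / span


def substrate_score(scenario_scores: List[float]) -> float:
    """Average the scenario scores of one substrate."""
    return mean(scenario_scores)


def final_score(scores_by_substrate: Dict[str, List[float]]) -> float:
    """Average the substrate scores, giving each substrate equal weight."""
    return mean([substrate_score(s) for s in scores_by_substrate.values()])


# Hypothetical, already-normalised scenario scores for the four substrates.
example = {
    "allelopathic_harvest__open": [0.5],
    "clean_up": [0.5, 1.5],
    "prisoners_dilemma_in_the_matrix__arena": [0.0, 1.0],
    "territory__rooms": [-0.5, 0.5],
}
print(final_score(example))
```

Averaging within each substrate first means a substrate with many scenarios does not outweigh one with few in the final ranking.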

Contest Updates

We are announcing a few logistical updates to accommodate more development time for the participants and address some of the questions/concerns that have been raised. We are further announcing some structural updates to the evaluation phase so as to provide better visibility and understanding of what participants can expect during and after the evaluation phase.

Please reach out to us via the Discord channel if you have any questions.

Logistical Updates

Updated Timeline: Please check the updated Timeline above.

Updated Team Size: We have increased the allowed team size to 10 members (the previous limit was 5).

Updated Leaderboard: The leaderboard will no longer show sample videos of submissions, to avoid revealing behaviors achieved by participants. Participants can still use the local evaluation modules to generate videos locally. We are currently looking into rendering participant-specific videos on the AIcrowd platform in a restricted manner, such that participants only view videos related to their own submissions.

Evaluation Scheme

During the evaluation phase, the submissions will be evaluated on held-out scenarios not accessible to the participants. Below we elaborate on the updated structure of the evaluation process and provide step-by-step guidelines on how it affects the participants:

Development Phase (now - Oct 31): Participants can make 2 submissions per day. The leaderboard and submission details will display the validation score (mean return of the focal population on the validation scenarios already shared with the participants).
Generalization Phase (Extension of Development Phase, Nov 1 - Nov 10): Participants can continue to make 2 submissions per day. In addition to the validation score mentioned above, the leaderboard and submission details will also display a generalization score. The generalization score is also the mean focal population score, but on a new set of validation scenarios drawn from the same distribution as the held-out test scenarios. Participants will not have access to these scenario samples; only their scores will be reported. As these samples belong to the same distribution that will be used for final evaluation and ranking, we believe this will give participants valuable insight into how their approaches might perform on the held-out test set.
Submission Selection Phase (Nov 11 - Nov 13): Participants can select up to three submission IDs from the submissions they have made until Nov 10. We will perform the final evaluation using the selected submission IDs and consider the best of the three scores for the final ranking. Participants will not be allowed to make further submissions during this phase. If a participant fails to select three IDs of their choice by Nov 13, we will use their top three submissions (based on validation score) to perform evaluation and final scoring.
Evaluation Phase (Nov 14 - Nov 20): We will evaluate all the selected submissions on the held-out test set during this phase. Participants will not receive any information during this phase except in case of errors. If a submission produces errors, we may reach out to the participant if we think the error is resolvable. Please note that if any one or two of the three selected submission IDs run without error, we will use those scores for our ranking even if the remaining submission IDs throw an error.
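
The selection and ranking rules in the phases above can be sketched as follows. This is a hedged illustration only: the function names and submission IDs are hypothetical, and the sketch assumes that the fallback to the top three validation scores applies when a participant selects nothing, since the text does not say what happens with a partial selection.

```python
from typing import Dict, List, Optional, Sequence


def select_submission_ids(chosen: Sequence[str],
                          validation_scores: Dict[str, float],
                          k: int = 3) -> List[str]:
    """Return up to k submission IDs to send to final evaluation.

    Uses the participant's own choice when there is one; otherwise falls
    back to the k submissions with the highest validation score.
    """
    if chosen:
        return list(chosen)[:k]
    ranked = sorted(validation_scores,
                    key=lambda sid: validation_scores[sid],
                    reverse=True)
    return ranked[:k]


def ranking_score(selected: Sequence[str],
                  test_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Best held-out score among the selected submissions.

    A score of None stands for a submission that errored; such entries are
    ignored as long as at least one selected submission ran successfully.
    """
    valid = [test_scores[sid] for sid in selected
             if test_scores.get(sid) is not None]
    return max(valid) if valid else None


# Hypothetical example: no selection made, and one submission errors out.
validation = {"sub_a": 0.2, "sub_b": 0.9, "sub_c": 0.5, "sub_d": 0.7}
picked = select_submission_ids([], validation)
held_out = {"sub_b": None, "sub_d": 0.4, "sub_c": 0.6}
print(picked, ranking_score(picked, held_out))
```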

🙋 Advisory Board

We have assembled a broad team of researchers and engineers to organise and run the Melting Pot Contest at NeurIPS 2023, as well as an advisory board of leading academics.

Organisers

Rakshit Trivedi - MIT
Akbir Khan - UCL and Cooperative AI Foundation
Jesse Clifton - Center on Long-Term Risk and Cooperative AI Foundation
Lewis Hammond - University of Oxford and Cooperative AI Foundation
Joel Leibo - Google DeepMind
Edgar Duenez-Guzman - Google DeepMind
John Agapiou - Google DeepMind
Jayd Matyas - Google DeepMind
Dylan Hadfield-Menell - MIT

Advisory Board

Vincent Conitzer - Professor at Carnegie Mellon University
Jakob Foerster - Associate Professor at University of Oxford
Natasha Jaques - Assistant Professor at University of Washington
José Hernández-Orallo - Professor at Universitat Politècnica de València

📱 Contact

💬 Share your feedback and suggestions over here.
👥 Challenges are better with friends, find teammates over here.
📲 Meet other challenge participants like you on the Discord channel.