NeurIPS 2019 : MineRL Competition

Sample-efficient reinforcement learning in Minecraft, starting June 1st, 2019

The MineRL Competition for Sample-Efficient Reinforcement Learning

Starting June 1st, 2019

Abstract

Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. Many of these systems cannot be applied to real-world problems, where environment samples are expensive. Overcoming these limitations requires new, sample-efficient methods.

This competition is designed to foster the development of algorithms which can drastically reduce the number of samples needed to solve complex, hierarchical, sparse-reward environments using human demonstrations. Participants compete to develop systems which solve a hard task in Minecraft, obtaining a diamond, with a limited number of environment samples.

Task

Some of the stages of obtaining a diamond: obtaining wood, a stone pickaxe, iron, and diamond.

This competition uses a set of Gym environments based on Malmo. The environment and dataset loader will be available through a pip package.

The main task of the competition is solving the ObtainDiamond environment. In this environment, the agent begins in a random starting location without any items, and is tasked with obtaining a diamond. This task can only be accomplished by navigating the complex item hierarchy of Minecraft.

The Minecraft item hierarchy leading to a diamond.

The agent receives a high reward for obtaining a diamond, as well as smaller, auxiliary rewards for obtaining prerequisite items. In addition to the main environment, we provide a number of auxiliary environments. These consist of tasks which are either subtasks of ObtainDiamond or other tasks within Minecraft.
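
As a concrete illustration, the sketch below runs a random agent in the main environment through its Gym interface and accumulates the episode reward described above. It assumes the pip package is named minerl and registers the environment as MineRLObtainDiamond-v0; the released starter code is the authoritative reference for exact names.

    import gym
    import minerl  # importing minerl registers the MineRL Gym environments

    # Minimal random-agent rollout in the main competition environment.
    env = gym.make("MineRLObtainDiamond-v0")
    obs = env.reset()

    done = False
    episode_reward = 0.0
    while not done:
        action = env.action_space.sample()  # stand-in for a learned policy
        obs, reward, done, info = env.step(action)
        episode_reward += reward  # auxiliary item rewards plus the diamond reward

    print("Episode reward:", episode_reward)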

Why Minecraft?

Minecraft is a rich environment in which to perform learning: it is an open-world environment, has sparse rewards, and has many innate task hierarchies and subgoals. Furthermore, it encompasses many of the problems that we must solve as we move towards more general AI (for example, what is the reward structure of “building a house”?). Beyond this, Minecraft has more than 90 million monthly active users, making it a good environment in which to collect a large-scale dataset.

Competition Structure

Round 1: General Entry

Round 1 Procedure

In this round, participants will do the following:

  1. Register on the AICrowd competition website and receive the following materials:

    • Starter code for running the environments for the competition task.
    • Basic baseline implementations provided by Preferred Networks and the competition organizers.
    • Two renders of the human demonstration dataset (one for method development, the other for validation), each with modified textures, lighting conditions, and/or minor game-state changes.
    • The Docker images and Azure quick-start template that the competition organizers will use to validate the training performance of competitors’ models.
    • Several scripts enabling procurement of the standard cloud compute used to evaluate the sample efficiency of participants’ submissions. Note that for this competition, we will restrict competitors to NC6 v2 Azure instances with 6 CPU cores, 112 GiB RAM, a 736 GiB SSD, and a single NVIDIA P100 GPU.
  2. Use the provided human demonstrations to develop and test procedures for efficiently training models to solve the competition task (a data-loading sketch follows this list).

  3. Submit their trained models for evaluation once they are satisfied with them. The automated evaluation setup will evaluate each submission against the validation environment and report the resulting metrics on the competition leaderboard.
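
The following is a minimal sketch of step 2, loading the human demonstrations with the competition's data loader. The download helper, the data_dir argument, and the sarsd_iter iterator are assumptions based on early versions of the minerl package; the exact data API is documented in the released starter code.

    import minerl

    # Fetch the ObtainDiamond demonstrations into a local directory
    # (helper name and arguments are assumptions; see the starter code).
    minerl.data.download(directory="data")
    data = minerl.data.make("MineRLObtainDiamond-v0", data_dir="data")

    # Iterate over (state, action, reward, next_state, done) samples drawn
    # from the human trajectories, e.g. to pre-train a behavioral-cloning policy.
    for state, action, reward, next_state, done in data.sarsd_iter(
            num_epochs=1, max_sequence_len=32):
        pass  # feed the sequence into your training procedure here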

Once Round 1 is complete, the organizers will:

  1. Examine the code repositories of the top submissions on the leaderboard to ensure compliance with the competition rules. The top submissions which comply with the competition rules will then automatically be re-trained by the competition orchestration platform.

  2. Evaluate the resulting models again over several hundred episodes to determine the final ranking.

The code repositories associated with these submissions will be forked and scrubbed of any files larger than 15 MB to ensure that participants do not carry pre-trained models into the subsequent round.

Round 2: Finals

Round 2 Procedure

In this round, the top 10 performing teams will continue to develop their algorithms. Their work will be evaluated against a confidential, held-out test environment and test dataset, to which they will not have access.

Participants will be able to make up to four submissions during Round 2. For each submission, the automated evaluator will train their procedure on the held-out test dataset and simulator, evaluate the trained model, and report the score and metrics back to the participants. The final ranking for this round will be based on each team's best-performing submission.

Funding Opportunities and Resources

Through our generous sponsor, Microsoft, we will provide compute grants for teams that self-identify as lacking access to the compute power necessary to participate in the competition. We will also provide groups with the evaluation resources for their experiments in Round 2.

The competition team is committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning (including, but not limited to: women, LGBTQ individuals, underrepresented racial and ethnic minorities, and individuals with disabilities). To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for some number of Round 1 participants who are traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants to enable all of the top participants from Round 2 to attend our NeurIPS workshop.

The application for the Inclusion@NeurIPS travel grants can be found here.

The application for the compute grants can be found here.

Prizes

Top-ranking teams in Round 2 will receive prizes from our sponsors. Details will be announced as we finalize agreements. Currently, NVIDIA will be distributing three GPUs among the top teams.

Important Dates

May 10, 2019: Applications for Grants Open. Participants can apply to receive travel grants and/or compute grants.

Jun 1, 2019: First Round Begins. Participants are invited to download the starting materials and baselines and to begin developing their submissions.

Jun 15, 2019: Application for Compute Grants Closes. Participants can no longer apply for compute grants.

Jun 25, 2019: Notification of Compute Grant Winners. Participants notified about whether they have received a compute grant.

Sep 22, 2019: First Round Ends. Submissions for entry into the final round are closed. Models will be evaluated by the organizers and partners.

Sep 27, 2019: First Round Results Posted. Official results will be posted and finalists will be notified.

Sep 30, 2019: Final Round Begins. Finalists are invited to submit their models against the held-out validation texture pack to ensure their models generalize well.

Oct 25, 2019: Final Round Ends and Inclusion@NeurIPS Travel Grant Application Closes. Submissions for finalists are closed, and the organizers begin training each finalist's latest submission for evaluation. Participants can no longer apply for travel grants.

Nov 12, 2019: Final Round Results Posted and Travel Grant Winners Notified. Official results of model training and evaluation are posted. Winners of Inclusion@NeurIPS travel grants are notified.

Dec 1, 2019: Special Awards Posted. Additional awards granted by the advisory committee are posted.

Dec 8, 2019: NeurIPS 2019! Winning teams invited to the conference to present their results.

Team

The organizing team consists of:

  • William H. Guss (Carnegie Mellon University)
  • Cayden Codel (Carnegie Mellon University)
  • Katja Hofmann (Microsoft Research)
  • Brandon Houghton (Carnegie Mellon University)
  • Noboru Kuno (Microsoft Research)
  • Stephanie Milani (University of Maryland, Baltimore County and Carnegie Mellon University)
  • Sharada Mohanty (AICrowd)
  • Diego Perez Liebana (Queen Mary University of London)
  • Ruslan Salakhutdinov (Carnegie Mellon University)
  • Nicholay Topin (Carnegie Mellon University)
  • Manuela Veloso (Carnegie Mellon University)
  • Phillip Wang (Carnegie Mellon University)

The advisory committee consists of:

  • Chelsea Finn (Google Brain and UC Berkeley)
  • Sergey Levine (UC Berkeley)
  • Harm van Seijen (Microsoft Research)
  • Oriol Vinyals (Google DeepMind)

Contact

If you have any questions, please feel free to contact us.