
NeurIPS 2019 : MineRL Competition

Sample-efficient reinforcement learning in Minecraft

10 Travel Grants
Misc Prizes : 1x Titan RTX GPU

The MineRL Competition for Sample-Efficient Reinforcement Learning


Submissions Open!

We are so excited to announce that Round 1 of the MineRL NeurIPS 2019 Competition is now open for submissions! Our partners at AIcrowd just released their competition submission starter kit that you can find here.

Here’s how you submit in Round 1:

  1. Sign up to join the competition with the ‘Participate’ button above!

  2. Clone the AIcrowd starter template and start developing your submissions.

  3. Submit an agent to the leaderboard:

    • Train your agents locally (or on Azure) on at most 8,000,000 samples over four days. Participants should use hardware no more powerful than NC6 v2 instances on Azure (6 CPU cores, 112 GiB RAM, 736 GiB SSD, and a single NVIDIA P100 GPU).

    • Push your repository to AIcrowd GitLab. The platform verifies that your agent can successfully be re-trained by the organizers at the end of Round 1, and then runs the test entrypoint to evaluate the trained agent’s performance!

Once the full evaluation of the uploaded model/code is done, your submission will appear on the leaderboard!
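To get a feel for what the training budget implies in practice, here is a quick back-of-the-envelope check (plain Python, no competition code involved) of the environment throughput an agent needs to sustain to use the full sample budget within the wall-clock limit:

```python
# Back-of-the-envelope check of the Round 1 training budget:
# 8,000,000 environment samples must fit in a 4-day wall-clock window.
SAMPLE_BUDGET = 8_000_000
WALL_CLOCK_SECONDS = 4 * 24 * 60 * 60  # 4 days

# Average samples/sec needed to exhaust the budget in the allotted time.
min_throughput = SAMPLE_BUDGET / WALL_CLOCK_SECONDS
print(f"Budget: {SAMPLE_BUDGET:,} samples in {WALL_CLOCK_SECONDS:,} s")
print(f"Required average throughput: {min_throughput:.1f} samples/sec")
```

Roughly 23 samples per second on average — a useful sanity check when deciding how much wall-clock time to spend on anything other than environment interaction.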

» Submit your first agent! «

Abstract

Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. Many of these systems cannot be applied to real-world problems, where environment samples are expensive. Resolution of these limitations requires new, sample-efficient methods.

This competition is designed to foster the development of algorithms which can drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments using human demonstrations. Participants compete to develop systems which solve a hard task in Minecraft, obtaining a diamond, with a limited number of samples.

Task

Some of the stages of obtaining a diamond: obtaining wood, a stone pickaxe, iron, and a diamond.

This competition uses a set of Gym environments based on Malmo. The environment and dataset loader is available through a pip package. See here for documentation of the environment and accessing the data.
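The environments follow the standard Gym interface (reset / step). As a minimal sketch of that interaction loop — using a stand-in stub environment here, since the real `MineRLObtainDiamond-v0` requires the `minerl` pip package and a running Minecraft instance:

```python
import random

class StubMineRLEnv:
    """Stand-in with the same reset/step shape as a Gym-based MineRL env.
    The real environments return dict observations (e.g. POV pixels) and
    take dict actions; this stub only mimics the control flow."""
    def __init__(self, horizon=100):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"pov": [0, 0, 0]}  # placeholder observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if random.random() < 0.01 else 0.0  # sparse reward
        done = self.t >= self.horizon
        return {"pov": [0, 0, 0]}, reward, done, {}

# The same loop shape applies to gym.make("MineRLObtainDiamond-v0").
env = StubMineRLEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = {"camera": [0.0, 0.0]}  # the real envs use dict actions
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(f"Episode finished after {env.t} steps, return {total_reward}")
```

The observation and action spaces shown are placeholders; consult the environment documentation linked above for the actual dictionary keys and dataset-loading API.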

The task of the competition is solving the MineRLObtainDiamond environment. In this environment, the agent begins in a random starting location without any items, and is tasked with obtaining a diamond. This task can only be accomplished by navigating the complex item hierarchy of Minecraft.

The Minecraft item hierarchy.

The agent receives a high reward for obtaining a diamond, as well as smaller auxiliary rewards for obtaining the prerequisite items along the way. In addition to the main environment, we provide a number of auxiliary environments. These consist of tasks which are either subtasks of ObtainDiamond or other tasks within Minecraft.
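A milestone-based reward of this kind can be sketched as follows. The item list mirrors the ObtainDiamond hierarchy; the doubling reward values here are illustrative, and the exact scores used by the competition may differ:

```python
# Illustrative one-time milestone rewards for the ObtainDiamond hierarchy.
# Each prerequisite item pays out once, the first time it is obtained;
# the precise values used in the competition may differ.
MILESTONE_REWARDS = {
    "log": 1, "planks": 2, "stick": 4, "crafting_table": 4,
    "wooden_pickaxe": 8, "cobblestone": 16, "furnace": 32,
    "stone_pickaxe": 32, "iron_ore": 64, "iron_ingot": 128,
    "iron_pickaxe": 256, "diamond": 1024,
}

def episode_return(items_obtained):
    """Sum the one-time reward for each distinct milestone reached."""
    return sum(MILESTONE_REWARDS[item] for item in set(items_obtained))

# An agent that only ever reaches the stone-pickaxe stage:
print(episode_return(["log", "planks", "stick", "crafting_table",
                      "wooden_pickaxe", "cobblestone", "stone_pickaxe"]))  # 67
```

Note how the auxiliary rewards densify an otherwise extremely sparse signal: an agent that never sees a diamond still receives a learning gradient from the earlier stages of the hierarchy.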

Why Minecraft?

Minecraft is a rich environment in which to perform learning: it is an open-world environment, has sparse rewards, and has many innate task hierarchies and subgoals. Furthermore, it encompasses many of the problems that we must solve as we move towards more general AI (for example, what is the reward structure of “building a house”?). Besides all this, Minecraft has more than 90 million monthly active users, making it a good environment on which to collect a large-scale dataset.

Competition Structure

Round 1: General Entry

Round 1 Procedure

In this round, teams of up to 6 individuals will do the following:

  1. Register on the AIcrowd competition website and receive the following materials:

    • Starter code for running the environments for the competition task.
    • Basic baseline implementations provided by Preferred Networks and the competition organizers.
    • The human demonstration dataset in two different renders (one for methods development, the other for validation) with modified textures, lighting conditions, and/or minor game-state changes.
    • Docker Images and Azure quick-start template that the competition organizers will use to validate the training performance of the competitor’s models.
    • Scripts enabling the procurement of the standard cloud compute used to evaluate the sample-efficiency of participants’ submissions. Note that for this competition, we will specifically be restricting competitors to NC6 v2 Azure instances with 6 CPU cores, 112 GiB RAM, 736 GiB SSD, and a single NVIDIA P100 GPU.
  2. (Optional) Form a team using the ‘Create Team’ button on the competition overview. Participants must be signed in to create a team.

  3. Use the provided human demonstrations to develop and test procedures for efficiently training models to solve the competition task.

  4. Train their models against MineRLObtainDiamond-v0 using the local/Azure training scripts in the competition starter template, with only 8,000,000 samples in less than four days, using hardware no more powerful than an NC6 v2 instance (6 CPU cores, 112 GiB RAM, 736 GiB SSD, and a single NVIDIA P100 GPU).

  5. Submit their trained models for evaluation when satisfied. The automated evaluation setup will evaluate each submission against the validation environment, then compute and report the metrics on the competition leaderboard.

Once Round 1 is complete, the organizers will:

  1. Examine the code repositories of the top submissions on the leaderboard to ensure compliance with the competition rules. The top submissions which comply with the competition rules will then automatically be re-trained by the competition orchestration platform.

  2. Evaluate the resulting models again over several hundred episodes to determine the final ranking.

The code repositories associated with the corresponding submissions will be forked and scrubbed of any files larger than 15MB to ensure that participants are not using any pre-trained models in the subsequent round.
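A scrub of that sort can be sketched with the standard library alone (`find_oversized_files` is a hypothetical helper for illustration, not the organizers' actual tooling):

```python
import os

SIZE_LIMIT = 15 * 1024 * 1024  # 15 MB, per the Round 1 scrubbing rule

def find_oversized_files(repo_root, limit=SIZE_LIMIT):
    """Return paths under repo_root whose size exceeds the limit --
    candidates for removal before re-training (e.g. pre-trained weights)."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                offenders.append(path)
    return offenders
```

The point of the size cap is that model checkpoints are typically large binaries, so stripping files over 15MB makes it difficult to smuggle a pre-trained model into the re-training round.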

Round 2: Finals

Round 2 Procedure

In this round, the top 10 performing teams will continue to develop their algorithms. Their work will be evaluated against a confidential, held-out test environment and test dataset, to which they will not have access.

Participants will be able to make a submission four times during Round 2. For each submission, the automated evaluator will train their procedure on the held-out test dataset and simulator, evaluate the trained model, and report the score and metrics back to the participants. The final ranking for this round will be based on the best-performing submission from each team.

Funding Opportunities and Resources

Through our generous sponsor, Microsoft, we will provide compute grants for teams that self-identify as lacking access to the compute power necessary to participate in the competition. We will also provide groups with the evaluation resources for their experiments in Round 2.

The competition team is committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning (including, but not limited to: women, LGBTQ individuals, individuals in underrepresented racial and ethnic groups, and individuals with disabilities). To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for some number of Round 1 participants who are traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants to enable all of the top participants from Round 2 to attend our NeurIPS workshop.

The application for the Inclusion@NeurIPS travel grants can be found here.

~~The application for the compute grants can be found here.~~ Compute grant application is closed!

Prizes

The first-place team in Round 2 will receive a Titan RTX GPU, courtesy of NVIDIA. The top three teams in Round 2 will receive travel grants to attend NeurIPS.

Important Dates

May 10, 2019: Applications for Grants Open. Participants can apply to receive travel grants and/or compute grants.

Jun 8, 2019: First Round Begins. Participants invited to download starting materials and to begin developing their submission.

Jun 26, 2019: Application for Compute Grants Closes. Participants can no longer apply for compute grants.

Jul 8, 2019: Notification of Compute Grant Winners. Participants notified about whether they have received a compute grant.

Oct 1, 2019 (UTC 23:00): Inclusion@NeurIPS Travel Grant Application Closes. Participants can no longer apply for travel grants.

Oct 9, 2019: Travel Grant Winners Notified. Winners of Inclusion@NeurIPS travel grants are notified.

~~Sep 22, 2019~~ Oct 25, 2019 (UTC 12:00): First Round Ends. Submissions for consideration for entry into the final round are closed. Models will be evaluated by the organizers and partners.

~~Sep 27, 2019~~ Oct 30, 2019: First Round Results Posted. Official results will be posted notifying finalists.

Nov 1, 2019: Final Round Begins. Finalists are invited to submit their models against the held out validation texture pack to ensure their models generalize well.

Nov 25, 2019: Final Round Closed. Submissions for finalists are closed, evaluations are completed, and organizers begin reviewing submissions.

Dec 6, 2019: Special Awards Posted. Additional awards granted by the advisory committee are posted.

Dec 6, 2019: Final Round Results Posted. Official results of model training and evaluation are posted.

Dec 8, 2019: NeurIPS 2019! All Round 2 teams invited to the conference to present their results.


Team

The organizing team consists of:

  • William H. Guss (Carnegie Mellon University)
  • Mario Ynocente Castro (Preferred Networks)
  • Cayden Codel (Carnegie Mellon University)
  • Katja Hofmann (Microsoft Research)
  • Brandon Houghton (Carnegie Mellon University)
  • Noboru Kuno (Microsoft Research)
  • Crissman Loomis (Preferred Networks)
  • Stephanie Milani (Carnegie Mellon University)
  • Sharada Mohanty (AIcrowd)
  • Keisuke Nakata (Preferred Networks)
  • Diego Perez Liebana (Queen Mary University of London)
  • Ruslan Salakhutdinov (Carnegie Mellon University)
  • Shinya Shiroshita (Preferred Networks)
  • Nicholay Topin (Carnegie Mellon University)
  • Avinash Ummadisingu (Preferred Networks)
  • Manuela Veloso (Carnegie Mellon University)
  • Phillip Wang (Carnegie Mellon University)

The advisory committee consists of:

  • Chelsea Finn (Google Brain and UC Berkeley)
  • Sergey Levine (UC Berkeley)
  • Harm van Seijen (Microsoft Research)
  • Oriol Vinyals (Google DeepMind)

Contact

If you have any questions, please feel free to contact us.