NeurIPS 2019: MineRL Competition
Sample-efficient reinforcement learning in Minecraft
The MineRL Competition for Sample-Efficient Reinforcement Learning
Welcome: Competition Launched
We are excited to announce the beginning of the MineRL competition! Documentation for the environment and for accessing the data is available at http://www.minerl.io/docs/.
This is the first public release of the minerl package, so we are paying special attention to feedback from the community during this exploratory period. Please note that you may encounter bugs. As a result, submissions are not yet possible.
Through our generous partner Preferred Networks, we are working to release a set of benchmarks for the environment. A release date will be announced soon!
To get started with the environment and dataset please check out our quick start guide here!
To report bugs in the environment or dataset please use our Github issue tracker!
The rules of the competition are available here
Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. Many of these systems cannot be applied to real-world problems, where environment samples are expensive. Overcoming these limitations requires new, sample-efficient methods.
This competition is designed to foster the development of algorithms which can drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments using human demonstrations. Participants compete to develop systems which solve a hard task in Minecraft (obtaining a diamond) with a limited number of samples.
Some of the stages of obtaining a diamond: obtaining wood, a stone pickaxe, iron, and diamond.
This competition uses a set of Gym environments based on Malmo. The environment and dataset loader will be available through a pip package.
The main task of the competition is solving the ObtainDiamond environment. In this environment, the agent begins in a random starting location without any items, and is tasked with obtaining a diamond. This task can only be accomplished by navigating the complex item hierarchy of Minecraft.
The agent receives a high reward for obtaining a diamond, as well as smaller, auxiliary rewards for obtaining prerequisite items. In addition to the main environment, we provide a number of auxiliary environments. These consist of tasks which are either subtasks of ObtainDiamond or other tasks within Minecraft.
Minecraft is a rich environment in which to perform learning: it is an open-world environment, has sparse rewards, and has many innate task hierarchies and subgoals. Furthermore, it encompasses many of the problems that we must solve as we move towards more general AI (for example, what is the reward structure of “building a house”?). Besides all this, Minecraft has more than 90 million monthly active users, making it a good environment on which to collect a large-scale dataset.
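The interaction pattern described above follows the standard Gym interface. The sketch below uses a hand-written stub environment so the loop shape is clear without the minerl package installed; with it, one would instead call something like `gym.make("MineRLObtainDiamond-v0")` (the exact environment name is an assumption based on the competition docs). The dict-valued observations, dict-valued actions, and the sparse terminal reward mirror the structure described in the text.

```python
import random

class StubMineRLEnv:
    """Stand-in with the standard Gym reset/step interface.

    This is NOT the real minerl environment; it only mimics the
    interaction loop: dict observations, dict actions, sparse reward.
    """

    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self._t = 0

    def reset(self):
        self._t = 0
        return {"pov": None}  # minerl observations include a "pov" image

    def step(self, action):
        self._t += 1
        # Sparse reward: only the final step of the episode pays out,
        # loosely analogous to reaching a milestone item.
        reward = 1.0 if self._t == self.episode_len else 0.0
        done = self._t >= self.episode_len
        return {"pov": None}, reward, done, {}

    def sample_action(self):
        return {"attack": random.randint(0, 1)}  # actions are dicts too

# The standard episode loop an agent would run against the real env.
env = StubMineRLEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.sample_action()
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

With the real environment, this same loop is where an agent would spend its limited sample budget, which is exactly what the competition constrains.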
Round 1: General Entry
Team registration is not yet supported; please register individually until the team feature is announced.
In this round, teams of up to 6 individuals will do the following:
Register on the AIcrowd competition website and receive the following materials:
- Starter code for running the environments for the competition task.
- Basic baseline implementations provided by Preferred Networks and the competition organizers.
- Two different renders of the human demonstration dataset (one for method development, the other for validation) with modified textures, lighting conditions, and/or minor game-state changes.
- The Docker images and an Azure quick-start template that the competition organizers will use to validate the training performance of competitors' models.
- Several scripts enabling the procurement of the standard cloud compute used to evaluate the sample-efficiency of participants' submissions. Note that for this competition, we will specifically restrict competitors to NC6 v2 Azure instances with 6 CPU cores, 112 GiB of RAM, 736 GiB of SSD storage, and a single NVIDIA P100 GPU.
Use the provided human demonstrations to develop and test procedures for efficiently training models to solve the competition task.
Submit their trained models for evaluation when satisfied. The automated evaluation setup will evaluate the submissions against the validation environment, then compute and report the metrics on the competition leaderboard.
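As a toy illustration of the "develop training procedures from human demonstrations" step above, the sketch below fits a trivial majority-action policy to a hard-coded stand-in for the demonstration data. With the minerl package, the data loader (assumed here to be `minerl.data.make("MineRLObtainDiamond-v0")`, per the package docs) would supply real state-action trajectories instead of this toy list.

```python
# Behavioral-cloning-in-miniature: "train" a policy purely from
# demonstrated actions, with no environment samples spent at all.
# The hard-coded list below is a stand-in for real minerl trajectories.
from collections import Counter

demo_actions = ["attack", "forward", "attack", "attack", "jump"]

# The simplest possible "cloned" policy: always emit the action that
# appears most often in the demonstrations.
counts = Counter(demo_actions)
policy_action = counts.most_common(1)[0][0]
print(policy_action)
```

Real submissions would of course condition on the observation rather than ignore it, but the key property is the same: the learning here consumes demonstration data, not environment samples.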
Once Round 1 is complete, the organizers will:
Examine the code repositories of the top submissions on the leaderboard to ensure compliance with the competition rules. The top submissions which comply with the competition rules will then automatically be re-trained by the competition orchestration platform.
Evaluate the resulting models again over several hundred episodes to determine the final ranking.
The code repositories associated with the corresponding submissions will be forked and scrubbed of any files larger than 15 MB to ensure that participants are not using any pre-trained models in the subsequent round.
Round 2: Finals
In this round, the top 10 performing teams will continue to develop their algorithms. Their work will be evaluated against a confidential, held-out test environment and test dataset, to which they will not have access.
Participants will be able to make a submission four times during Round 2. For each submission, the automated evaluator will train their procedure on the held-out test dataset and simulator, evaluate the trained model, and report the score and metrics back to the participants. The final ranking for this round will be based on each team's best-performing submission.
Funding Opportunities and Resources
Through our generous sponsor, Microsoft, we will provide compute grants for teams that self-identify as lacking access to the compute power necessary to participate in the competition. We will also provide groups with the evaluation resources for their experiments in Round 2.
The competition team is committed to increasing the participation of groups traditionally underrepresented in reinforcement learning and, more generally, in machine learning (including, but not limited to: women, LGBTQ individuals, underrepresented racial and ethnic minorities, and individuals with disabilities). To that end, we will offer Inclusion@NeurIPS scholarships/travel grants for some number of Round 1 participants who are traditionally underrepresented at NeurIPS to attend the conference. We also plan to provide travel grants to enable all of the top participants from Round 2 to attend our NeurIPS workshop.
The application for the Inclusion@NeurIPS travel grants can be found here.
The application for the compute grants can be found here.
Top-ranking teams in Round 2 will receive prizes from our sponsors. Details will be announced as we finalize agreements. Currently, NVIDIA will be distributing three GPUs among the top teams.
May 10, 2019: Applications for Grants Open. Participants can apply to receive travel grants and/or compute grants.
Jun 8, 2019: First Round Begins. Participants invited to download starting materials and to begin developing their submission.
Jun 26, 2019 (11:00 PM EST): Application for Compute Grants Closes. Participants can no longer apply for compute grants.
Jul 8, 2019: Notification of Compute Grant Winners. Participants notified about whether they have received a compute grant.
Sep 22, 2019: First Round Ends. Submissions for entry into the final round are closed. Models will be evaluated by organizers and partners.
Sep 27, 2019: First Round Results Posted. Official results will be posted notifying finalists.
Sep 30, 2019: Final Round Begins. Finalists are invited to submit their models against the held-out validation texture pack to ensure that their models generalize well.
Oct 1, 2019 (11:00 PM EST): Inclusion@NeurIPS Travel Grant Application Closes. Participants can no longer apply for travel grants.
Oct 9, 2019 Travel Grant Winners Notified. Winners of Inclusion@NeurIPS travel grants are notified.
Oct 25, 2019: Final Round Ends. Submissions for finalists are closed, and organizers begin training finalists' latest submissions for evaluation.
Nov 12, 2019: Final Round Results Posted. Official results of model training and evaluation are posted.
Dec 1, 2019: Special Awards Posted. Additional awards granted by the advisory committee are posted.
Dec 8, 2019: NeurIPS 2019! Winning teams invited to the conference to present their results.
~~MineRL Compute Grant Application~~ The compute grant application is now closed!
The organizing team consists of:
- William H. Guss (Carnegie Mellon University)
- Mario Ynocente Castro (Preferred Networks)
- Cayden Codel (Carnegie Mellon University)
- Katja Hofmann (Microsoft Research)
- Brandon Houghton (Carnegie Mellon University)
- Noboru Kuno (Microsoft Research)
- Crissman Loomis (Preferred Networks)
- Stephanie Milani (Carnegie Mellon University)
- Sharada Mohanty (AIcrowd)
- Keisuke Nakata (Preferred Networks)
- Diego Perez Liebana (Queen Mary University of London)
- Ruslan Salakhutdinov (Carnegie Mellon University)
- Shinya Shiroshita (Preferred Networks)
- Nicholay Topin (Carnegie Mellon University)
- Avinash Ummadisingu (Preferred Networks)
- Manuela Veloso (Carnegie Mellon University)
- Phillip Wang (Carnegie Mellon University)
The advisory committee consists of:
- Chelsea Finn (Google Brain and UC Berkeley)
- Sergey Levine (UC Berkeley)
- Harm van Seijen (Microsoft Research)
- Oriol Vinyals (Google DeepMind)
If you have any questions, please feel free to contact us: