
#### SUBMISSIONS OPEN!

We are so excited to announce that Round 1 of the MineRL NeurIPS 2020 Competition is now open for submissions! The competition submission starter kit can be found here.

## 🕵️ Introduction

The MineRL 2020 Competition aims to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments.

To that end, participants will compete to develop systems that can obtain a diamond in Minecraft from raw pixels using only 8,000,000 samples from the MineRL simulator and 4 days of training on a single GPU machine. Participants will be provided the MineRL-v0 dataset (website, paper), a large-scale collection of over 60 million frames of human demonstrations, enabling them to utilize expert trajectories to minimize their algorithm’s interactions with the Minecraft simulator. More detailed background on the competition and its design can be found in the MineRL 2020: NeurIPS Competition Proposal.

The task of the competition is solving the MineRLObtainDiamondVectorObf-v0 environment. In this environment, the agent begins in a random starting location without any items, and is tasked with obtaining a diamond. This task can only be accomplished by navigating the complex item hierarchy of Minecraft.

The agent receives a high reward for obtaining a diamond as well as smaller, auxiliary rewards for obtaining prerequisite items. In addition to the main environment, we provide a number of auxiliary environments. These consist of tasks which are either subtasks of ObtainDiamond or other tasks within Minecraft.
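The reward structure above can be made concrete with a short sketch. The milestone values below follow the reward schedule published in the MineRL documentation (each item pays out once, on first acquisition); treat the exact numbers as illustrative rather than authoritative.

```python
# Sketch of how episode reward accumulates in ObtainDiamond-style tasks.
# Values follow the schedule in the MineRL docs; illustrative, not official.
MILESTONE_REWARDS = {
    "log": 1, "planks": 2, "stick": 4, "crafting_table": 4,
    "wooden_pickaxe": 8, "cobblestone": 16, "furnace": 32,
    "stone_pickaxe": 32, "iron_ore": 64, "iron_ingot": 128,
    "iron_pickaxe": 256, "diamond": 1024,
}

def episode_reward(items_obtained):
    """Total reward: each milestone pays out once, on first acquisition."""
    return sum(MILESTONE_REWARDS[item] for item in set(items_obtained)
               if item in MILESTONE_REWARDS)

# An agent that reaches iron tools but never finds a diamond:
partial = ["log", "planks", "stick", "crafting_table", "wooden_pickaxe",
           "cobblestone", "stone_pickaxe", "furnace", "iron_ore",
           "iron_ingot", "iron_pickaxe"]
print(episode_reward(partial))                 # 547
print(episode_reward(partial + ["diamond"]))   # 1571
```

Note how steep the final jump is: the diamond alone is worth almost twice the entire prerequisite chain, which is what makes the reward signal so sparse.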

## 📜 Rules

Participants will submit agents (round 1) and the code to train them from scratch (round 2) to AICrowd. Agents must train in under 8,000,000 samples from the environment on at most 1 P100 GPU for at most 4 days of training. The submissions must train a machine learning model without relying on human domain knowledge (no hardcoding, no manual specification of meta-actions e.g. move forward then dig down, etc). Participants can use the provided MineRL-v0 dataset of human demonstrations, but no external datasets.
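The 8,000,000-sample cap can be enforced mechanically by counting every environment step. Below is a minimal, framework-agnostic sketch; the `SampleBudgetWrapper` class and `BudgetExceeded` exception are illustrative names, not part of the official starter kit.

```python
class BudgetExceeded(Exception):
    """Raised when training tries to draw more samples than the rules allow."""

class SampleBudgetWrapper:
    """Counts environment steps and refuses to exceed a fixed sample budget.

    `env` only needs gym-style reset()/step(); the wrapper itself is
    framework-agnostic.
    """
    def __init__(self, env, budget=8_000_000):
        self.env = env
        self.budget = budget
        self.samples_used = 0

    def reset(self):
        return self.env.reset()

    def step(self, action):
        if self.samples_used >= self.budget:
            raise BudgetExceeded(f"sample budget of {self.budget} exhausted")
        self.samples_used += 1
        return self.env.step(action)
```

Wrapping the training environment this way makes the budget a hard failure during local development rather than a surprise at evaluation time.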

A full comprehensive set of rules can be found here. Additionally, the clarifications provided in the FAQ (link) also constitute rules.

## 🖊 Evaluation

A submission’s score is the average total reward across all of its evaluation episodes.

During Round 1, submissions will be evaluated as they are received, and the resulting score will be added to the leaderboard. At the end of the round, competitors’ submissions will be retrained, and teams with a significantly lower score after retraining will be dropped from Round 1.

During Round 2, teams can make a number of submissions, each of which will be re-trained and evaluated as they are received. Each team’s leaderboard position is determined by the maximum score across its submissions in Round 2.
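The two scoring rules above (a submission's score is the mean total reward over its evaluation episodes; a team's Round 2 rank uses its best submission) are simple to state in code. This is a sketch of the arithmetic only, not the official evaluator.

```python
def submission_score(episode_rewards):
    """A submission's score: average total reward across its evaluation episodes."""
    return sum(episode_rewards) / len(episode_rewards)

def round2_team_score(submissions):
    """A team's Round 2 leaderboard score: the best of its submissions,
    where each submission is a list of per-episode total rewards."""
    return max(submission_score(ep_rewards) for ep_rewards in submissions)

print(submission_score([0, 35, 67, 99]))        # 50.25
print(round2_team_score([[0, 64], [16, 16]]))   # 32.0
```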

## 🆕 What's New!?

This competition is the second iteration of the MineRL Competition, following the 2019 edition, and we’ve introduced several new changes.

1. Action and Observation Obfuscation. Last year, competitors interacted with the MineRL<x>-v0 environments by creating agents that produced various human-readable actions. For example, MineRLNavigate-v0 takes as an action:

```python
{'attack': 1,
 'back': 0,
 'camera': array([-86.02688 , -12.784636], dtype=float32),
 'forward': 0,
 'jump': 1,
 'left': 1,
 'right': 0,
 'sneak': 1,
 'sprint': 0}
```

This let competitors bake human priors into their algorithms (e.g., always move forward). As a result, many submissions were not generally applicable to other domains.

This year we are pushing competitors forward by introducing action and observation space obfuscation. The technique takes the human-readable action for each environment, vectorizes it, and passes it through a random auto-encoder that is private to the evaluator. This yields a simple interface: a continuous action vector and a continuous observation vector (with the exception of POV pixel observations, of course). Beyond this simplification, hard-coded human priors become useless, because the embedding will change in Round 2 and remain hidden from the finalists.

Here is an example of a human trajectory encoded into this new obfuscated space. Note that the auto-encoder will be shared across all of the competition environments, so pretraining done on Treechop, for example, will transfer to ObtainDiamond.

2. Survival Data Release. We will be releasing the much larger MineRL Survival dataset, which contains nearly 75% more data than the original dataset. This data will be encoded using the same encoder as above.

3. More changes soon! Stay tuned for further changes as they are released.
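The obfuscation described in item 1 can be sketched as vectorizing a human-readable action and passing it through a fixed random encoder whose inverse is held by the evaluator. Everything below (the dimensions, the linear encoder, the `vectorize` helper) is illustrative; the real auto-encoder is private to the organizers and need not be linear.

```python
import numpy as np

def vectorize(action):
    """Flatten a human-readable action dict into one float vector (fixed key order)."""
    parts = [np.asarray(action[key], dtype=np.float32).ravel()
             for key in sorted(action)]
    return np.concatenate(parts)

# A fixed random linear "encoder"; in the competition the mapping is
# private to the evaluator and regenerated for Round 2.
rng = np.random.default_rng(0)
raw_dim, obf_dim = 10, 64                 # 8 binary keys + 2 camera angles
W_enc = rng.standard_normal((obf_dim, raw_dim)).astype(np.float32)
W_dec = np.linalg.pinv(W_enc)             # left inverse: decodes the vector

def obfuscate(action):
    return W_enc @ vectorize(action)

action = {
    "attack": 1, "back": 0, "camera": np.array([-86.0, -12.8]),
    "forward": 0, "jump": 1, "left": 1, "right": 0, "sneak": 1, "sprint": 0,
}
vec = obfuscate(action)        # the agent only ever sees this 64-dim vector
recovered = W_dec @ vec        # ...but the evaluator can map it back
```

Because the same encoder is shared across environments, a policy trained to emit these vectors on one task can be reused directly on another.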

## 📁 Competition Structure

### Tracks

This year, we will have two competition tracks!

The primary track is “Demonstrations and Environment.” Competitors in this track may use the MineRL dataset and eight million interactions with the environment.

The secondary track is “Demonstrations Only.” No environment interactions may be used in addition to the provided dataset. Competitors interested in learning solely from demonstrations can compete in this track without being disadvantaged compared to those who also use reinforcement learning.

A team can submit separate entries to both tracks. Performance in the tracks will be evaluated separately.

### Round 1: General Entry

In this round, teams of up to 6 individuals will do the following:

1. Register on the AICrowd competition website and receive the following materials:
   1. Starter code for running the environments for the competition task.
   2. Baseline implementations provided by Preferred Networks and the competition organizers.
   3. The human demonstration dataset with two different renders (one for methods development, the other for validation) with modified textures, lighting conditions, and/or minor game state changes.
   4. Docker images and a quick-start template that the competition organizers will use to validate the training performance of competitors’ models.
   5. Scripts for procuring the standard cloud compute system used to evaluate the sample efficiency of participants’ submissions.
2. (Optional) Form a team using the ‘Create Team’ button on the competition overview. Participants must be signed in to create a team.
3. Use the provided human demonstrations to develop and test procedures for efficiently training models to solve the competition task.
4. Train their models against MineRLObtainDiamondVectorObf-v0 using the local-training/Azure-training scripts in the competition starter template, with only 8,000,000 samples in less than four days.
5. Submit their trained models for evaluation when satisfied with them. The automated evaluation setup will evaluate the submissions against the validation environment to compute and report the metrics on the competition leaderboard.
6. Repeat steps 3-5 until Round 1 is complete!

Once Round 1 is complete, the organizers will:

• Examine the code repositories of the top submissions on the leaderboard to ensure compliance with the competition rules. The top submissions which comply with the competition rules will then automatically be re-trained by the competition orchestration platform.
• Evaluate the resulting models again over several hundred episodes to determine the final ranking.
• Fork the code repositories associated with the corresponding submissions, and scrub them of any files larger than 30MB to ensure that participants are not using any pre-trained models in the subsequent round.

### Round 2: Finals

In this round, the top 10 performing teams will continue to develop their algorithms. Their work will be evaluated against a confidential, held-out test environment and test dataset, to which they will not have access.

Participants will be able to make a submission four times during Round 2. For each submission, the automated evaluator will train their procedure on the held out test dataset and simulator, evaluate the trained model, and report the score and metrics back to the participants. The final ranking for this round will be based on the best-performing submission by each team.

## 💵 Prizes and Funding Opportunities

To be determined.

## 📅 Timeline

• July 1 - September 30: Round 1. Participants invited to download starting materials and baselines and to begin developing their submission.
• December 5th: Round 1 submissions close. Models evaluated by competition organizers. Official results posted.
• December 6th: NeurIPS 2020. Round 1 winning teams invited to the conference to present their results.
• December - January: Round 2. Finalists invited to submit their training code to be validated against the held-out validation texture pack.
• January: Submissions close. Organizers train finalists’ latest submissions for evaluation. Official results posted.
• February: AAAI 2021. Winning teams invited to the conference to present their results. Awards given at conference.

## 💪 Getting Started

Submissions are now open!  You can find the competition submission starter kit on GitHub here.

## 🙋 F.A.Q.

### This F.A.Q. is the only official place for clarification of the competition rules!

Q: Do I need to purchase Minecraft to participate?

> A: No! MineRL includes a special version of Minecraft provided generously by the folks at Microsoft Research via Project Malmo.

Q: Which environments are used for scoring agents?

> A: MineRLObtainDiamondVectorObf-v0 is the ONLY evaluation environment for the 2020 MineRL competition.

Q: Can other environments be used for training agents?

> A: This year we explicitly restrict training to MineRL[...]VectorObf-v0 environments. This includes MineRLObtainDiamondVectorObf-v0, MineRLTreechopVectorObf-v0, and MineRLNavigateVectorObf-v0, as well as any other MineRL environment ending in VectorObf-v0.

These share an identical action and observation distribution with MineRLObtainDiamondVectorObf-v0 and thus may be used in pre-training, pre-processing, and training, provided training procedures comply with the competition rules.

Other environments will not be available when re-training on AIcrowd and should not be used when training locally.

Q: Can competitors add traditional computer vision components (e.g., edge detectors) to their systems?

> A: Yes, CV components trained as part of the overall pipeline are allowed. Competitors may tune their data augmentation / processing for Minecraft, but the approaches should be readily applicable to other environments (e.g., Atari games).

We seek to prevent the exploitation of Minecraft domain knowledge while not limiting the use of domain-agnostic approaches. If there is uncertainty about any specific approach, feel free to contact us for an answer about that specific approach.

Note that the evaluation environments use a different texture pack compared to the training environments, so do not over-tune based on the training texture pack.

Have more questions? Ask in Discord or on the Forum

## 🤝 Partners

Thank you to our amazing partners!

## 👥 Team

The organizing team consists of:

• William H. Guss (OpenAI and Carnegie Mellon University)
• Brandon Houghton (OpenAI and Carnegie Mellon University)
• Stephanie Milani (Carnegie Mellon University)
• Nicholay Topin (Carnegie Mellon University)
• Ruslan Salakhutdinov (Carnegie Mellon University)
• John Schulman (OpenAI)
• Mario Ynocente Castro (Preferred Networks)
• Crissman Loomis (Preferred Networks)
• Keisuke Nakata (Preferred Networks)
• Shinya Shiroshita (Preferred Networks)
• Sam Devlin (Microsoft Research)
• Noboru Sean Kuno (Microsoft Research)
• Oriol Vinyals (DeepMind)

• Fei Fang (Carnegie Mellon University)
• Zachary Chase Lipton (Carnegie Mellon University)
• Manuela Veloso (Carnegie Mellon University and JPMorgan Chase)
• Chelsea Finn (Google Brain and Stanford)
• Anca Dragan (UC Berkeley)
• Sergey Levine (UC Berkeley)