Round 1: Completed #neurips

MineRL Labs


Update 26th of March: The retrospective report of the competition, including more detailed results, is now available on arxiv! https://arxiv.org/abs/2303.13512

Update 6th of March: The results are in and winners have been announced! Check the results below in the "Winners" section.

Join our Discord! - We are happy to answer your questions and ping you when announcements happen.

Competition homepage! - Learn about the MineRL project and developing AGI in Minecraft.

Tweet us - Ask questions and keep up-to-date on major announcements.

🏁 Competition winners

Main BASALT track winners.

🥇1st place (7000 USD): GoUp
🥈2nd place (4000 USD): UniTeam
🥉3rd place (3000 USD): voggite

None of the submissions matched human performance and thus no submission reached the 100k USD milestone award.

Research prize winners.
The recipients of these prizes were selected by our advisor team: each advisor independently chose one submission to award, based on the description of the submission. Each pick by an advisor is worth 1000 USD, for a total prize pot of 5000 USD.

Team UniTeam: 2000 USD. Advisors praised the method for its simplicity yet strong performance. One of the advisors wished more papers compared deep learning solutions to simple ones like this.
Team KABasalt: 2000 USD. Advisors liked the effort put towards using human preferences, which is a relevant topic nowadays.
Team KAIROS: 1000 USD. The advisor liked the intuitiveness and natural combination of RL and IL in this solution.

Community prize winners.
We had many participants who matched these qualifications, but two stood out among the rest:

Discord user Corianas#4212 (500 USD): they were exceptionally active throughout the competition, helping out numerous participants on Discord and sharing interesting results they cooked up during the competition.
Discord user mdda#7746 (500 USD): they created several YouTube videos during the competition which provided instructions on e.g., installing MineRL and getting started with the OpenAI VPT codebase.

Congratulations to all of the prize winners! We will be reaching out to the prize recipients via emails provided in the "final submission selection" form, and community prize winners via Discord directly.

🕵️ Introduction

The MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) competition aims to promote research in learning from human feedback to enable agents that can accomplish tasks without crisp, easily-defined reward functions. Our sponsors have generously provided 💰20,000 USD💰 in prize money to support this research, with an additional 100,000 USD for especially surprising results (see "Prizes")!

This is the second iteration of this competition. You can find the page for the BASALT 2021 competition here. Major changes this year include:

⚒️ New MineRL simulator version with human-level observation and action-spaces. This change means, for example, that crafting requires opening the inventory UI and using the mouse to craft items.
🧠 Pretrained models trained on different Minecraft tasks, which you are free to use in your solutions as you see fit (e.g., fine-tune for a specific task, use for a specific part of behaviour)
🏆 Prizes to encourage exploring learning from human-feedback, even if the solution does not reach the top performance.
💎 An "intro" track, in which the task is the original, non-restrictive MineRL competition task "Obtain diamond shovel", to ease entry to the competition.

❓ Task and motivation

Real-world tasks are not simply handed to us with a clearly-defined reward function, and it is often challenging to design one, even if you can verbally describe what you want to be done. To reflect this situation, the BASALT competition environments do not include reward functions. We realize that this workflow is slower and more complicated, but we believe this setup is necessary if we want AI systems to have effective and safe real-world impacts. See the full, original motivation for the BASALT 2021 competition here.

Ponder the following motivating, rhetorical questions 🤔:

You want to train an AI to construct a nice waterfall in Minecraft using reward. When would you give the agent a positive reward?
You want a house-building AI that, much like in real life, would not interfere with (or grief) others while building houses in a Minecraft town. In how many different ways could the AI cause harm to the town (or to the enjoyment of other players) while building the house? How would you signal these harms with a reward function?
Speaking of houses, what kind of a metric would you assign to a house to measure its "betterness"? How would you "measure" if one AI is better at building a house than the other?

Now consider this: do you know when somebody has built a waterfall or a house? Can you tell if one house is better than another? If yes, how can we transfer this knowledge to AI? One answer: Learning from human-feedback. Instead of reward functions, we train the agent with demonstrations, preferences ("behaviour A is better than B"), and corrections. See the "Getting Started" section for more material and pointers for this line of work.

To encourage this direction, we define four tasks in human-readable descriptions. You will receive these descriptions to help you design your solutions. The human workers evaluating the videos generated by the submissions also receive these descriptions to aid their evaluations. See the Evaluation section for further details.

🖊 Evaluation

This competition will be judged according to a human assessment of the generated trajectories. In particular, for each task, we will generate videos of two different agents acting in the environment and ask a human which agent performed the task better. After collecting many of these comparisons, we will produce a score for each agent using the TrueSkill system, which, very roughly speaking, captures how often your agent is likely to "win" in a head-to-head comparison. Your final score will be an average of the normalized scores over all four tasks (that is, all four tasks have equal weight in your final ranking).
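To make the scoring idea more concrete, here is a minimal sketch of a TrueSkill-style rating update for win/loss comparisons. It is not the organizers' evaluation code: it assumes no draws, uses the commonly cited TrueSkill default parameters, and the agent names and comparison outcomes are hypothetical. How per-task ratings are normalized before averaging is not specified here, so that step is left out.

```python
import math

# Commonly cited TrueSkill defaults (an assumption; the competition's settings may differ).
MU0 = 25.0          # prior mean skill
SIGMA0 = MU0 / 3    # prior standard deviation
BETA = SIGMA0 / 2   # performance noise
TAU = SIGMA0 / 100  # dynamics noise added before each update


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner, loser):
    """Return updated (mu, sigma) pairs after one comparison with no draws."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    var_w = sig_w ** 2 + TAU ** 2
    var_l = sig_l ** 2 + TAU ** 2
    c = math.sqrt(2.0 * BETA ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)  # mean correction; large when the result is an upset
    w = v * (v + t)        # variance correction, between 0 and 1
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1.0 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1.0 - var_l / c ** 2 * w)))
    return new_w, new_l


def rate(comparisons, agents):
    """Apply updates in order; each comparison is a (winner, loser) pair."""
    ratings = {agent: (MU0, SIGMA0) for agent in agents}
    for win, lose in comparisons:
        ratings[win], ratings[lose] = update(ratings[win], ratings[lose])
    return ratings


# Hypothetical human judgements for one task.
comparisons = [("agent_a", "agent_b"), ("agent_a", "agent_c"), ("agent_b", "agent_c")]
ratings = rate(comparisons, ["agent_a", "agent_b", "agent_c"])
for agent, (mu, sigma) in sorted(ratings.items(), key=lambda kv: -kv[1][0]):
    print(f"{agent}: mu={mu:.2f}, sigma={sigma:.2f}")
```

Each comparison moves the winner's mean up and the loser's mean down, and shrinks both uncertainties, so agents that win more often against strong opponents end up with higher ratings.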

During the competition, competition organizers will quickly rate each submission from 1-5 based on the publicly shown videos of the submissions to give a rough ranking of the solutions. This score will not affect final evaluation of the submissions. Please keep this in mind during the competition; do not assume that your final ranking will match what is on the leaderboard.

Evaluation is done in three steps after submissions close. You will get to choose which of your submissions will be used for the final evaluation:

Phase 1: A maximum of 50 submissions are included in a shorter round of evaluations to determine the Top 20 submissions. If there are more than 50 submissions, organizers reserve the right to use any method to limit submissions to 50 (e.g., a faster round of scoring).
Phase 2: The Top 20 submissions will be evaluated more thoroughly (more evaluations per submission) to determine the ordering of submissions. Top 10 submissions move to validation.
Validation: Organizers will inspect the source code of Top 10 submissions to ensure compliance with rules. The submissions will also be retrained to ensure no rules were broken during training (mainly: limited compute and training time).
- If the behaviour of the retrained agent is considerably different we will contact the team and aim to sort out any problems, assuming no rules were broken.
Winners are chosen: Top submissions that pass validation will be announced as winners. This includes the top performing solutions (e.g. getting good results in the task) and solutions specializing in one of the encouraged methods (see Prizes for details).

📝 Tasks

The tasks are conceptually the same as in the BASALT 2021 competition. Both human evaluators and human demonstrators (who play the game to provide the dataset) will be given the same "description". MineRL runs at 20 frames per second, meaning that one in-game minute lasts 60 * 20 = 1,200 steps.
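The conversion from minutes to environment steps can be sketched as follows. The task labels in the dictionary are illustrative names chosen for this example; the minute values are the time limits listed for each task below.

```python
STEPS_PER_SECOND = 20  # MineRL frame rate stated above


def minutes_to_steps(minutes):
    """Convert an in-game time limit in minutes to environment steps."""
    return int(minutes * 60 * STEPS_PER_SECOND)


# Time limits in minutes, as given in the task descriptions (labels are illustrative).
TIME_LIMITS_MINUTES = {
    "Find Caves": 3,
    "Waterfall": 5,
    "Village Animal Pen": 5,
    "Village House Construction": 12,
}

for task, minutes in TIME_LIMITS_MINUTES.items():
    print(f"{task}: {minutes} min = {minutes_to_steps(minutes):,} steps")
```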

Find Caves task

Description: Look around for a cave. When you are inside one, press ESCAPE to end the minigame.
- Clarification: You are not allowed to dig down from the surface to find a cave.
Starting conditions: Spawn in "plains" biome.
Timelimit: 3 minutes (3,600 steps)

Waterfall task

Description: After spawning in a mountainous area with a water bucket and various tools, build a beautiful waterfall and then reposition yourself to “take a scenic picture” of the same waterfall by pressing the ESCAPE key. Pressing the ESCAPE key also ends the episode.
Starting conditions: Spawn in "extreme_hills" biome. Start with a water bucket, cobblestone, a stone pickaxe and a stone shovel.
Timelimit: 5 minutes (6,000 steps)

Village Animal Pen Task

Description: After spawning in a village, build an animal pen next to one of the houses in a village. Use your fence posts to build one animal pen that contains at least two of the same animal. (You are only allowed to pen chickens, cows, pigs, sheep or rabbits.) There should be at least one gate that allows players to enter and exit easily. The animal pen should not contain more than one type of animal. (You may kill any extra types of animals that accidentally got into the pen.) Don’t harm the village. Press the ESCAPE key to end the minigame.
- Clarifications: You may need to terraform the area around a house to build a pen. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Animal pens must have a single type of animal: pigs, cows, sheep, chicken or rabbit.
- Technical clarification: The MineRL environment may spawn the player in a snow biome, which does not contain animals. Organizers will ensure that the seeds used for the evaluation spawn the player in villages with suitable animals available nearby.
Starting conditions: Spawn near/in a village. Start with fences, fence gates, carrots, wheat seeds and wheat. This food can be used to attract animals.
Timelimit: 5 minutes (6,000 steps)

Village House Construction task

Description: Taking advantage of the items in your inventory, build a new house in the style of the village (random biome), in an appropriate location (e.g. next to the path through the village), without harming the village in the process. Then give a brief tour of the house (i.e. spin around slowly such that all of the walls and the roof are visible). Press the ESCAPE key to end the minigame.
- Clarifications: It’s okay to break items that you misplaced (e.g. use the stone pickaxe to break cobblestone blocks). You are allowed to craft new blocks. You don’t need to copy another house in the village exactly (in fact, we’re more interested in having slight deviations, while keeping the same “style”). You may need to terraform the area to make space for a new house. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Please spend less than ten minutes constructing your house.
Starting conditions: Spawn in/near a village (of any type!). Start with varying construction materials designed to cover different biomes.
Timelimit: 12 minutes (14,400 steps)

📊 Dataset

The full BASALT dataset is now available! Big thanks to OpenAI for sponsoring this!

You can find the data index files in the OpenAI VPT repository, and further helper scripts in our baseline repository here.

The dataset is 650GB in total, but with the utility script in the baseline repository you can choose how much data you download.

🌠 Intro track

We realize the full task detailed above is daunting, so to ease entry to this competition, we also have an "intro" track for you to compete in. Your task is to create an agent which can obtain a diamond shovel, starting from a random, fresh world. Your submission will be evaluated by running 20 games (18,000 steps maximum) and taking the maximum score over these 20 runs. Your agent is rewarded as in the "ObtainDiamond" task in the MineRL 2021 competition, with an additional reward of 2048 points for crafting a diamond shovel.

Sounds daunting? This used to be a difficult task, but thanks to OpenAI's VPT models, obtaining diamonds is relatively easy. Building on this model, your task is to add the part where it uses the diamonds to craft a diamond shovel instead of a diamond pickaxe. You can find a baseline solution using the VPT model here. Find the bare-bones submission template here.

Note that the "intro" track is designed only to help you get familiar with the submission system and MineRL, not as a track to actively compete in. Hence we chose the "maximum" over episodes rather than the "average". There are no winner prizes for the "intro" track; however, we may give research prizes to innovative and strong solutions in this track as well.
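As a small illustration of why the choice of "maximum" matters, the sketch below scores a set of episode returns both ways. The return values are made up for the example and do not come from any real submission; the function names are hypothetical.

```python
from statistics import mean


def intro_track_score(episode_returns):
    """Score used for the intro track as described above: best episode wins."""
    return max(episode_returns)


def average_score(episode_returns):
    """Average over episodes, shown only for contrast."""
    return mean(episode_returns)


# Hypothetical returns from 20 evaluation games: mostly low, one lucky run.
episode_returns = [0, 3, 3, 35, 0, 99, 3, 0, 35, 3,
                   0, 0, 163, 3, 35, 0, 3, 3, 0, 1000]

print("max:", intro_track_score(episode_returns))
print("mean:", average_score(episode_returns))
```

With the maximum, a single successful run determines the score, which is forgiving for agents that only occasionally succeed.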

💪 Getting Started

Start with the following resources:

Clone the submission template with a random agent and begin to develop your solutions! For the intro track, use this submission template.
Install MineRL v1.0 and explore the BASALT environments (these are the tasks you aim to solve!)
Check out the behavioural cloning baseline, along with the dummy dataset to help you get started. For the intro track, see this baseline solution.
Explore and study the pretrained models; you are free to use them as part of your submission however you like, and we encourage you to do so!
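Once MineRL v1.0 is installed, exploring an environment could look roughly like the sketch below. The environment IDs and the `noop()` action helper are written as we recall them from the MineRL v1.0 documentation, so treat them as assumptions and check the docs if anything fails; `run_noop_episode` is a hypothetical helper name. The imports live inside the function so that the file loads even where MineRL is not installed.

```python
# BASALT environment IDs as recalled from the MineRL v1.0 docs (assumption: verify locally).
BASALT_ENV_IDS = [
    "MineRLBasaltFindCave-v0",
    "MineRLBasaltMakeWaterfall-v0",
    "MineRLBasaltCreateVillageAnimalPen-v0",
    "MineRLBasaltBuildVillageHouse-v0",
]


def run_noop_episode(env_id, max_steps=100):
    """Step an environment with no-op actions and return the number of steps taken."""
    import gym
    import minerl  # noqa: F401  (importing registers the MineRL environments with gym)

    env = gym.make(env_id)
    env.reset()
    steps = 0
    done = False
    while not done and steps < max_steps:
        action = env.action_space.noop()  # dictionary action with nothing pressed
        _obs, _reward, done, _info = env.step(action)
        steps += 1
    env.close()
    return steps


# Example (requires a working MineRL v1.0 installation):
# print(run_noop_episode(BASALT_ENV_IDS[0]))
```

Note that the BASALT environments provide no reward function, so the reward returned by `step` is not a useful training signal.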

Here are some previous projects that could help you get started!

Results of BASALT 2021 competition.
Check out the winners of the MineRL Diamond competition -- while the BASALT tasks are different, there is still much to learn from approaches to Diamond.
- 2019, 2020 and 2021 submissions
- Top teams talks from 2019
See our list of projects using MineRL here (and please email us to add more to the list!)
Academic papers related to learning from human-feedback
Familiarize yourself with the MineRL package and dataset.
- Read the docs
- Download and explore the older data! While this is not used for this competition, it will give you a sense of what you will be working with.
Join the Discord community!
- Participate in research discussions on different approaches to solving the challenge
- Form teams early

📜 Rules

Find the official rules here. We will list official changes to the rules in the FAQ on this page.

💵 Prizes

Promising solutions (at organizers' discretion) will be rewarded with a virtual NeurIPS 2022 ticket after submissions close, on the condition that the recipients present their solution at the competition workshop.

There are three categories of prizes:

Winners
1. 1st place: $7,000 USD
2. 2nd place: $4,000 USD
3. 3rd place: $3,000 USD
Blue Sky award: $100,000 USD
Research prizes: $5,000 USD
Community support: $1,000 USD

Winners. As described in the Evaluation section, we will evaluate submissions using human feedback to determine how well agents complete each of the four tasks. The three teams that score highest on this evaluation will receive prizes of $7,000, $4,000, and $3,000.

Blue Sky award. This award of $100,000 will be given to submissions that achieve a very high level of performance: human-level performance on at least 3 of the 4 tasks. (Human-level performance is achieved if the human evaluators prefer agent-generated trajectories to human demonstrations at least 50% of the time.) If multiple submissions achieve this milestone, the award will be split equally across all of them.
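The milestone criterion above can be written as a short check. This is only a sketch of the stated rule: the function name, the task labels and the preference rates are hypothetical, and each rate stands for the fraction of comparisons in which evaluators preferred the agent over a human demonstration.

```python
def reaches_blue_sky(preference_rates, threshold=0.5, required_tasks=3):
    """True if the agent is preferred at least `threshold` of the time on enough tasks."""
    human_level_tasks = sum(rate >= threshold for rate in preference_rates.values())
    return human_level_tasks >= required_tasks


# Hypothetical preference rates against human demonstrations, one per task.
example_rates = {
    "Find Caves": 0.55,
    "Waterfall": 0.50,
    "Village Animal Pen": 0.20,
    "Village House Construction": 0.10,
}

print(reaches_blue_sky(example_rates))
```

In this example only two tasks reach the 50% threshold, so the milestone is not met.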

Research prizes. We have reserved $5,000 of the prize pool to be given out at the organizers’ discretion to submissions that we think made a particularly interesting or valuable research contribution. We might give prizes to:

Submissions that present novel negative results (e.g. a submission that shows that having humans correct the AI's behavior doesn’t help)
Submissions that get particularly good results given their approach (e.g. best submission based on behavior cloning, or best submission based on learning from preferences)
Approaches that create interesting agent behavior beyond “solves the task” (e.g. most human-like agent)
New, interesting knowledge about learning from human feedback (e.g. an empirically validated scaling law that predicts how much human data is required for a given level of performance, or guidelines on how to decide which types of human feedback to use at any given point in fine-tuning)

If you wish to be considered for a research prize, please include some details on interesting research-relevant results in the README for your submission. We expect to award around 2-10 research prizes in total.

Community support. We will award $1,000 of the prize pool at the organizers’ discretion to people who provide community support, for example by answering other participants’ questions, or creating and sharing useful tools.

📅 Timeline

June-July: Materials shared: new MineRL, pretrained models and baseline code.
~~1st~~ ~~7th~~ 22nd of July: Competition begins! Participants are invited to start submitting their solutions.
~~28th of October:~~ Submission deadline. Submissions are closed and organizers begin the evaluation process.
November: Winners are announced and are invited to contribute to the competition writeup.
6th of December, 11:00-14:00 UTC: Presentation at NeurIPS 2022 (online/virtual). Event available here.

🙋 F.A.Q

This F.A.Q is the only official place for clarification of competition Rules!

Q: Will you be releasing your setup for collecting demonstrations?

> A: Unfortunately not -- our setup is fairly complex and not fit for public release. However, along with our baseline solutions, we will provide you with a number of tools to help you create your submissions. One of these is a tool for recording your own Minecraft gameplay in the same environments that the agent plays in.

Q: Will you re-run my training code?

> A: Eventually, but only for the top solutions coming out of Phase 2. We require you to always submit your training code along with your submission. For the evaluations we will use the models you uploaded along with your submission. We perform retraining to check that the training script you provide roughly reproduces the behaviour of the model you submit.

Q: What does “Minecraft internal state” (that participants aren't allowed to use) refer to?

> A: It refers to hardcoded aspects of world state like “how far am I from a tree” and “what blocks are in a 360 degree radius around me”; things that either would not be available from the agent’s perspective, or that an agent would normally have to infer from data in a real environment, since the real world doesn’t have hardcoded state available.

(11th July) Q: Are you allowed to use MineDojo data?

> A: Yes! You are allowed to download MineDojo dataset(s) during your training run as part of the 4 day training limit, and it will not count towards the 30 MB upload limit. Normally, you are not allowed to download data during the training process, but we have made an exception with MineDojo data. However, you are still not allowed to upload more than 30MB of data as part of your submission even if it is part of MineDojo (you should download it during training).

(27th July) Q: Are you allowed to use OpenAI's inverse dynamics model to predict actions for videos?

> A: Yes! You are allowed to use the OpenAI IDM files shared here. These files will be available to the training instance next to the foundational models.

(28th August) Q: What are the hardware specifications of the machine that is used for running and training the submissions?

> A: While this is not set in stone, we are currently using Azure NC6 instances (6 vCPUs, 56GB of RAM, one K80 GPU with 12GB VRAM) for running the submissions for leaderboard results. We will also aim to use the same instances for training the models.

Have more questions? Ask in Discord or on the Forum!

👉 Similar challenges

If you are interested in AIs which work like humans, communicate with humans and/or work in Minecraft-like environments, you might be interested in the IGLU contest! They are running again this year.

🤝 Partners

Thank you to our amazing partners!

FTX Future Fund (via the Regranting Program)

Encultured AI

Microsoft

👥 Team

Note: Despite the affiliations, this competition is not run by any of the companies/universities (apart from AICrowd), and does not reflect their opinions.