Update 15th of August: Submission system is still acting up, and we recommend you do not submit full submissions yet. However, your debug runs may work out.
Update 2nd of August: The full BASALT dataset has been released! See the "Dataset" section below for a few more details, or the download instructions in our baseline solution.
Update 27th of July: We have decided to allow using OpenAI's inverse dynamics model as part of your submission. FAQ section has been updated accordingly.
Office hours: 16th of August, 6pm London time, on the Discord server in the #office-hours voice channel. Then 23rd August, 9am London time.
Clarification on submission limits: As of writing (3rd of August), you have one full submission and three debug submissions per day.
The MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) competition aims to promote research in learning from human feedback to enable agents that can accomplish tasks without crisp, easily-defined reward functions. Our sponsors have generously provided 💰20,000 USD💰 in prize money to support this research, with an additional 100,000 USD for especially surprising results (see "Prizes")!
This is the second iteration of this competition. You can find the page for the BASALT 2021 competition here. Major changes this year include:
- ⚒️ New MineRL simulator version with human-level observation and action-spaces. This change means, for example, that crafting requires opening the inventory UI and using the mouse to craft items.
- 🧠 Pretrained models trained on different Minecraft tasks, which you are free to use in your solutions as you see fit (e.g., fine-tune for a specific task, use for a specific part of behaviour)
- 🏆 Prizes to encourage exploring learning from human-feedback, even if the solution does not reach the top performance.
- 💎 An "intro" track, in which the task is the original, non-restrictive MineRL competition task "Obtain diamond shovel", to ease entry to the competition.
❓ Task and motivation
Real-world tasks are not simply handed to us with a clearly-defined reward function, and it is often challenging to design one --- even if you can verbally describe what you want to be done. To reflect this situation, the BASALT competition environments do not include reward functions. We realize that this workflow is slower and more complicated, but we believe this setup is necessary if we want AI systems to have effective and safe real-world impacts. See the full, original motivation for the BASALT 2021 competition here.
Ponder the following motivating, rhetorical questions 🤔:
- You want to train an AI to construct a nice waterfall in Minecraft using reward. When would you give the agent a positive reward?
- You want a house-building AI that, much like in real-life, would not interfere with (or grief) others while building houses in a Minecraft town. In how many different ways could the AI cause harm to the city (or enjoyment of other players) while building the house? How would you signal these harms with a reward function?
- Speaking of houses, what kind of a metric would you assign to a house to measure its "betterness"? How would you "measure" if one AI is better at building a house than the other?
Now consider this: do you know when somebody has built a waterfall or a house? Can you tell if one house is better than another? If yes, how can we transfer this knowledge to AI? One answer: Learning from human-feedback. Instead of reward functions, we train the agent with demonstrations, preferences ("behaviour A is better than B"), and corrections. See the "Getting Started" section for more material and pointers for this line of work.
To encourage this direction, we define four tasks in human-readable descriptions. You will receive these descriptions to help you design your solutions. The human workers evaluating the videos generated by the submissions also receive these descriptions to aid their evaluations. See the Evaluation section for further details.
This competition will be judged according to a human assessment of the generated trajectories. In particular, for each task, we will generate videos of two different agents acting in the environment and ask a human which agent performed the task better. After collecting many of these comparisons, we will produce a score for each agent using the TrueSkill system, which, very roughly speaking, captures how often your agent is likely to "win" in a head-to-head comparison. Your final score will be the average of your normalized scores over the four tasks (that is, all four tasks have equal weight in your final ranking).
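To make the scoring idea above more concrete, here is a minimal sketch of how pairwise "A beat B" judgments can be turned into per-agent scores. It uses the standard two-player, no-draw TrueSkill update with commonly used default parameters. The parameter values, the z-score normalization, and all function names are assumptions made for illustration; the organizers' actual evaluation code and normalization may differ.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Commonly used TrueSkill defaults (assumed; not confirmed by the organizers).
MU0, SIGMA0 = 25.0, 25.0 / 3.0
BETA = SIGMA0 / 2.0  # performance noise
_STD = NormalDist()  # standard normal, used for pdf/cdf


def update(winner, loser):
    """Two-player, no-draw TrueSkill update on (mu, sigma) tuples."""
    (mu_w, sd_w), (mu_l, sd_l) = winner, loser
    c = sqrt(2 * BETA**2 + sd_w**2 + sd_l**2)
    t = (mu_w - mu_l) / c
    v = _STD.pdf(t) / _STD.cdf(t)  # mean shift factor
    w = v * (v + t)                # variance shrink factor, in (0, 1)
    new_w = (mu_w + sd_w**2 / c * v, sd_w * sqrt(1 - sd_w**2 / c**2 * w))
    new_l = (mu_l - sd_l**2 / c * v, sd_l * sqrt(1 - sd_l**2 / c**2 * w))
    return new_w, new_l


def rate(agents, comparisons):
    """comparisons: list of (winner, loser) pairs for one task."""
    ratings = {a: (MU0, SIGMA0) for a in agents}
    for win, lose in comparisons:
        ratings[win], ratings[lose] = update(ratings[win], ratings[lose])
    return {a: mu for a, (mu, _) in ratings.items()}


def normalize(scores):
    """Z-score normalization within one task (an assumed choice)."""
    m, s = mean(scores.values()), pstdev(scores.values())
    return {a: (x - m) / s if s else 0.0 for a, x in scores.items()}


def final_scores(agents, comparisons_by_task):
    """Equal-weight average of normalized per-task scores."""
    per_task = [normalize(rate(agents, comps))
                for comps in comparisons_by_task.values()]
    return {a: mean(task[a] for task in per_task) for a in agents}


# Hypothetical comparison data for two tasks and three agents.
agents = ["A", "B", "C"]
comparisons_by_task = {
    "FindCave": [("A", "B"), ("A", "C"), ("B", "C")],
    "MakeWaterfall": [("A", "B"), ("A", "C"), ("B", "C")],
}
scores = final_scores(agents, comparisons_by_task)
```

In this toy data agent A wins every comparison and C loses every one, so A ends up ranked above B, and B above C.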
During the competition, competition organizers will quickly rate each submission from 1-5 based on the publicly shown videos of the submissions to give a rough ranking of the solutions. This score will not affect final evaluation of the submissions. Please keep this in mind during the competition; do not assume that your final ranking will match what is on the leaderboard.
Evaluation is done in three steps after submissions close (you will get to choose which of your submissions is used for the final evaluation):
- Phase 1: A maximum of 50 submissions are included in a shorter round of evaluations to determine the Top 20 submissions. If there are more than 50 submissions, organizers reserve the right to use any method to limit submissions to 50 (e.g., a faster round of scoring).
- Phase 2: The Top 20 submissions will be evaluated more thoroughly (more evaluations per submission) to determine the ordering of submissions. Top 10 submissions move to validation.
- Validation: Organizers will inspect the source code of the Top 10 submissions to ensure compliance with the rules. The submissions will also be retrained to ensure no rules were broken during training (mainly: limited compute and training time).
  - If the behaviour of the retrained agent is considerably different, we will contact the team and aim to sort out any problems, assuming no rules were broken.
- Winners are chosen: Top submissions that pass validation will be announced as winners. This includes the top performing solutions (e.g. getting good results in the task) and solutions specializing in one of the encouraged methods (see Prizes for details).
The tasks are conceptually the same as in the BASALT 2021 competition. Both human evaluators and human demonstrators (who play the game to provide the dataset) are given the same "description". MineRL runs at 20 frames per second, so one in-game minute lasts 60 * 20 = 1,200 steps.
Find Caves task
- Description: Look around for a cave. When you are inside one, press ESCAPE to end the minigame.
- Clarification: You are not allowed to dig down from the surface to find a cave.
- Starting conditions: Spawn in "plains" biome.
- Timelimit: 3 minutes (3,600 steps)
Make Waterfall task
- Description: After spawning in a mountainous area with a water bucket and various tools, build a beautiful waterfall and then reposition yourself to “take a scenic picture” of the same waterfall by pressing the ESCAPE key. Pressing the ESCAPE key also ends the episode.
- Starting conditions: Spawn in "extreme_hills" biome. Start with a water bucket, cobblestone, a stone pickaxe and a stone shovel.
- Timelimit: 5 minutes (6,000 steps)
Village Animal Pen Task
- Description: After spawning in a village, build an animal pen next to one of the houses in a village. Use your fence posts to build one animal pen that contains at least two of the same animal. (You are only allowed to pen chickens, cows, pigs, sheep or rabbits.) There should be at least one gate that allows players to enter and exit easily. The animal pen should not contain more than one type of animal. (You may kill any extra types of animals that accidentally got into the pen.) Don’t harm the village. Press the ESCAPE key to end the minigame.
- Clarifications: You may need to terraform the area around a house to build a pen. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Animal pens must have a single type of animal: pigs, cows, sheep, chicken or rabbit.
- Technical clarification: The MineRL environment may spawn the player in a snow biome, which does not contain animals. Organizers will ensure that the seeds used for the evaluation spawn the player in villages with suitable animals available nearby.
- Starting conditions: Spawn near/in a village. Start with fences, fence gates, carrots, wheat seeds and wheat. This food can be used to attract animals.
- Timelimit: 5 minutes (6,000 steps)
Village House Construction task
- Description: Taking advantage of the items in your inventory, build a new house in the style of the village (random biome), in an appropriate location (e.g. next to the path through the village), without harming the village in the process. Then give a brief tour of the house (i.e. spin around slowly such that all of the walls and the roof are visible). Press the ESCAPE key to end the minigame.
- Clarifications: It’s okay to break items that you misplaced (e.g. use the stone pickaxe to break cobblestone blocks). You are allowed to craft new blocks. You don’t need to copy another house in the village exactly (in fact, we’re more interested in having slight deviations, while keeping the same “style”). You may need to terraform the area to make space for a new house. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Please spend less than ten minutes constructing your house.
- Starting conditions: Spawn in/near a village (of any type!). Start with varying construction materials designed to cover different biomes.
- Timelimit: 12 minutes (14,400 steps)
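The step counts in the time limits above follow directly from the 20 steps-per-second rate. The short sketch below reproduces them; the dictionary keys and helper name are illustrative labels, not identifiers from MineRL.

```python
STEPS_PER_SECOND = 20  # MineRL runs at 20 frames (steps) per second


def minutes_to_steps(minutes):
    """Convert an in-game time limit in minutes to environment steps."""
    return int(round(minutes * 60 * STEPS_PER_SECOND))


# Time limits in minutes, as listed for each task above.
TIME_LIMITS_MIN = {
    "Find Cave": 3,
    "Make Waterfall": 5,
    "Village Animal Pen": 5,
    "Village House Construction": 12,
}

TIME_LIMIT_STEPS = {task: minutes_to_steps(m)
                    for task, m in TIME_LIMITS_MIN.items()}

for task, steps in TIME_LIMIT_STEPS.items():
    print(f"{task}: {steps} steps")
```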
The full BASALT dataset is now available! Big thanks to OpenAI for sponsoring this!
The dataset is 650GB in total, but with the utility script in the baseline repository you can choose how much data you download.
🌠 Intro track
We realize the full task detailed above is daunting, and to ease the entry to this competition, we also have an "intro" track for you to compete in. Your task is to create an agent that can obtain a diamond shovel, starting from a random, fresh world. Your submission will be evaluated by running 20 games (18,000 steps maximum) and taking the maximum score over these 20 runs. Your agent is rewarded as in the "ObtainDiamond" task of the MineRL 2021 competition, with an additional reward of 2048 points for crafting a diamond shovel.
Sounds daunting? This used to be a difficult task, but thanks to OpenAI's VPT models, obtaining diamonds is relatively easy. Building on this model, your task is to add the part where it uses the diamonds to craft a diamond shovel instead of a diamond pickaxe. You can find a baseline solution using the VPT model here. Find the bare-bones submission template here.
Note that the "intro" track is only designed to help you get familiar with the submission system and MineRL, not to be actively competed in. Hence we chose "maximum" over episodes rather than "average". There are no winner prizes for the "intro" track; however, we may give research prizes to innovative and strong solutions in this track as well.
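As a rough illustration of the intro-track scoring described above, the sketch below takes the maximum episode return over the evaluation runs and contrasts it with the average. The episode returns are made-up numbers and the function names are hypothetical; only the 2048-point shovel bonus and the "maximum over runs" rule come from the text.

```python
from statistics import mean

SHOVEL_BONUS = 2048  # extra reward for crafting a diamond shovel


def episode_return(base_reward, crafted_shovel):
    """Total return of one game: ObtainDiamond-style reward plus bonus."""
    return base_reward + (SHOVEL_BONUS if crafted_shovel else 0)


def intro_track_score(returns):
    """Submission score: the best episode over all evaluation runs."""
    return max(returns)


# Hypothetical returns for a handful of games (a real evaluation uses 20).
returns = [
    episode_return(35, False),
    episode_return(547, False),
    episode_return(1571, True),   # the one game that crafted the shovel
    episode_return(99, False),
]

print("max:", intro_track_score(returns))
print("mean:", mean(returns))
```

Taking the maximum means a single successful game determines the score, which suits a track meant for getting familiar with the tooling rather than for measuring reliability.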
💪 Getting Started
Start with the following resources:
- Clone the submission template with a random agent and begin to develop your solutions! For intro track, use this submission template.
- Install MineRL v1.0 and explore the BASALT environments (these are the tasks you aim to solve!)
- Check out the behavioural cloning baseline, along with the dummy dataset to help you get started. For intro track, see this baseline solution.
- Explore and study the pretrained models; you are free to use them as part of your submission however you like, and we encourage you to do so!
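As a starting point for exploring the environments mentioned above, here is a minimal sketch of a random agent loop. It assumes MineRL v1.0 and `gym` are installed and that the four BASALT environments are registered under the IDs listed; the helper name `run_random_episode` is hypothetical. Check the MineRL documentation for the authoritative API.

```python
# Environment IDs as registered by MineRL v1.0 (assumed; verify against the docs).
BASALT_ENV_IDS = [
    "MineRLBasaltFindCave-v0",
    "MineRLBasaltMakeWaterfall-v0",
    "MineRLBasaltCreateVillageAnimalPen-v0",
    "MineRLBasaltBuildVillageHouse-v0",
]


def run_random_episode(env_id, max_steps=200):
    """Run one episode with random actions and return the number of steps taken."""
    # Imported lazily so this module can be loaded without MineRL installed.
    import gym
    import minerl  # noqa: F401  (importing registers the MineRL environments)

    env = gym.make(env_id)
    env.reset()
    steps, done = 0, False
    while not done and steps < max_steps:
        action = env.action_space.sample()
        action["ESC"] = 0  # a random ESC press would end the episode early
        _, _, done, _ = env.step(action)  # BASALT environments provide no reward
        steps += 1
    env.close()
    return steps
```

Calling `run_random_episode(BASALT_ENV_IDS[0])` launches a Minecraft instance, so expect the first reset to take a while.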
Here are some previous projects that could help you get started!
- Results of BASALT 2021 competition.
- Check out the winners of the MineRL Diamond competition -- while the BASALT tasks are different, there is still much to learn from approaches to Diamond.
- See our list of projects using MineRL here (and please email us to add more to the list!)
- Academic papers related to learning from human-feedback
- Familiarize yourself with the MineRL package and dataset.
- Join the Discord community!
- Participate in research discussions on different approaches to solving the challenge
- Form teams early
Find the official rules here. We will list official changes to the rules in the FAQ on this page. Here is a list of corrections:
- 11th July: You are allowed to use MineDojo datasets as part of your training script, but you will have to download them as part of your training run (MineDojo download links will be whitelisted in the instance). This download will be part of the 4 day training limit. This download will not count towards the 30 MB limit.
There are three categories of prizes:
- 1st place: $7,000 USD
- 2nd place: $4,000 USD
- 3rd place: $3,000 USD
- Blue Sky award: $100,000 USD
- Research prizes: $5,000 USD
- Community support: $1,000 USD
Winners. As described in the Evaluation section, we will evaluate submissions using human feedback to determine how well agents complete each of the four tasks. The three teams that score highest on this evaluation will receive prizes of $7,000, $4,000, and $3,000.
Blue Sky award. This award of $100,000 will be given to submissions that achieve a very high level of performance: human-level performance on at least 3 of the 4 tasks. (Human-level performance is achieved if the human evaluators prefer agent-generated trajectories to human demonstrations at least 50% of the time.) If multiple submissions achieve this milestone, the award will be split equally across all of them.
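The Blue Sky criterion above can be written as a small check. The sketch below assumes you have, per task, a count of how often evaluators preferred the agent over a human demonstration; the data layout, task labels, and function names are hypothetical.

```python
def human_level_tasks(preferences):
    """Return tasks where the agent was preferred at least 50% of the time.

    preferences maps task name -> (times agent preferred, total comparisons).
    """
    return [task for task, (wins, total) in preferences.items()
            if total > 0 and 2 * wins >= total]


def qualifies_for_blue_sky(preferences, required_tasks=3):
    """Human-level performance on at least 3 of the 4 tasks."""
    return len(human_level_tasks(preferences)) >= required_tasks


def blue_sky_share(num_qualifying, award=100_000):
    """The award is split equally across all qualifying submissions."""
    return award / num_qualifying


# Hypothetical agent-versus-human comparison counts.
example = {
    "FindCave": (55, 100),
    "MakeWaterfall": (50, 100),
    "CreateVillageAnimalPen": (61, 100),
    "BuildVillageHouse": (30, 100),
}

print(qualifies_for_blue_sky(example))  # three tasks reach the 50% mark
print(blue_sky_share(2))                # share if two submissions qualify
```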
Research prizes. We have reserved $5,000 of the prize pool to be given out at the organizers’ discretion to submissions that we think made a particularly interesting or valuable research contribution. We might give prizes to:
- Submissions that present novel negative results (e.g. a submission that shows that having humans correct the AI’s behavior doesn’t help)
- Submissions that get particularly good results given their approach (e.g. best submission based on behavior cloning, or best submission based on learning from preferences)
- Approaches that create interesting agent behavior beyond “solves the task” (e.g. most human-like agent)
- New, interesting knowledge about learning from human feedback (e.g. an empirically validated scaling law that predicts how much human data is required for a given level of performance, or guidelines on how to decide which types of human feedback to use at any given point in fine-tuning)
If you wish to be considered for a research prize, please include some details on interesting research-relevant results in the README for your submission. We expect to award around 2-10 research prizes in total.
Community support. We will award $1,000 of the prize pool at the organizers’ discretion to people who provide community support, for example by answering other participants’ questions, or creating and sharing useful tools.
- June-July: Materials shared: new MineRL, pretrained models and baseline code.
- 22nd of July (previously listed as the 1st, then the 7th): Competition begins! Participants are invited to start submitting their solutions.
- 28th of October: Submission deadline. Submissions are closed and organizers begin the evaluation process.
- November: Winners are announced and are invited to contribute to the competition writeup.
- 2nd-3rd of December: Presentation at NeurIPS 2022 (online/virtual).
This F.A.Q is the only official place for clarification of competition Rules!
Q: Will you be releasing your setup for collecting demonstrations?
> A: Unfortunately not -- our setup is fairly complex and not fit for public release. However, along with our baseline solutions, we will provide you with a number of tools to help you create your submissions. One of these is a tool for recording your own Minecraft gameplay in the same environments the agent plays in.
Q: Will you re-run my training code?
> A: Eventually, but only for the top solutions coming out of Phase 2. We require you to always submit your training code along with your submission. For the evaluations we will use the models you uploaded along with your submission. We perform retraining to ensure the training script you provide roughly produces the behaviour of the model you submit.
Q: What does “Minecraft internal state” (that participants aren't allowed to use) refer to?
> A: It refers to hardcoded aspects of world state like “how far am I from a tree” and “what blocks are in a 360 degree radius around me”; things that either would not be available from the agent’s perspective, or that an agent would normally have to infer from data in a real environment, since the real world doesn’t have hardcoded state available.
(11th July) Q: Are you allowed to use MineDojo data?
> A: Yes! You are allowed to download MineDojo dataset(s) during your training run as part of the 4 day training limit, and it will not count towards the 30 MB upload limit. Normally, you are not allowed to download data during the training process, but we have made an exception with MineDojo data. However, you are still not allowed to upload more than 30MB of data as part of your submission even if it is part of MineDojo (you should download it during training).
(27th July) Q: Are you allowed to use OpenAI's inverse dynamics model to predict actions for videos?
> A: Yes! You are allowed to use the OpenAI IDM files shared here. These files will be available to the training instance next to the foundational models.
👉 Similar challenges
If you are interested in AIs that work like humans, communicate with humans and/or work in Minecraft-like environments, you might be interested in the IGLU contest! They are running again this year.
Thank you to our amazing partners!
Note: Despite the affiliations, this competition is not run by any of the companies/universities (apart from AIcrowd), and does not reflect their opinions.
- Anssi Kanervisto (Microsoft Research)
- Stephanie Milani (Carnegie Mellon University)
- Karolis Ramanauskas (Independent)
- Byron V. Galbraith (Seva Inc.)
- Steven H. Wang (ETH Zürich)
- Sander Schulhoff (University of Maryland)
- Brandon Houghton (OpenAI)
- Sharada Mohanty (AIcrowd)
- Rohin Shah (DeepMind)
- Andrew Critch (Encultured.ai)
- Fei Fang (Carnegie Mellon University)
- Kianté Brantley (Cornell University)
- Sam Devlin (Microsoft Research)
- Oriol Vinyals (DeepMind)
If you have any questions, please feel free to contact us on Discord, on the AIcrowd discussion forum, or at basalt(at)minerl.io.