
NeurIPS 2021: MineRL Diamond Competition

Fully scripted baseline for the Intro track

Meet Bulldozer the lumberjack

By karolisram


Introduction

This notebook is part two of the Intro track baselines for the MineRL 2021 competition.

Below you will find a fully scripted agent that has two components:

  1. Bulldozer the lumberjack - a script that simply digs forward with occasional jumps and random 90-degree turns.
  2. A script that crafts a wooden pickaxe and digs down to get some cobblestone.

Script #1 runs until a set number of logs has been collected (N_WOOD_THRESHOLD in the Parameters section), then script #2 takes over. When evaluated on the MineRLObtainDiamond environment, the agent achieves an average reward of 4.0.

In part three we will replace script #1 with a machine learning model! Link below:

MineRL BC+scripted

Setup

In [ ]:
%%capture
!sudo add-apt-repository -y ppa:openjdk-r/ppa
!sudo apt-get purge -y 'openjdk-*'
!sudo apt-get install -y openjdk-8-jdk
!sudo apt-get install -y xvfb xserver-xephyr vnc4server python-opengl ffmpeg
In [ ]:
%%capture
!pip3 install --upgrade minerl
!pip3 install pyvirtualdisplay
!pip3 install -U colabgymrender

Import libraries

In [ ]:
import random
import gym
import minerl
from tqdm.notebook import tqdm
from colabgymrender.recorder import Recorder
from pyvirtualdisplay import Display

Start of the agent code

In [ ]:
def str_to_act(env, actions):
    """
    Simplifies specifying actions for the scripted part of the agent.
    Some examples for a string with a single action:
        'craft:planks'
        'camera:[10,0]'
        'attack'
        'jump'
        ''
    There should be no spaces in single actions, as we use spaces to separate actions with multiple "buttons" pressed:
        'attack sprint forward'
        'forward camera:[0,10]'

    :param env: base MineRL environment.
    :param actions: string of actions.
    :return: dict action, compatible with the base MineRL environment.
    """
    act = env.action_space.noop()
    for action in actions.split():
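        if ":" in action:
            k, v = action.split(':')
            if k == 'camera':
                # parse '[pitch,yaw]' by hand instead of eval(), which would run arbitrary code
                act[k] = [float(x) for x in v.strip('[]').split(',')]
            else:
                act[k] = v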
        else:
            act[action] = 1
    return act

Actions

Here's a list of all possible actions:

Dict(attack:Discrete(2),
     back:Discrete(2),
     camera:Box(low=-180.0, high=180.0, shape=(2,)),
     craft:Enum(crafting_table,none,planks,stick,torch),
     equip:Enum(air,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
     forward:Discrete(2),
     jump:Discrete(2),
     left:Discrete(2),
     nearbyCraft:Enum(furnace,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
     nearbySmelt:Enum(coal,iron_ingot,none),
     place:Enum(cobblestone,crafting_table,dirt,furnace,none,stone,torch),
     right:Discrete(2),
     sneak:Discrete(2),
     sprint:Discrete(2))
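
Because action strings are typed by hand, a typo such as 'craft:plank' is easy to miss. The sketch below checks a string against the action space listed above before it is sent to the environment. The names BUTTONS, ENUMS and find_problems are hypothetical helpers written for this illustration; they are not part of MineRL.

```python
# Hypothetical helper: the tables below are copied from the action space listing above.
BUTTONS = {'attack', 'back', 'forward', 'jump', 'left', 'right', 'sneak', 'sprint'}
ENUMS = {
    'craft': {'crafting_table', 'none', 'planks', 'stick', 'torch'},
    'equip': {'air', 'iron_axe', 'iron_pickaxe', 'none', 'stone_axe',
              'stone_pickaxe', 'wooden_axe', 'wooden_pickaxe'},
    'nearbyCraft': {'furnace', 'iron_axe', 'iron_pickaxe', 'none', 'stone_axe',
                    'stone_pickaxe', 'wooden_axe', 'wooden_pickaxe'},
    'nearbySmelt': {'coal', 'iron_ingot', 'none'},
    'place': {'cobblestone', 'crafting_table', 'dirt', 'furnace', 'none',
              'stone', 'torch'},
}


def find_problems(actions):
    """Return a list of human-readable problems; an empty list means the string looks valid."""
    problems = []
    for token in actions.split():
        key, sep, value = token.partition(':')
        if not sep:
            # No colon: the token must be one of the binary buttons.
            if key not in BUTTONS:
                problems.append(f'unknown button: {key!r}')
        elif key == 'camera':
            try:
                pitch, yaw = (float(x) for x in value.strip('[]').split(','))
            except ValueError:
                problems.append(f'malformed camera value: {value!r}')
            else:
                if not (-180 <= pitch <= 180 and -180 <= yaw <= 180):
                    problems.append(f'camera value out of range: {value!r}')
        elif key in ENUMS:
            if value not in ENUMS[key]:
                problems.append(f'{key} does not accept {value!r}')
        else:
            problems.append(f'unknown action: {key!r}')
    return problems


print(find_problems('attack sprint forward'))
print(find_problems('craft:plank jmp'))
```

One way to use it is to run find_problems over every entry of an action sequence once, before starting Minecraft, so mistakes surface in seconds rather than mid-episode.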

Camera

Camera actions contain two values:

  1. Pitch (up/down), where up is negative, down is positive.
  2. Yaw (left/right), where left is negative, right is positive.

For example, moving the camera up by 10 degrees would be 'camera:[-10,0]'.
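
The scripts in this notebook turn the camera in 10-degree steps repeated over several ticks. As a minimal sketch of that pattern, the hypothetical helper camera_steps below splits a total rotation into per-tick action strings. The 10-degree step size is an assumption that mirrors the scripts here, not a limit of the environment.

```python
import math


def camera_steps(pitch=0, yaw=0, max_step=10):
    """Split a total rotation (degrees) into per-tick 'camera:[pitch,yaw]' strings.

    Positive pitch looks down, positive yaw turns right, as described above.
    """
    largest = max(abs(pitch), abs(yaw))
    if largest == 0:
        return []
    n_ticks = math.ceil(largest / max_step)
    # ':g' drops trailing zeros, so 10.0 is written as '10'; no spaces are produced.
    return [f'camera:[{pitch / n_ticks:g},{yaw / n_ticks:g}]'] * n_ticks


print(camera_steps(pitch=30))   # look down 30 degrees
print(camera_steps(yaw=-90))    # turn left 90 degrees
```

The returned list can be appended to an action sequence with +=, in the same way as the hand-written camera lines.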

Change agent behaviour here

To change the sequences of actions that the agent performs, edit the code inside the get_action_sequence_bulldozer() or get_action_sequence() function below. The agent executes one action per tick, and a regular Minecraft game runs at 20 ticks per second.
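
Since every list entry is one tick, durations in the scripts are written as repeat counts (for example, 100 entries for 5 seconds). A small sketch of that conversion follows; seconds_to_ticks and hold are hypothetical helper names, not functions from the notebook.

```python
TICKS_PER_SECOND = 20  # regular Minecraft tick rate, as stated above


def seconds_to_ticks(seconds):
    """Convert a duration in seconds to a whole number of game ticks."""
    return round(seconds * TICKS_PER_SECOND)


def hold(action, seconds):
    """Repeat one action string for the given duration."""
    return [action] * seconds_to_ticks(seconds)


sequence = []
sequence += hold('', 5)                       # wait 5 seconds
sequence += hold('attack sprint forward', 5)  # dig forward for 5 seconds
print(len(sequence))
```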

In [ ]:
def get_action_sequence_bulldozer():
    """
    Specify the action sequence for Bulldozer, the scripted lumberjack.
    """
    action_sequence_bulldozer = []
    action_sequence_bulldozer += [''] * 100  # wait 5 secs
    action_sequence_bulldozer += ['camera:[10,0]'] * 3  # look down 30 degrees

    for _ in range(100):
        action_sequence_bulldozer += ['attack sprint forward'] * 100  # dig forward for 5 secs
        action_sequence_bulldozer += ['jump']  # jump!
        action_sequence_bulldozer += ['attack sprint forward'] * 100
        action_sequence_bulldozer += ['jump']
        action_sequence_bulldozer += ['attack sprint forward'] * 100
        if random.random() < 0.5:  # turn either 90 degrees left or 90 degrees right with an equal probability
            action_sequence_bulldozer += ['camera:[0,-10]'] * 9
        else:
            action_sequence_bulldozer += ['camera:[0,10]'] * 9
    return action_sequence_bulldozer
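
As a quick sanity check, the length of the Bulldozer sequence can be worked out from the repeat counts in the function above. The constant names in this sketch are introduced only for the calculation.

```python
TICKS_PER_SECOND = 20

WAIT_TICKS = 100        # [''] * 100
LOOK_DOWN_TICKS = 3     # ['camera:[10,0]'] * 3
LOOPS = 100             # for _ in range(100)
# dig, jump, dig, jump, dig, then a 9-tick turn
TICKS_PER_LOOP = 100 + 1 + 100 + 1 + 100 + 9

total_ticks = WAIT_TICKS + LOOK_DOWN_TICKS + LOOPS * TICKS_PER_LOOP
total_minutes = total_ticks / TICKS_PER_SECOND / 60

print(total_ticks)              # 31203
print(round(total_minutes, 1))  # 26.0
```

The full sequence is therefore longer than both MAX_TEST_EPISODE_LEN (5000) and the 18k default episode length mentioned in the Parameters section, so in practice Bulldozer stops either when enough logs are collected or when the step budget runs out.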
In [ ]:
def get_action_sequence():
    """
    Specify the action sequence for the agent to execute.
    """
    # get 6 logs:
    action_sequence = []
    action_sequence += [''] * 100  # wait 5 sec
    action_sequence += ['forward'] * 8
    action_sequence += ['attack'] * 61
    action_sequence += ['camera:[-10,0]'] * 7  # look up
    action_sequence += ['attack'] * 61
    action_sequence += ['attack'] * 61
    action_sequence += ['attack'] * 61
    action_sequence += ['attack'] * 61
    action_sequence += [''] * 50
    action_sequence += ['jump']
    action_sequence += ['forward'] * 10
    action_sequence += ['camera:[-10,0]'] * 2
    action_sequence += ['attack'] * 61
    action_sequence += ['attack'] * 61
    action_sequence += ['attack'] * 61
    action_sequence += ['camera:[10,0]'] * 9  # look down
    action_sequence += [''] * 50

    # make planks, sticks, crafting table and wooden pickaxe:
    action_sequence += ['back'] * 2
    action_sequence += ['craft:planks'] * 4
    action_sequence += ['craft:stick'] * 2
    action_sequence += ['craft:crafting_table']
    action_sequence += ['camera:[10,0]'] * 9
    action_sequence += ['jump']
    action_sequence += [''] * 5
    action_sequence += ['place:crafting_table']
    action_sequence += [''] * 10

    # bug: looking straight down at a crafting table doesn't let you craft. So we look up a bit before crafting:
    action_sequence += ['camera:[-1,0]']
    action_sequence += ['nearbyCraft:wooden_pickaxe']
    action_sequence += ['camera:[1,0]']
    action_sequence += [''] * 10
    action_sequence += ['equip:wooden_pickaxe']
    action_sequence += [''] * 10

    # dig down:
    action_sequence += ['attack'] * 600
    action_sequence += [''] * 10

    return action_sequence

Parameters

In [ ]:
# Parameters:
TEST_EPISODES = 5  # number of episodes to test the agent for.
MAX_TEST_EPISODE_LEN = 5000  # 18k is the default for MineRLObtainDiamond.
N_WOOD_THRESHOLD = 4  # number of wood logs to get before starting script #2.
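
To see why a threshold of 4 logs is enough for the crafting steps in get_action_sequence(), here is a rough material budget. It assumes the standard Minecraft recipes (1 log gives 4 planks, 2 planks give 4 sticks, a crafting table costs 4 planks, a wooden pickaxe costs 3 planks and 2 sticks) and that each craft action runs its recipe once; both are assumptions about the environment rather than something this notebook states.

```python
N_WOOD_THRESHOLD = 4  # same value as in the Parameters cell

# Counts taken from get_action_sequence()
plank_crafts = 4      # ['craft:planks'] * 4
stick_crafts = 2      # ['craft:stick'] * 2

# Assumed standard recipes
logs_spent = plank_crafts * 1
planks_made = plank_crafts * 4
sticks_made = stick_crafts * 4
planks_spent = stick_crafts * 2 + 4 + 3   # sticks + crafting table + pickaxe
sticks_spent = 2                          # pickaxe handle

print('logs needed:', logs_spent)
print('planks left over:', planks_made - planks_spent)
print('sticks left over:', sticks_made - sticks_spent)

assert logs_spent <= N_WOOD_THRESHOLD
assert planks_spent <= planks_made and sticks_spent <= sticks_made
```

Under these assumptions the script uses exactly 4 logs and ends with spare planks and sticks, so lowering N_WOOD_THRESHOLD below 4 would require trimming the crafting steps as well.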

Start Minecraft

In [ ]:
display = Display(visible=0, size=(400, 300))
display.start();
In [ ]:
env = gym.make('MineRLObtainDiamond-v0')
env = Recorder(env, './video', fps=60)

Run your agent

As the code below runs, episode videos and rewards should appear. You can rerun the cell to see different episodes.

In [ ]:
for episode in range(TEST_EPISODES):
    obs = env.reset()
    done = False
    total_reward = 0
    steps = 0

    action_sequence_bulldozer = get_action_sequence_bulldozer()
    action_sequence = get_action_sequence()

    # scripted part to get some logs:
    for action in tqdm(action_sequence_bulldozer[:MAX_TEST_EPISODE_LEN]):
        obs, reward, done, _ = env.step(str_to_act(env, action))
        total_reward += reward
        steps += 1
        if obs['inventory']['log'] >= N_WOOD_THRESHOLD:
            break
        if done:
            break

    # scripted part to use the logs; only the steps left in the episode budget are used:
    if not done:
        for action in tqdm(action_sequence[:MAX_TEST_EPISODE_LEN - steps]):
            obs, reward, done, _ = env.step(str_to_act(env, action))
            total_reward += reward
            steps += 1
            if done:
                break

    env.release()
    env.play(maxduration=1200)

    print(f'Episode #{episode+1} reward: {total_reward}\t\t episode length: {steps}\n')
100%|██████████| 2783/2783 [00:02<00:00, 1251.72it/s]
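
The loop above prints one reward per episode. To compare runs against the average reward quoted in the introduction, the per-episode totals can be collected in a list and summarized. The sketch below shows one way to do that; summarize is a hypothetical helper, and the numbers in placeholder_rewards are made up purely to exercise the function, not results from this agent.

```python
from statistics import mean


def summarize(episode_rewards):
    """Return basic statistics over a list of per-episode total rewards."""
    if not episode_rewards:
        raise ValueError('need at least one episode reward')
    return {
        'episodes': len(episode_rewards),
        'mean': mean(episode_rewards),
        'min': min(episode_rewards),
        'max': max(episode_rewards),
    }


# Placeholder values for illustration only (NOT measured results).
placeholder_rewards = [1.0, 3.0]
print(summarize(placeholder_rewards))
```

In the evaluation loop, this would amount to appending total_reward to a list at the end of each episode and calling summarize once after the loop.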