# IntroToNetHack

Welcome to NetHack and the NetHack Learning Environment!

# A Brief Intro to NetHack & the NLE

Welcome, adventurer! You have been heralded from birth as the instrument of the gods. You are destined to recover the Amulet of Yendor for your deity or die in the attempt. Your hour of destiny has come. For the sake of us all: Go bravely!

This notebook provides a brief overview of the game of NetHack, takes a glance at the NetHack Learning Environment (NLE), and finally lays down the gauntlet for the NetHack Challenge!

# What is NetHack?

NetHack is a roguelike computer game first released in the late 1980s. At the beginning of the game your hero is placed into a dungeon, with the goal of descending through over 50 procedurally generated levels to retrieve the Amulet of Yendor. Once the Amulet is obtained, your hero must escape the dungeon, unlocking five extremely challenging final levels, before offering the Amulet to your in-game deity.

A key feature of NetHack is that it is visually simple, with observations made up solely of ASCII characters, yet it is complex in almost every other way!

There are several reasons why it is particularly challenging:

1) The game is randomized, with everything from the layout of maps to the impact of actions based on the roll of a die.

2) Unlike modern games, there is no saved game to fall back on: when you die, you begin from scratch. Given the randomness (see above) this makes it especially "unforgiving" (as described on the wiki). Indeed, deaths are so common there is even an acronym for them: YASD, which stands for Yet Another Stupid Death.

3) It is incredibly complex, with hundreds of different characters to observe and many more potential sequences of actions.

Thus, unlike other games played by AI agents, NetHack is not solvable by the average human in just a few hours of gameplay. Instead, expert players often take many years to solve it, assuming they are able to at all!

NetHack has been actively developed for decades, and NLE makes use of version 3.6.6, originally released in March 2020.

## Playing the Game

At the start of the game, players are usually asked to choose their character's starting role, race, gender and religious alignment. From the NetHack Wiki:

The player character can be any one of the following roles: archeologist, barbarian, cave[wo]man, healer, knight, monk, priest[ess], ranger, rogue, samurai, tourist, valkyrie, or wizard. They each have varying difficulties, strengths, weaknesses, quests and starting items.

The player can also choose from the five races: human, elf, dwarf, gnome, or orc, and the three alignments: lawful, neutral or chaotic. The available races and alignments are dependent on the role one picks.

Each different starting combination will alter the game experience, and thus impact the difficulty of the game and the most suitable strategy. For example, wizards start with magic and magical items, while rangers begin with a bow and arrow; elves are generally intelligent whereas dwarves will be strong!

It's worth noting these different starting characters can really affect the performance of agents learning to play the game. In the original NLE paper, agents on the Score task (most similar to the NetHack Challenge) averaged 738 for monk, 538 for valkyrie, 314 for wizard - but only 11 for tourist! For the purposes of the NetHack Challenge, the character is randomized during evaluation for the competition, so it is likely wise to consider agents that can perform well across a variety of hero configurations.

### Complex Observations

One of the many challenges of NetHack is the richness of the observation space, with a fully formed dungeon, message line and stats bar all rendered as ASCII text! Every character (and color) in the dungeon has a symbolic meaning, whether it's a Monster, an Item, or just a part of the Dungeon itself.

#### Dungeon

The dungeon is the main part of the screen the character navigates. The most frequently seen symbols are:

• @ : You
• . : Dungeon Floor
• < and > : Stairs up and down
• | and - : Walls
• + : Doors

It is also common to see Fountains: {, Traps: ^, Altars: _ and Hallways: #.
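
For a symbolic agent, the legend above can be captured as a small lookup table. The sketch below is only illustrative: the names `DUNGEON_SYMBOLS` and `describe_row` are hypothetical and not part of NLE, and because NetHack reuses some characters for several things, a table like this is a first guess rather than a full decoder.

```python
# Hypothetical lookup table built from the symbols listed above.
# NetHack reuses some characters, so treat each entry as a first guess.
DUNGEON_SYMBOLS = {
    "@": "you",
    ".": "dungeon floor",
    "<": "stairs up",
    ">": "stairs down",
    "|": "wall",
    "-": "wall",
    "+": "door",
    "{": "fountain",
    "^": "trap",
    "_": "altar",
    "#": "hallway",
}


def describe_row(row):
    """Return (column, symbol, meaning) for every known symbol in a map row."""
    return [
        (col, ch, DUNGEON_SYMBOLS[ch])
        for col, ch in enumerate(row)
        if ch in DUNGEON_SYMBOLS
    ]


# Symbols missing from the table (here '?') are simply skipped.
print(describe_row("|?..@|"))
```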

#### Items

NetHack has a vast number of items for in-game use, and many objects can be picked up and included in inventory. Once included, the agent can choose to use them in a number of different ways - often with some imaginative consequences: you can apply a towel to a weapon to clean off grease, but you can wear it too (it will wrap around your head)!

Heroes will need to use items as well as possible to navigate the dungeons, not least in finding fresh food to eat (unless they can find a different way to stave off hunger)...

#### Monsters!

A key component of the difficulty of NetHack (and the cause of many heroic deaths) is the presence of monsters. Throughout the game the hero will encounter many of the hundreds of different types of monsters, ranging from simple jackals which can be trivially defeated to other, more challenging obstacles that typically require significant thought to overcome.

For instance, if you walk into a Floating Eye (blue e) you will become paralyzed and probably die. This happens even to experienced players who lose concentration! To kill one, the hero can: make use of ranged weapons; blind themselves to avoid looking at it; become invisible so as not to be seen by it; wear a ring of free action (preventing paralysis); or possess a source of reflection (thus reflecting the gaze). Got all that?

What makes this a little trickier is that many of the most challenging monsters are seen infrequently, potentially only once across multiple games. Thus, while it is possible to memorize a strategy for a handful or even dozens of monsters, it only takes one to slip through the cracks of memory before it is back to the beginning of the game.

#### Taking Actions

To make its vast array of complex skills possible, NetHack has a large action space (referred to as commands). The game takes inputs that correspond directly to keys on the keyboard, including modifiers such as ctrl, shift and meta. The full list of commands is extensive, including both actions and meta-commands such as asking for help or viewing the inventory.

For the NetHack Challenge we provide an action space that is as close to the full set of commands as possible, blocking only a few commands such as modifying option settings. This should provide a significant challenge to all AI agents, while also offering them the potential to fully master the game. We note that it may be worthwhile to constrain this with some inductive bias, possibly even considering a curriculum of increasing action space.
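
One way to picture such a constraint is a thin wrapper that exposes only some of the full action indices and can unlock more over time. This is a minimal sketch under stated assumptions: `ReducedActionEnv`, `allowed` and `grow` are hypothetical names, not part of NLE, and the wrapped `env` is assumed only to offer gym-style `reset()` and `step(action)` methods.

```python
class ReducedActionEnv:
    """Expose only a subset of a wrapped environment's discrete actions.

    `env` is any object with gym-style `reset()` and `step(action)` methods;
    `allowed` lists the indices of the full action space that stay available.
    Both names are illustrative and not part of NLE.
    """

    def __init__(self, env, allowed):
        self.env = env
        self.allowed = list(allowed)

    @property
    def num_actions(self):
        """Size of the reduced action space the agent chooses from."""
        return len(self.allowed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Translate the agent's small index into the full action index.
        return self.env.step(self.allowed[action])

    def grow(self, extra):
        """Curriculum step: unlock more actions without renumbering old ones."""
        self.allowed.extend(a for a in extra if a not in self.allowed)
```

Because `grow` only appends, indices the agent has already learned keep their meaning as the action space expands. Which full indices to start with (for example, movement only) depends on the environment's action list and is left open here.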

#### Structure of the NetHack world

The collective name for all levels of the game is the "Mazes of Menace". Your hero starts in the Dungeons of Doom, which lie above the underworld Gehennom and below the five Planes that form the final stages of the game.

The Dungeons also contain various branches, the locations of which are often randomized. For example, the Gnomish Mines will always be generated between dungeon levels 2-4. There is also a Sokoban branch, located between levels 2-9. In order to reach the Amulet (and win the game), adventurers must complete the Quest, another branch, the location of which varies depending on the role.

This is just a brief foray into the details of the game. For more detail on the Mazes of Menace, see the nethackwiki page.

# What is the NetHack Learning Environment (NLE)?

The NLE, presented at NeurIPS 2020, is an OpenAI Gym environment that lets researchers train agents on the game of NetHack.

### NetHackChallenge-v0

The NLE contains different NetHack based tasks for agent training, but a new environment has been created especially for the competition: 'NetHackChallenge-v0'. The new environment is based on the 'NetHackScore-v0' task used in the NeurIPS paper, but contains some key modifications to bring out the full experience of NetHack. These are:

• The action space of the environment is greatly expanded to allow all keys on the keyboard.
• Menus, yes/no questions, cursor-movement, and text-input modalities are enabled.
• A random character (represented as '@') is used instead of a single default (e.g. 'mon-hum-neu-mal').

This makes the game particularly challenging, while also providing additional opportunity for savvy agents!

NLE is loaded as a gym environment, with all the typical functions that reinforcement learning (RL) researchers will be familiar with. For those using a symbolic approach, this means we typically follow these few steps:

obs = env.reset() # produces the first observation
done = False # initialize this so we know when episode ends
total_reward = 0 # total reward
while not done:
    action = agent.act(obs) # action processes observation and computes an action
    obs, reward, done, info = env.step(action) # updates the new observation and provides the reward/done
    total_reward += reward # keep track of cumulative reward


When the episode is over (very likely a YASD), total_reward will be the score of the agent. It is used for training RL agents, and gives symbolic agents an idea of their current performance.
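
The loop can be wrapped into a reusable function so that different agents are compared on equal terms. The sketch below assumes only the gym-style interface used above; `RandomAgent`, `run_episode` and the `max_steps` cap are hypothetical additions for illustration, not part of NLE.

```python
import random


class RandomAgent:
    """Baseline that ignores the observation and samples a uniform action."""

    def __init__(self, num_actions, seed=None):
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def act(self, obs):
        return self.rng.randrange(self.num_actions)


def run_episode(env, agent, max_steps=10_000):
    """Play one episode and return (total_reward, steps_taken).

    `max_steps` is an assumed safety cap so a stuck agent cannot loop forever.
    """
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

With NLE installed, this would be called with an environment created by `gym.make` and an agent sized from `env.action_space.n`. A random agent is unlikely to get far, but it gives a baseline score to beat.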

## Code Examples

In [1]:
%%capture
!pip install -U cmake
!apt update -qq && apt install -qq -y flex bison libbz2-dev libglib2.0 libsm6 libxext6
!pip install -U pip
!pip install git+https://github.com/facebookresearch/nle.git@eric/notebook-render  # this can render notebooks

In [2]:
import nle
import gym

In [3]:
env = gym.make("NetHackChallenge-v0", savedir=None)  # (Don't save a recording of the episode)
env.reset()  # each reset generates a new dungeon
env.step(1)  # move agent '@' north
env.render('notebook')

It's a wall.

----.-
-....|
|....|
|?..@|
+...d|
|....|
------

Agent the Hatamoto             St:13 Dx:18 Co:18 In:10 Wi:9 Ch:7 Lawful S:0
Dlvl:1 $:0 HP:15(15) Pw:2(2) AC:4 Xp:1/0

The NLE observation contains multiple objects, many of which we receive as keys in the observation dictionary. Let's take a look.

In [4]:
obs = env.reset()
obs.keys()

Out[4]:
dict_keys(['glyphs', 'chars', 'colors', 'specials', 'blstats', 'message', 'inv_glyphs', 'inv_strs', 'inv_letters', 'inv_oclasses', 'tty_chars', 'tty_colors', 'tty_cursor', 'misc'])

#### Observing the Dungeon

The elements glyphs, chars, colors, and specials are tensors representing the (batched) 2D symbolic observation of the dungeon. Our agents primarily use the first three.

• glyphs - the integers identifying the specific object at each square in the dungeon (e.g. a pet hell-hound)
• chars - the characters used to render the glyphs on the screen (e.g. d)
• colors - the colors used to render the glyphs on the screen (e.g. red)
• specials - any special modifications used to render the glyphs on the screen (e.g. it's a pet!)

In [5]:
for key in ['glyphs', 'chars', 'colors']:
    print("\n{}:\n".format(key))
    print("Shape: {}\n".format(obs[key].shape))
    print(obs[key])

glyphs:

Shape: (21, 79)

[[2359 2359 2359 ... 2359 2359 2359]
 [2359 2359 2359 ... 2359 2359 2359]
 [2359 2359 2359 ... 2359 2359 2359]
 ...
 [2359 2359 2359 ... 2359 2359 2359]
 [2359 2359 2359 ... 2359 2359 2359]
 [2359 2359 2359 ... 2359 2359 2359]]

chars:

Shape: (21, 79)

[[32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 ...
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]
 [32 32 32 ... 32 32 32]]

colors:

Shape: (21, 79)

[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

#### BLStats and Message

Along the top of the screen is a topline message that the game uses to communicate with you. Paying close attention to what the game says can often be the difference between life and death!

The encoding of this message is presented in the observation message. Also of interest are the stats along the bottom line of the screen. These are extracted into blstats and contain a lot of useful information, visible below.

In [6]:
bl_meaning = [
    'hero col', 'hero_row', 'strength_pct', 'strength', 'dexterity', 'constitution',
    'intelligence', 'wisdom', 'charisma', 'score', 'hitpoints', 'max_hitpoints', 'depth',
    'gold', 'energy', 'max_energy', 'armor_class', 'monster_level', 'experience_level',
    'experience_points', 'time', 'hunger_state', 'carrying_capacity', 'dungeon_number',
    'level_number'
]

env.render('notebook')
obs['blstats']
print()
print('MESSAGE')
print(bytes(obs['message']).decode('ascii').replace('\0',''))
print()
print('BL STATS')
print(' '.join(["%s: %d" % (m,s) for m, s in zip(bl_meaning, obs['blstats'])]))

Aloha Agent, welcome to NetHack!  You are a neutral female human Tourist.

-----
|@.%|
|f..+
+...|
|...|
-----

Agent the Rambler St:10 Dx:13 Co:11 In:12 Wi:15 Ch:14 Neutral S:0
Dlvl:1 $:200 HP:10(10) Pw:2(2) AC:10 Xp:1/0

MESSAGE
Aloha Agent, welcome to NetHack!  You are a neutral female human Tourist.

BL STATS
hero col: 46 hero_row: 7 strength_pct: 10 strength: 10 dexterity: 13 constitution: 11 intelligence: 12 wisdom: 15 charisma: 14 score: 0 hitpoints: 10 max_hitpoints: 10 depth: 1 gold: 200 energy: 2 max_energy: 2 armor_class: 10 monster_level: 0 experience_level: 1 experience_points: 0 time: 1 hunger_state: 1 carrying_capacity: 0 dungeon_number: 0 level_number: 1
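
For agent code it is often more convenient to read these stats by name than by position. The sketch below reuses the field order from the cell above; `BL_MEANING`, `blstats_to_dict` and `char_at_hero` are hypothetical helpers, the first label is spelled `hero_col` here for consistency, and the lookup assumes that `chars` is indexed as `[row][col]`.

```python
# Field names in the order used by the cell above (first label normalized).
BL_MEANING = [
    "hero_col", "hero_row", "strength_pct", "strength", "dexterity",
    "constitution", "intelligence", "wisdom", "charisma", "score",
    "hitpoints", "max_hitpoints", "depth", "gold", "energy", "max_energy",
    "armor_class", "monster_level", "experience_level", "experience_points",
    "time", "hunger_state", "carrying_capacity", "dungeon_number",
    "level_number",
]


def blstats_to_dict(blstats, names=BL_MEANING):
    """Pair each blstats entry with its name; unnamed extras are ignored."""
    return {name: int(value) for name, value in zip(names, blstats)}


def char_at_hero(chars, stats):
    """Return the map character at the hero's position.

    Assumes `chars` is indexed as chars[row][col].
    """
    return chr(chars[stats["hero_row"]][stats["hero_col"]])
```

A rule-based agent could then write conditions such as `stats["hitpoints"] < stats["max_hitpoints"] // 2` instead of remembering numeric offsets.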


#### Inventory

After this we have a series of entries to signify what's in the inventory.

• inv_glyphs - The glyphs corresponding to the items in each slot in the inventory
• inv_letters - The letter assigned to the slot in the inventory
• inv_strs - The textual description of each item in the inventory
• inv_oclasses - The object class of the item in the inventory (potion, scroll etc...)

In [7]:
for let, glyph, strs, oclass in zip(
        obs['inv_letters'], obs['inv_glyphs'], obs['inv_strs'], obs['inv_oclasses']):
    l = chr(let)  # inventory letter for this slot
    desc = bytes(strs).decode('ascii').replace('\0','')  # strip the NUL padding
    if let:  # skip empty slots, whose letter is 0
        print('In slot (%s) - glyph: %d, (class %d) - "%s"' % (l, glyph, oclass, desc))
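
The same entries can be gathered into a dictionary keyed by inventory letter, which is handy when an agent has to answer a prompt such as "What do you want to eat?". This is a sketch only: `decode` and `inventory_as_dict` are hypothetical helpers, and it assumes, as the cell above does, that unused slots have a letter of 0.

```python
def decode(raw):
    """Turn a NUL-padded sequence of byte values into a Python string."""
    return bytes(raw).decode("ascii").replace("\0", "")


def inventory_as_dict(obs):
    """Map each inventory letter to (description, object class)."""
    items = {}
    for let, strs, oclass in zip(
            obs["inv_letters"], obs["inv_strs"], obs["inv_oclasses"]):
        if let:  # assumed: a zero letter marks an unused slot
            items[chr(let)] = (decode(strs), int(oclass))
    return items
```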



# Next Steps?

Included in the starter kit is a TorchBeast implementation of IMPALA, a large-scale distributed RL algorithm, adapted for NLE. A similar model was used in the original NLE paper to produce non-trivial learning curves for environments such as NetHackScore-v0.

In the original NLE paper, the agent architecture was as follows:

The model uses both an agent-centric view and a global view, each processed with convolutional neural network (CNN) layers. In addition, the blstats are processed with an MLP. Finally, the embeddings are passed into an LSTM to deal with partial observability.
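
The agent-centric view can be thought of as a fixed-size window cut out of the dungeon grid around the hero. The sketch below illustrates that idea in plain Python; `crop_around` is a hypothetical helper, and the default window size of 9 and the padding value are assumptions rather than details confirmed by this notebook.

```python
def crop_around(grid, row, col, size=9, pad=0):
    """Return a size x size window of `grid` centred on (row, col).

    `size` should be odd so the centre cell is well defined. Cells that
    fall outside the grid are filled with `pad`.
    """
    half = size // 2
    height, width = len(grid), len(grid[0])
    window = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            line.append(grid[r][c] if inside else pad)
        window.append(line)
    return window
```

Padding keeps the window the same shape when the hero stands near the edge of the map, so a CNN always receives a fixed-size input.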

The baseline is almost identical, with one key difference: we have added a CNN encoder for the message observation. This architecture may provide a promising starting point for development, but the sky is the limit for new ideas! Check out the README.md to get started!