
IITM RL Final Project

BSuite Challenge Starter Kit

IITM RL Final Project Bsuite starter kit with random baseline

By  dipam


IITM RL FINAL PROJECT

Problem - bsuite

This notebook uses an open-source reinforcement learning benchmark known as bsuite (https://github.com/deepmind/bsuite).

bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning agent.

Your task is to use any reinforcement learning techniques at your disposal to get high scores on the environments specified.

Note: Since the course is on reinforcement learning, please limit yourself to traditional reinforcement learning algorithms.

Do not use deep reinforcement learning.

How to use this notebook? 📝

  • This is a shared template, and any edits you make here will not be saved. Make a copy in your own Drive: click the "File" menu (top-left), then "Save a Copy in Drive". You can then edit your copy however you like.


  • Update the config parameters. You can define the common variables here:

Variable | Description
AICROWD_RESULTS_DIR | Path to write the output to.
AICROWD_ASSETS_DIR | If your notebook needs additional files (such as model weights), add them to a directory and specify the relative path to that directory here. The contents of this directory will be sent to AIcrowd for evaluation.
AICROWD_API_KEY | To submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me
In [1]:
!pip install -q aicrowd-cli
ERROR: google-colab 1.0.0 has requirement requests~=2.23.0, but you'll have requests 2.25.1 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.

AIcrowd Runtime Configuration 🧷

Get login API key from https://www.aicrowd.com/participants/me

In [22]:
import os

AICROWD_RESULTS_DIR = os.getenv("OUTPUTS_DIR", "results")
os.environ["RESULTS_DIR"] = AICROWD_RESULTS_DIR
API_KEY = ""
In [23]:
!aicrowd login --api-key $API_KEY
API Key valid
Saved API Key successfully!

Install packages 🗃

Please add all package installations in this section.

In [4]:
!pip install git+http://gitlab.aicrowd.com/nimishsantosh107/bsuite.git
!pip install tabulate
!pip install tqdm

## Add any other installations you need here
Collecting git+http://gitlab.aicrowd.com/nimishsantosh107/bsuite.git
  Cloning http://gitlab.aicrowd.com/nimishsantosh107/bsuite.git to /tmp/pip-req-build-cqnt6u81
  Running command git clone -q http://gitlab.aicrowd.com/nimishsantosh107/bsuite.git /tmp/pip-req-build-cqnt6u81
Requirement already satisfied: absl-py in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (0.12.0)
Collecting dm_env
  Downloading https://files.pythonhosted.org/packages/fa/84/c96b6544b8a2cfefc663b7dbd7fc0c2f2c3b6cbf68b0171775693bda2a66/dm_env-1.4-py3-none-any.whl
Collecting frozendict
  Downloading https://files.pythonhosted.org/packages/4e/55/a12ded2c426a4d2bee73f88304c9c08ebbdbadb82569ebdd6a0c007cfd08/frozendict-1.2.tar.gz
Requirement already satisfied: gym in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (0.17.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (3.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (1.19.5)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (1.1.5)
Requirement already satisfied: plotnine in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (0.6.0)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (1.4.1)
Requirement already satisfied: scikit-image in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (0.16.2)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (1.15.0)
Requirement already satisfied: termcolor in /usr/local/lib/python3.7/dist-packages (from bsuite==0.3.5) (1.1.0)
Requirement already satisfied: dm-tree in /usr/local/lib/python3.7/dist-packages (from dm_env->bsuite==0.3.5) (0.1.6)
Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in /usr/local/lib/python3.7/dist-packages (from gym->bsuite==0.3.5) (1.5.0)
Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from gym->bsuite==0.3.5) (1.3.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->bsuite==0.3.5) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->bsuite==0.3.5) (2.8.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->bsuite==0.3.5) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->bsuite==0.3.5) (0.10.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->bsuite==0.3.5) (2018.9)
Requirement already satisfied: mizani>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from plotnine->bsuite==0.3.5) (0.6.0)
Requirement already satisfied: patsy>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from plotnine->bsuite==0.3.5) (0.5.1)
Requirement already satisfied: statsmodels>=0.9.0 in /usr/local/lib/python3.7/dist-packages (from plotnine->bsuite==0.3.5) (0.10.2)
Requirement already satisfied: descartes>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from plotnine->bsuite==0.3.5) (1.1.0)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image->bsuite==0.3.5) (1.1.1)
Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image->bsuite==0.3.5) (2.4.1)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image->bsuite==0.3.5) (2.5.1)
Requirement already satisfied: pillow>=4.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image->bsuite==0.3.5) (7.1.2)
Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from pyglet<=1.5.0,>=1.4.0->gym->bsuite==0.3.5) (0.16.0)
Requirement already satisfied: palettable in /usr/local/lib/python3.7/dist-packages (from mizani>=0.6.0->plotnine->bsuite==0.3.5) (3.3.0)
Requirement already satisfied: decorator<5,>=4.3 in /usr/local/lib/python3.7/dist-packages (from networkx>=2.0->scikit-image->bsuite==0.3.5) (4.4.2)
Building wheels for collected packages: bsuite, frozendict
  Building wheel for bsuite (setup.py) ... done
  Created wheel for bsuite: filename=bsuite-0.3.5-cp37-none-any.whl size=252043 sha256=b5fa80e2ffb1722276511fc14d6f0b7c3c7c0ca0ad69ac65828e81d475f6091b
  Stored in directory: /tmp/pip-ephem-wheel-cache-vla93t6k/wheels/61/ea/06/77c82c07765fb8608e50e6c66bc566fa6d113c725bc6937e7b
  Building wheel for frozendict (setup.py) ... done
  Created wheel for frozendict: filename=frozendict-1.2-cp37-none-any.whl size=3150 sha256=f42fca0f2acb56a227e56562a0ed8fd18acb4b3163402c9003a336c9eead015d
  Stored in directory: /root/.cache/pip/wheels/6c/6c/e9/534386165bd12cf1885582c75eb6d0ffcb321b65c23fe0f834
Successfully built bsuite frozendict
Installing collected packages: dm-env, frozendict, bsuite
Successfully installed bsuite-0.3.5 dm-env-1.4 frozendict-1.2
Requirement already satisfied: tabulate in /usr/local/lib/python3.7/dist-packages (0.8.9)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (4.60.0)

Import packages

In [5]:
import gym
import warnings

import numpy as np
import pandas as pd
import plotnine as gg
from tqdm.notebook import tqdm

import bsuite
from bsuite.aicrowd import environments
from bsuite.aicrowd.runner import Runner
from bsuite.aicrowd.analysis import Analyzer

pd.options.mode.chained_assignment = None
gg.theme_set(gg.theme_bw(base_size=16, base_family='serif'))
gg.theme_update(figure_size=(3, 1), panel_spacing_x=0.5, panel_spacing_y=0.5)
warnings.filterwarnings('ignore')

Agent Class

You can modify the AGENT TEMPLATE below and implement the logic of your agent. Your agent must implement a few methods that will be called by the Runner class.

  • __init__ - put any initialization code here.
  • get_action - takes in a state and returns an action.
  • learn - takes in (state, action, reward, next_state), implements the learning logic.
  • get_state - takes in a raw observation directly from the env, discretizes it and returns a state.

In addition to these, you may implement other methods which can be called by the above methods.

Since there are multiple environments, you may need different hyperparameters for each one. Pass the hyperparameters to the agent as a dictionary through the agent_config parameter, so that a single Agent class can be used for all environments while each environment gets its own settings. You can use any names for the keys in the config dictionary.

An example RandomAgent is given below.

In [6]:
# *** YOU CAN EDIT THIS CELL ***
# AGENT TEMPLATE
class Agent:
    def __init__(self, agent_config=None):
        self.config = agent_config

    def get_action(self, state):
        '''
        PARAMETERS  : 
            - state - discretized 'state'
        RETURNS     : 
            - action - 'action' to be taken
        '''
        raise NotImplementedError
        return action
    
    def learn(self, state, action, reward, next_state):
        '''
        PARAMETERS  : 
            - state - discretized 'state'
            - action - 'action' performed in 'state'
            - reward - 'reward' received due to action taken
            - next_state - discretized 'next_state'
        RETURNS     : 
            - NIL
        '''
        raise NotImplementedError

    def get_state(self, observation):
        '''
        PARAMETERS  : 
            - observation - raw 'observation' from environment
        RETURNS     : 
            - state - discretized 'state' from raw 'observation'
        '''
        raise NotImplementedError
        return state
In [7]:
# *** YOU CAN EDIT THIS CELL ***
# DO NOT rename the config dictionaries as the evaluator references them. However, you may use any names for the keys in them.
catch_config = {}
catch_noise_config = {}
cartpole_config = {}
cartpole_noise_config = {}
mountaincar_config = {}
mountaincar_noise_config = {}
In [8]:
# *** YOU CAN EDIT THIS CELL ***
# EXAMPLE
class RandomAgent:
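    def __init__(self, agent_config=None):
        # Avoid a mutable default argument shared between instances.
        self.config = agent_config if agent_config is not None else {}

    def get_action(self, state):
        # NOTE: this samples only actions 0 and 1; check env.action_space.n
        # if an environment offers more actions.
        action = np.random.choice(2)
        return action
    
    def learn(self, state, action, reward, next_state):
        # Dummy update that only shows how config values can be read and changed.
        if self.config.get('BAR'):
            self.config['FOO'] += 1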

    def get_state(self, observation):
        state = observation
        return state

env1_config = {
    'FOO': 0.1,
    'BAR': True
}

env2_config = {
    'FOO': 0.2,
    'BAR': False
}

randomAgent1 = RandomAgent(agent_config=env1_config)
randomAgent2 = RandomAgent(agent_config=env2_config)
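
As an illustration of the agent interface described above, here is a minimal sketch of a tabular Q-learning agent with the same four methods. It is not part of the starter kit. The config keys (N_ACTIONS, ALPHA, GAMMA, EPSILON, DECIMALS, SEED), their default values, and the rounding-based get_state are all assumptions made for this example, and nothing here is tuned for any of the environments.

```python
from collections import defaultdict

import numpy as np


class QLearningAgent:
    """Tabular Q-learning sketch using the same four-method interface."""

    def __init__(self, agent_config=None):
        config = dict(agent_config or {})
        # All keys below are hypothetical names chosen for this sketch.
        self.n_actions = config.get('N_ACTIONS', 3)
        self.alpha = config.get('ALPHA', 0.1)      # learning rate
        self.gamma = config.get('GAMMA', 0.99)     # discount factor
        self.epsilon = config.get('EPSILON', 0.1)  # exploration rate
        self.decimals = config.get('DECIMALS', 1)  # rounding used as a crude discretization
        self.rng = np.random.default_rng(config.get('SEED', 0))
        # Q-table: maps a hashable state to one value per action.
        self.q = defaultdict(lambda: np.zeros(self.n_actions))

    def get_state(self, observation):
        # Flatten and round the raw observation so it can be used as a dictionary key.
        flat = np.asarray(observation, dtype=float).ravel()
        return tuple(np.round(flat, self.decimals))

    def get_action(self, state):
        # Epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(self.q[state]))

    def learn(self, state, action, reward, next_state):
        # One-step Q-learning update. The interface does not pass 'done',
        # so this sketch always bootstraps from next_state.
        td_target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])
```

Such an agent could be created with, for example, QLearningAgent(agent_config={'N_ACTIONS': env.action_space.n, 'ALPHA': 0.2}), with a different dictionary per environment.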

Playing with the Environment

Instantiating the environment :

You can create an environment by calling the following function:
environments.load_env(ENV_ID) - RETURNS: env
where, ENV_ID can be ONE of the following:

  • environments.CATCH
  • environments.CATCH_NOISE
  • environments.CARTPOLE
  • environments.CARTPOLE_NOISE
  • environments.MOUNTAINCAR
  • environments.MOUNTAINCAR_NOISE

The NOISE environments add a scaled random noise to the reward.
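
To make the effect of reward noise concrete, the following sketch wraps any object exposing reset() and step(action) and adds scaled noise to each reward. The class name NoisyRewardEnv, the noise_scale parameter, and the choice of zero-mean Gaussian noise are assumptions for illustration only; the notebook does not state how the NOISE environments generate their noise.

```python
import numpy as np


class NoisyRewardEnv:
    """Hypothetical wrapper: adds scaled Gaussian noise to each reward."""

    def __init__(self, env, noise_scale=1.0, seed=0):
        self.env = env
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        # Assumed noise model: zero-mean Gaussian scaled by noise_scale.
        noisy_reward = reward + self.noise_scale * self.rng.standard_normal()
        return observation, noisy_reward, done, info
```

Because the assumed noise has zero mean, averaging rewards over many steps recovers the underlying reward, which is one reason a smaller learning rate can help in noisy settings.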

Running the environment :

There are certain methods required to run the environments. The interface is very similar to OpenAI Gym's interface. For more information, see the OpenAI Gym documentation.

env.reset() - RETURNS: observation
env.step(action) - RETURNS: (next_observation, reward, done, info[NOT USED])

There are also a few useful properties within the environments:

  • env.action_space.n - total number of possible actions, e.g. if 'n' is 3, then the possible actions are [0, 1, 2].
  • env.observation_space.shape - the shape of the observation.
  • env.bsuite_num_episodes - the pre-specified number of episodes which will be run during evaluation (unique for each environment).
ONLY IN CATCH / CATCH_NOISE
  • env.observation_space.high - the upper limit for every index in the observation.
  • env.observation_space.low - the lower limit for every index of the observation.
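
The properties listed above can be gathered in one place when exploring an environment. The helper below is a small sketch; the name describe_env and the returned dictionary keys are made up for this example. Since the low and high limits are documented only for CATCH / CATCH_NOISE, they are read defensively and come back as None when absent.

```python
def describe_env(env):
    """Collect the documented environment properties into a dictionary."""
    info = {
        'n_actions': env.action_space.n,
        'observation_shape': env.observation_space.shape,
        'num_episodes': env.bsuite_num_episodes,
    }
    # 'low' and 'high' are documented only for CATCH / CATCH_NOISE,
    # so fall back to None if the attribute is missing.
    info['low'] = getattr(env.observation_space, 'low', None)
    info['high'] = getattr(env.observation_space, 'high', None)
    return info
```

For example, describe_env(environments.load_env(environments.CATCH)) would return these values for the CATCH environment.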

Environment Observation Space Limits:

The limits for the observation space (minimum and maximum) for all the environments are given in the table below:

Environments | Limits
CATCH, CATCH_NOISE | MIN: use env.observation_space.low; MAX: use env.observation_space.high
CARTPOLE, CARTPOLE_NOISE | MIN: [-1., -5., -1., -1., -5., 0.]; MAX: [1., 5., 1., 1., 5., 1.]
MOUNTAINCAR, MOUNTAINCAR_NOISE | MIN: [-1.2, -0.07, 0.]; MAX: [0.6, 0.07, 1.]
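
One common way to use these limits inside get_state is uniform binning: each dimension is scaled to [0, 1] using its minimum and maximum and then mapped to an integer bin. The sketch below shows this for the CARTPOLE limits. The function name discretize and the default of 10 bins are assumptions for illustration, and uniform binning is only one of several possible discretization schemes.

```python
import numpy as np

# CARTPOLE limits as listed in the table above.
CARTPOLE_LOW = np.array([-1., -5., -1., -1., -5., 0.])
CARTPOLE_HIGH = np.array([1., 5., 1., 1., 5., 1.])


def discretize(observation, low, high, n_bins=10):
    """Map a continuous observation to a tuple of bin indices in [0, n_bins - 1]."""
    obs = np.asarray(observation, dtype=float).ravel()
    # Clip so values outside the stated limits fall into the outermost bins.
    obs = np.clip(obs, low, high)
    # Scale each dimension to [0, 1], then convert to a bin index.
    scaled = (obs - low) / (high - low)
    bins = np.minimum((scaled * n_bins).astype(int), n_bins - 1)
    return tuple(int(b) for b in bins)
```

The returned tuple is hashable, so it can be used directly as a key in a Q-table. Fewer bins make the table smaller and learning faster, at the cost of a coarser view of the state.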

[NOTE] Use this code cell to play around and get used to the environments. However, the Runner class below will be used to evaluate your agent.

In [9]:
# *** YOU CAN EDIT THIS CELL ***
# TEST AREA
env = environments.load_env(environments.CARTPOLE)  # replace 'environments.CARTPOLE' with other environments

agent = RandomAgent(agent_config={})                # replace with 'Agent()' to use your custom agent

NUM_EPISODES = 10                                   # replace with 'env.bsuite_num_episodes' to run for pre-specified number of episodes
for episode_n in tqdm(range(NUM_EPISODES)):
    done = False
    episode_reward = 0
    episode_moves = 0
 
    observation = env.reset()
    state = agent.get_state(observation)

    while not done:
        action = agent.get_action(state)

        next_observation, reward, done, _ = env.step(action)
        next_state = agent.get_state(next_observation)

        agent.learn(state, action, reward, next_state)

        state = next_state

        episode_reward += reward
        episode_moves += 1

    if (episode_n + 1) % 2 == 0:
        print("EPISODE: ", episode_n + 1, "\tREWARD: ", episode_reward, "\tEPISODE_LENGTH: ", episode_moves)
Loaded bsuite_id: cartpole/0.
EPISODE:  2 	REWARD:  37.0 	EPISODE_LENGTH:  38
EPISODE:  4 	REWARD:  38.0 	EPISODE_LENGTH:  39
EPISODE:  6 	REWARD:  39.0 	EPISODE_LENGTH:  40
EPISODE:  8 	REWARD:  43.0 	EPISODE_LENGTH:  44
EPISODE:  10 	REWARD:  36.0 	EPISODE_LENGTH:  37

Point to the Agent Class you'll use for the final score

In [10]:
RLAgent = RandomAgent

Evaluating the Agent on all the Environments

  • The following cells run your agent on each environment and aggregate the results in CSV files. In each cell, the agent_config parameter is already set to the corresponding config dictionary for that environment. DO NOT EDIT THIS.
  • Feel free to modify the LOG_INTERVAL parameter to change the interval between episodes for logging.
  • Please do not modify any other contents in each of the cells.
In [11]:
LOG_INTERVAL = 100
In [12]:
runner = Runner(
    agent = RLAgent(agent_config=catch_config),
    env_id = environments.CATCH,
    log_interval = LOG_INTERVAL,
)
runner.play_episodes()
Loaded bsuite_id: catch/0.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  200 	REWARD:  -1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  300 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  400 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  500 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  600 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  700 	REWARD:  -1.0 	MEAN_REWARD:  -0.74 	EPISODE_LENGTH:  9
EPISODE:  800 	REWARD:  -1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  900 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  1000 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  1100 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  1200 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  1300 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  1400 	REWARD:  1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  1500 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  1600 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  1700 	REWARD:  1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  1800 	REWARD:  1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  1900 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  2000 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  2100 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  2200 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  2300 	REWARD:  -1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  2400 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  2500 	REWARD:  -1.0 	MEAN_REWARD:  -0.56 	EPISODE_LENGTH:  9
EPISODE:  2600 	REWARD:  1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  2700 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  2800 	REWARD:  -1.0 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  2900 	REWARD:  -1.0 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  3000 	REWARD:  -1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  3100 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  3200 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  3300 	REWARD:  -1.0 	MEAN_REWARD:  -0.46 	EPISODE_LENGTH:  9
EPISODE:  3400 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  3500 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  3600 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  3700 	REWARD:  -1.0 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  3800 	REWARD:  -1.0 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  3900 	REWARD:  -1.0 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  4000 	REWARD:  -1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  4100 	REWARD:  -1.0 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  4200 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  4300 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  4400 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  4500 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  4600 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  4700 	REWARD:  1.0 	MEAN_REWARD:  -0.56 	EPISODE_LENGTH:  9
EPISODE:  4800 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  4900 	REWARD:  -1.0 	MEAN_REWARD:  -0.74 	EPISODE_LENGTH:  9
EPISODE:  5000 	REWARD:  -1.0 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  5100 	REWARD:  -1.0 	MEAN_REWARD:  -0.76 	EPISODE_LENGTH:  9
EPISODE:  5200 	REWARD:  -1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  5300 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  5400 	REWARD:  -1.0 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  5500 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  5600 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  5700 	REWARD:  1.0 	MEAN_REWARD:  -0.72 	EPISODE_LENGTH:  9
EPISODE:  5800 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  5900 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  6000 	REWARD:  -1.0 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  6100 	REWARD:  -1.0 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  6200 	REWARD:  -1.0 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  6300 	REWARD:  -1.0 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  6400 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  6500 	REWARD:  -1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  6600 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  6700 	REWARD:  1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  6800 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  6900 	REWARD:  -1.0 	MEAN_REWARD:  -0.72 	EPISODE_LENGTH:  9
EPISODE:  7000 	REWARD:  -1.0 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  7100 	REWARD:  -1.0 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  7200 	REWARD:  1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  7300 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  7400 	REWARD:  1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  7500 	REWARD:  -1.0 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  7600 	REWARD:  -1.0 	MEAN_REWARD:  -0.74 	EPISODE_LENGTH:  9
EPISODE:  7700 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  7800 	REWARD:  -1.0 	MEAN_REWARD:  -0.56 	EPISODE_LENGTH:  9
EPISODE:  7900 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  8000 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  8100 	REWARD:  -1.0 	MEAN_REWARD:  -0.36 	EPISODE_LENGTH:  9
EPISODE:  8200 	REWARD:  -1.0 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  8300 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  8400 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  8500 	REWARD:  -1.0 	MEAN_REWARD:  -0.46 	EPISODE_LENGTH:  9
EPISODE:  8600 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  8700 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  8800 	REWARD:  -1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  8900 	REWARD:  -1.0 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  9000 	REWARD:  -1.0 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  9100 	REWARD:  -1.0 	MEAN_REWARD:  -0.46 	EPISODE_LENGTH:  9
EPISODE:  9200 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  9300 	REWARD:  -1.0 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  9400 	REWARD:  -1.0 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  9500 	REWARD:  1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  9600 	REWARD:  -1.0 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  9700 	REWARD:  -1.0 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  9800 	REWARD:  -1.0 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  9900 	REWARD:  1.0 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  10000 	REWARD:  -1.0 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
In [13]:
runner = Runner(
    agent = RLAgent(agent_config=catch_noise_config),
    env_id = environments.CATCH_NOISE,
    log_interval = LOG_INTERVAL
)
runner.play_episodes()
Loaded bsuite_id: catch_noise/1.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  -2.4319180485080194 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  200 	REWARD:  0.8881949106608489 	MEAN_REWARD:  -0.46 	EPISODE_LENGTH:  9
EPISODE:  300 	REWARD:  -1.2941227075745902 	MEAN_REWARD:  -0.61 	EPISODE_LENGTH:  9
EPISODE:  400 	REWARD:  -1.4250488015594018 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  500 	REWARD:  -0.18346899641600334 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  600 	REWARD:  0.39936420173707576 	MEAN_REWARD:  -0.39 	EPISODE_LENGTH:  9
EPISODE:  700 	REWARD:  -0.6115223148558895 	MEAN_REWARD:  -0.71 	EPISODE_LENGTH:  9
EPISODE:  800 	REWARD:  1.8585504709523746 	MEAN_REWARD:  -0.68 	EPISODE_LENGTH:  9
EPISODE:  900 	REWARD:  -0.637721569542689 	MEAN_REWARD:  -0.79 	EPISODE_LENGTH:  9
EPISODE:  1000 	REWARD:  -0.1701424555746447 	MEAN_REWARD:  -0.59 	EPISODE_LENGTH:  9
EPISODE:  1100 	REWARD:  -2.373967525284758 	MEAN_REWARD:  -0.57 	EPISODE_LENGTH:  9
EPISODE:  1200 	REWARD:  0.9410664532715053 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  1300 	REWARD:  0.14935897001644072 	MEAN_REWARD:  -0.49 	EPISODE_LENGTH:  9
EPISODE:  1400 	REWARD:  -0.46915825995716554 	MEAN_REWARD:  -0.4 	EPISODE_LENGTH:  9
EPISODE:  1500 	REWARD:  -1.0967027345203488 	MEAN_REWARD:  -0.76 	EPISODE_LENGTH:  9
EPISODE:  1600 	REWARD:  -0.590519575004099 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  1700 	REWARD:  0.4600847322894942 	MEAN_REWARD:  -0.32 	EPISODE_LENGTH:  9
EPISODE:  1800 	REWARD:  -1.2191277033889745 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  1900 	REWARD:  -1.3194958004662543 	MEAN_REWARD:  -0.57 	EPISODE_LENGTH:  9
EPISODE:  2000 	REWARD:  -1.0725519240660173 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  2100 	REWARD:  -0.8325406115721289 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  2200 	REWARD:  0.6554398892649567 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  2300 	REWARD:  -2.057203492145048 	MEAN_REWARD:  -0.63 	EPISODE_LENGTH:  9
EPISODE:  2400 	REWARD:  -0.8999610266905511 	MEAN_REWARD:  -0.75 	EPISODE_LENGTH:  9
EPISODE:  2500 	REWARD:  1.853698980224358 	MEAN_REWARD:  -0.33 	EPISODE_LENGTH:  9
EPISODE:  2600 	REWARD:  -2.0393710206345057 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  2700 	REWARD:  -0.7404188257945716 	MEAN_REWARD:  -0.59 	EPISODE_LENGTH:  9
EPISODE:  2800 	REWARD:  -0.7692911291296747 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  2900 	REWARD:  -1.4054059703103996 	MEAN_REWARD:  -0.63 	EPISODE_LENGTH:  9
EPISODE:  3000 	REWARD:  -0.5121112016246894 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  3100 	REWARD:  0.9063521251027006 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  3200 	REWARD:  -1.7154872204278544 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  3300 	REWARD:  -1.7543211466334183 	MEAN_REWARD:  -0.76 	EPISODE_LENGTH:  9
EPISODE:  3400 	REWARD:  -1.2468585698330004 	MEAN_REWARD:  -0.69 	EPISODE_LENGTH:  9
EPISODE:  3500 	REWARD:  -1.5777937653203873 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  3600 	REWARD:  -1.723954204367839 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  3700 	REWARD:  -1.3390600684584177 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  3800 	REWARD:  -0.27091604438057815 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  3900 	REWARD:  -1.6500551795439318 	MEAN_REWARD:  -0.44 	EPISODE_LENGTH:  9
EPISODE:  4000 	REWARD:  -1.518035850282325 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  4100 	REWARD:  -0.48998431399553455 	MEAN_REWARD:  -0.56 	EPISODE_LENGTH:  9
EPISODE:  4200 	REWARD:  -0.969230884506244 	MEAN_REWARD:  -0.58 	EPISODE_LENGTH:  9
EPISODE:  4300 	REWARD:  -1.2477601285199142 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  4400 	REWARD:  -1.5891299185866519 	MEAN_REWARD:  -0.63 	EPISODE_LENGTH:  9
EPISODE:  4500 	REWARD:  1.277402425761054 	MEAN_REWARD:  -0.59 	EPISODE_LENGTH:  9
EPISODE:  4600 	REWARD:  -0.7184732181373225 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  4700 	REWARD:  0.7473663956588822 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  4800 	REWARD:  -0.3698918119987854 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  4900 	REWARD:  -1.0392380395510732 	MEAN_REWARD:  -0.63 	EPISODE_LENGTH:  9
EPISODE:  5000 	REWARD:  -1.270801791857368 	MEAN_REWARD:  -0.7 	EPISODE_LENGTH:  9
EPISODE:  5100 	REWARD:  -1.1756397113535733 	MEAN_REWARD:  -0.66 	EPISODE_LENGTH:  9
EPISODE:  5200 	REWARD:  1.8405155250159377 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  5300 	REWARD:  -1.0996400208152906 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  5400 	REWARD:  -2.080813145889157 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  5500 	REWARD:  0.3140605100209716 	MEAN_REWARD:  -0.8 	EPISODE_LENGTH:  9
EPISODE:  5600 	REWARD:  0.3315518370959266 	MEAN_REWARD:  -0.69 	EPISODE_LENGTH:  9
EPISODE:  5700 	REWARD:  0.2096862353011535 	MEAN_REWARD:  -0.79 	EPISODE_LENGTH:  9
EPISODE:  5800 	REWARD:  1.8648164681308659 	MEAN_REWARD:  -0.53 	EPISODE_LENGTH:  9
EPISODE:  5900 	REWARD:  -0.8941115946662648 	MEAN_REWARD:  -0.63 	EPISODE_LENGTH:  9
EPISODE:  6000 	REWARD:  0.1357092896219496 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  6100 	REWARD:  -0.7309254534885832 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  6200 	REWARD:  0.5466495619954603 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  6300 	REWARD:  -0.4882044954685746 	MEAN_REWARD:  -0.74 	EPISODE_LENGTH:  9
EPISODE:  6400 	REWARD:  0.450981222842918 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  6500 	REWARD:  -0.6731780234883198 	MEAN_REWARD:  -0.49 	EPISODE_LENGTH:  9
EPISODE:  6600 	REWARD:  1.9792225266653973 	MEAN_REWARD:  -0.77 	EPISODE_LENGTH:  9
EPISODE:  6700 	REWARD:  0.691788652700822 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  6800 	REWARD:  -1.354949621801623 	MEAN_REWARD:  -0.59 	EPISODE_LENGTH:  9
EPISODE:  6900 	REWARD:  -0.07081143591767536 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  7000 	REWARD:  -1.663276262285267 	MEAN_REWARD:  -0.49 	EPISODE_LENGTH:  9
EPISODE:  7100 	REWARD:  -0.9622243214321896 	MEAN_REWARD:  -0.71 	EPISODE_LENGTH:  9
EPISODE:  7200 	REWARD:  -0.8099351901567722 	MEAN_REWARD:  -0.64 	EPISODE_LENGTH:  9
EPISODE:  7300 	REWARD:  -0.9427514611588996 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  7400 	REWARD:  -0.5468485095496791 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  7500 	REWARD:  -1.4522102247335968 	MEAN_REWARD:  -0.62 	EPISODE_LENGTH:  9
EPISODE:  7600 	REWARD:  1.076538639941028 	MEAN_REWARD:  -0.54 	EPISODE_LENGTH:  9
EPISODE:  7700 	REWARD:  -0.929063641819126 	MEAN_REWARD:  -0.72 	EPISODE_LENGTH:  9
EPISODE:  7800 	REWARD:  -1.1393340737331967 	MEAN_REWARD:  -0.57 	EPISODE_LENGTH:  9
EPISODE:  7900 	REWARD:  -1.1535034301045504 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  8000 	REWARD:  -0.5818375818660172 	MEAN_REWARD:  -0.61 	EPISODE_LENGTH:  9
EPISODE:  8100 	REWARD:  -0.7809850416392428 	MEAN_REWARD:  -0.56 	EPISODE_LENGTH:  9
EPISODE:  8200 	REWARD:  1.4738509039827317 	MEAN_REWARD:  -0.73 	EPISODE_LENGTH:  9
EPISODE:  8300 	REWARD:  1.0775841775272268 	MEAN_REWARD:  -0.52 	EPISODE_LENGTH:  9
EPISODE:  8400 	REWARD:  -1.702740565953817 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  8500 	REWARD:  -0.7009465059056705 	MEAN_REWARD:  -0.51 	EPISODE_LENGTH:  9
EPISODE:  8600 	REWARD:  -1.364529978006558 	MEAN_REWARD:  -0.49 	EPISODE_LENGTH:  9
EPISODE:  8700 	REWARD:  -0.8870475494792645 	MEAN_REWARD:  -0.45 	EPISODE_LENGTH:  9
EPISODE:  8800 	REWARD:  1.608483595505819 	MEAN_REWARD:  -0.5 	EPISODE_LENGTH:  9
EPISODE:  8900 	REWARD:  -0.9445590053772703 	MEAN_REWARD:  -0.55 	EPISODE_LENGTH:  9
EPISODE:  9000 	REWARD:  -0.6924726593407384 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  9100 	REWARD:  1.515640178361648 	MEAN_REWARD:  -0.59 	EPISODE_LENGTH:  9
EPISODE:  9200 	REWARD:  -1.2058750973156611 	MEAN_REWARD:  -0.47 	EPISODE_LENGTH:  9
EPISODE:  9300 	REWARD:  -1.0913898963564954 	MEAN_REWARD:  -0.48 	EPISODE_LENGTH:  9
EPISODE:  9400 	REWARD:  -0.3058544055084077 	MEAN_REWARD:  -0.75 	EPISODE_LENGTH:  9
EPISODE:  9500 	REWARD:  -0.47200925529187515 	MEAN_REWARD:  -0.65 	EPISODE_LENGTH:  9
EPISODE:  9600 	REWARD:  -0.7341138363723982 	MEAN_REWARD:  -0.6 	EPISODE_LENGTH:  9
EPISODE:  9700 	REWARD:  -1.207323251209016 	MEAN_REWARD:  -0.43 	EPISODE_LENGTH:  9
EPISODE:  9800 	REWARD:  1.875133614246674 	MEAN_REWARD:  -0.53 	EPISODE_LENGTH:  9
EPISODE:  9900 	REWARD:  -1.125095384156314 	MEAN_REWARD:  -0.49 	EPISODE_LENGTH:  9
EPISODE:  10000 	REWARD:  -1.678307591357688 	MEAN_REWARD:  -0.73 	EPISODE_LENGTH:  9
In [14]:
runner = Runner(
    agent = RLAgent(agent_config=cartpole_config),
    env_id = environments.CARTPOLE,
    log_interval = LOG_INTERVAL
)
runner.play_episodes()
Loaded bsuite_id: cartpole/0.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  43.0 	MEAN_REWARD:  38.8 	EPISODE_LENGTH:  44
EPISODE:  200 	REWARD:  42.0 	MEAN_REWARD:  39.2 	EPISODE_LENGTH:  43
EPISODE:  300 	REWARD:  43.0 	MEAN_REWARD:  39.12 	EPISODE_LENGTH:  44
EPISODE:  400 	REWARD:  36.0 	MEAN_REWARD:  38.89 	EPISODE_LENGTH:  37
EPISODE:  500 	REWARD:  38.0 	MEAN_REWARD:  39.03 	EPISODE_LENGTH:  39
EPISODE:  600 	REWARD:  37.0 	MEAN_REWARD:  38.99 	EPISODE_LENGTH:  38
EPISODE:  700 	REWARD:  38.0 	MEAN_REWARD:  38.93 	EPISODE_LENGTH:  39
EPISODE:  800 	REWARD:  39.0 	MEAN_REWARD:  39.04 	EPISODE_LENGTH:  40
EPISODE:  900 	REWARD:  34.0 	MEAN_REWARD:  40.08 	EPISODE_LENGTH:  35
EPISODE:  1000 	REWARD:  41.0 	MEAN_REWARD:  39.67 	EPISODE_LENGTH:  42
In [15]:
runner = Runner(
    agent = RLAgent(agent_config=cartpole_noise_config),
    env_id = environments.CARTPOLE_NOISE,
    log_interval = LOG_INTERVAL
)
runner.play_episodes()
Loaded bsuite_id: cartpole_noise/1.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  34.12552152047825 	MEAN_REWARD:  39.68 	EPISODE_LENGTH:  35
EPISODE:  200 	REWARD:  36.32320904835952 	MEAN_REWARD:  38.88 	EPISODE_LENGTH:  38
EPISODE:  300 	REWARD:  33.397737840661364 	MEAN_REWARD:  38.5 	EPISODE_LENGTH:  35
EPISODE:  400 	REWARD:  41.031172901491644 	MEAN_REWARD:  39.16 	EPISODE_LENGTH:  41
EPISODE:  500 	REWARD:  46.569164491861926 	MEAN_REWARD:  38.75 	EPISODE_LENGTH:  49
EPISODE:  600 	REWARD:  42.98468397914839 	MEAN_REWARD:  38.32 	EPISODE_LENGTH:  42
EPISODE:  700 	REWARD:  33.621666763205525 	MEAN_REWARD:  39.46 	EPISODE_LENGTH:  37
EPISODE:  800 	REWARD:  37.587526593164725 	MEAN_REWARD:  39.72 	EPISODE_LENGTH:  39
EPISODE:  900 	REWARD:  36.790664408259225 	MEAN_REWARD:  39.14 	EPISODE_LENGTH:  37
EPISODE:  1000 	REWARD:  40.631988076325094 	MEAN_REWARD:  38.95 	EPISODE_LENGTH:  40
In [16]:
runner = Runner(
    agent = RLAgent(agent_config=mountaincar_config),
    env_id = environments.MOUNTAINCAR,
    log_interval = LOG_INTERVAL
)
runner.play_episodes()
Loaded bsuite_id: mountain_car/0.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  200 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  300 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  400 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  500 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  600 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  700 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  800 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  900 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
EPISODE:  1000 	REWARD:  -1000.0 	MEAN_REWARD:  -1000.0 	EPISODE_LENGTH:  1000
In [17]:
runner = Runner(
    agent = RLAgent(agent_config=mountaincar_noise_config),
    env_id = environments.MOUNTAINCAR_NOISE,
    log_interval = LOG_INTERVAL
)
runner.play_episodes()
Loaded bsuite_id: mountain_car_noise/1.
Logging results to CSV file for each bsuite_id in results.
EPISODE:  100 	REWARD:  -984.0074826301465 	MEAN_REWARD:  -998.95 	EPISODE_LENGTH:  1000
EPISODE:  200 	REWARD:  -1006.6954177370483 	MEAN_REWARD:  -999.26 	EPISODE_LENGTH:  1000
EPISODE:  300 	REWARD:  -999.4489016804749 	MEAN_REWARD:  -1000.41 	EPISODE_LENGTH:  1000
EPISODE:  400 	REWARD:  -1005.9196899050198 	MEAN_REWARD:  -1000.2 	EPISODE_LENGTH:  1000
EPISODE:  500 	REWARD:  -996.9460177617744 	MEAN_REWARD:  -1000.08 	EPISODE_LENGTH:  1000
EPISODE:  600 	REWARD:  -1005.6408451495033 	MEAN_REWARD:  -999.07 	EPISODE_LENGTH:  1000
EPISODE:  700 	REWARD:  -1002.8574841673754 	MEAN_REWARD:  -1000.17 	EPISODE_LENGTH:  1000
EPISODE:  800 	REWARD:  -1003.0486212221472 	MEAN_REWARD:  -999.14 	EPISODE_LENGTH:  1000
EPISODE:  900 	REWARD:  -1003.0506259195664 	MEAN_REWARD:  -1001.53 	EPISODE_LENGTH:  1000
EPISODE:  1000 	REWARD:  -992.5036966097842 	MEAN_REWARD:  -999.88 	EPISODE_LENGTH:  1000
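
The runner output above is easier to compare across environments once it is parsed into numbers. The following is a minimal sketch, assuming the log lines keep the tab-separated shape printed above; `LOG_LINE`, `parse_log`, and `sample` are hypothetical names introduced for illustration and are not part of the starter kit. It checks whether MEAN_REWARD moved between the first and last logged episode, which for the random baseline it barely does.

```python
import re

# Assumed line shape, copied from the runner output above:
# "EPISODE:  100 \tREWARD:  43.0 \tMEAN_REWARD:  38.8 \tEPISODE_LENGTH:  44"
# Rewards printed in scientific notation would not match this pattern.
LOG_LINE = re.compile(
    r"EPISODE:\s*(?P<episode>\d+)\s+"
    r"REWARD:\s*(?P<reward>-?\d+(?:\.\d+)?)\s+"
    r"MEAN_REWARD:\s*(?P<mean_reward>-?\d+(?:\.\d+)?)\s+"
    r"EPISODE_LENGTH:\s*(?P<length>\d+)"
)


def parse_log(text):
    """Return one dict per log line that matches the assumed format."""
    rows = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            rows.append({
                "episode": int(match["episode"]),
                "reward": float(match["reward"]),
                "mean_reward": float(match["mean_reward"]),
                "length": int(match["length"]),
            })
    return rows


# Three lines taken from the cartpole run above; non-matching lines are skipped.
sample = """Loaded bsuite_id: cartpole/0.
EPISODE:  100 \tREWARD:  43.0 \tMEAN_REWARD:  38.8 \tEPISODE_LENGTH:  44
EPISODE:  200 \tREWARD:  42.0 \tMEAN_REWARD:  39.2 \tEPISODE_LENGTH:  43
EPISODE:  1000 \tREWARD:  41.0 \tMEAN_REWARD:  39.67 \tEPISODE_LENGTH:  42
"""

rows = parse_log(sample)
# Difference in MEAN_REWARD between the last and first logged episode.
trend = rows[-1]["mean_reward"] - rows[0]["mean_reward"]
print(f"parsed {len(rows)} lines, change in MEAN_REWARD: {trend:+.2f}")
```

A change close to zero, as here, suggests the agent is not improving over the run.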

Analysis & Result

The following cells will show the score of the agent on each environment. The same scoring method will be used to evaluate your agent on a set of test environments.

In [18]:
# *** PLEASE DON'T EDIT THE CONTENTS OF THIS CELL ***
analyzer = Analyzer(os.environ.get('RESULTS_DIR'))
analyzer.print_scores()
╒════════════════════╤═══════════╕
│ ENVIRONMENT        │     SCORE │
╞════════════════════╪═══════════╡
│ catch              │ 0.00225   │
├────────────────────┼───────────┤
│ catch_noise        │ 0.00325   │
├────────────────────┼───────────┤
│ cartpole           │ 0.0195875 │
├────────────────────┼───────────┤
│ cartpole_noise     │ 0.019507  │
├────────────────────┼───────────┤
│ mountain_car       │ 0.1       │
├────────────────────┼───────────┤
│ mountain_car_noise │ 0.1       │
╘════════════════════╧═══════════╛
In [19]:
# Use this if you want the scores as a dictionary instead of a printed table
analyzer.get_scores()
Out[19]:
{'cartpole': 0.019587499999999976,
 'cartpole_noise': 0.019507000000000004,
 'catch': 0.0022500000000000298,
 'catch_noise': 0.0032500000000000584,
 'mountain_car': 0.1,
 'mountain_car_noise': 0.1}
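
Once the scores are available as a dictionary, they can be ranked to see which environment needs the most work. The sketch below hard-codes the values from Out[19] (rounded) so it runs on its own; in the notebook you would pass the result of `analyzer.get_scores()` instead. The unweighted mean is only an illustrative summary: how the leaderboard aggregates per-environment scores is not stated here, so treat `overall` as an assumption.

```python
# Scores copied from Out[19] above, rounded for readability.
scores = {
    "cartpole": 0.0195875,
    "cartpole_noise": 0.019507,
    "catch": 0.00225,
    "catch_noise": 0.00325,
    "mountain_car": 0.1,
    "mountain_car_noise": 0.1,
}

# Sort from weakest to strongest environment.
ranked = sorted(scores.items(), key=lambda item: item[1])

# Unweighted mean across environments (an assumed summary, not the
# official leaderboard metric).
overall = sum(scores.values()) / len(scores)

for name, score in ranked:
    print(f"{name:<20} {score:.5f}")
print(f"{'mean':<20} {overall:.5f}")
```

For the random baseline, `catch` and `catch_noise` come out lowest.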

Backend Evaluation

THIS CODE WILL EVALUATE THE AGENT USING THE SPECIFIED CONFIGS FOR THE CORRESPONDING ENVIRONMENTS. DO NOT EDIT THE CONTENTS OF THIS CELL.

In [20]:
## Do not edit this cell
if (os.environ.get('BACKEND_EVALUATOR') is not None):
    
    import backend_evaluator

    runs = {
        'catch': (
            backend_evaluator.CATCH, 
            catch_config),
        'catch_noise': (
            backend_evaluator.CATCH_NOISE, 
            catch_noise_config),
        'cartpole': (
            backend_evaluator.CARTPOLE, 
            cartpole_config),
        'cartpole_noise': (
            backend_evaluator.CARTPOLE_NOISE, 
            cartpole_noise_config),
        'mountaincar': (
            backend_evaluator.MOUNTAINCAR, 
            mountaincar_config),
        'mountaincar_noise': (
            backend_evaluator.MOUNTAINCAR_NOISE, 
            mountaincar_noise_config)
    }

    for run_name, run in runs.items():
        env_ids, config = run
        for env_id in env_ids:
            runner = Runner(env_id=env_id,
                            agent=RLAgent(agent_config=config),
                            verbose=False,
                            eval=True)
            runner.play_episodes()

Submit to AIcrowd 🚀

NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)

In [ ]:
! aicrowd notebook submit --no-verify -c iitm-rl-final-project -a assets
In [ ]:
