# [Starter Notebook] RL - Taxi Problem

This is a getting-started notebook for the Taxi Problem in the RL course. It contains basic instructions for using the notebook to make submissions, as well as the tasks to perform and the questions to answer. Please read the instructions carefully before proceeding, and create your own copy of the notebook before you start working with it.

Happy Solving!

# What is the notebook about?

## Problem - DP Algorithm

This problem deals with a taxi driver who can choose among several actions in each of several cities. Your tasks are:

- Implement a DP algorithm to find the optimal sequence of actions for the taxi driver
- Find the optimal policies for sequences of varying lengths
- Explain a variation on the policy

# How to use this notebook? 📝

- This is a shared template, and any edits you make here will not be saved. **You should make a copy in your own drive**: click the "File" menu (top-left), then "Save a Copy in Drive". You can then work in your copy however you like.

- **Update the config parameters**. You can define the common variables here:

| Variable | Description |
|---|---|
| `AICROWD_DATASET_PATH` | Path to the file containing test data. This should be an absolute path. |
| `AICROWD_RESULTS_DIR` | Path to write the output to. |
| `AICROWD_ASSETS_DIR` | If your notebook needs additional files (like model weights, etc.), add them to a directory and specify the path to that directory here (please use a relative path). The contents of this directory will be sent to AIcrowd for evaluation. |
| `AICROWD_API_KEY` | To submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me |

- **Installing packages**. Please use the **Install packages** section to install any packages you need.

# Setup AIcrowd Utilities 🛠

We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.

```
!pip install -U git+https://gitlab.aicrowd.com/aicrowd/aicrowd-cli.git@notebook-submission-v2 > /dev/null
```

```
%load_ext aicrowd.magic
```

# AIcrowd Runtime Configuration 🧷

Define configuration parameters. Please include any files needed for the notebook to run under `ASSETS_DIR`. We will copy the contents of this directory to your final submission file.

```
import os
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", os.getcwd()+"/40746340-4151-4921-8496-be10b3f8f5cf_hw2_q1.zip")
AICROWD_RESULTS_DIR = os.getenv("OUTPUTS_DIR", "results")
API_KEY = "" #Get your API key from https://www.aicrowd.com/participants/me
```
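The configuration cell above does not define `AICROWD_ASSETS_DIR`, even though the variable table lists it. A minimal sketch, assuming the directory is called `assets` so that it matches the `-a assets` argument of the submission command at the end of this notebook:

```python
import os

# Assumption: "assets" matches the `-a assets` argument passed to
# `aicrowd notebook submit` in the final cell of this notebook.
AICROWD_ASSETS_DIR = "assets"
os.makedirs(AICROWD_ASSETS_DIR, exist_ok=True)  # no error if the directory already exists
```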

# Download dataset files 📲

```
!aicrowd login --api-key $API_KEY
!aicrowd dataset download -c rl-taxi
```

```
!unzip -q $AICROWD_DATASET_PATH
```

```
DATASET_DIR = 'hw2_q1/'
!mkdir {DATASET_DIR}results/
```

# Install packages 🗃

Please add all package installations in this section.

```
```

# Import packages 💻

```
import numpy as np
import os
# ADD ANY IMPORTS YOU WANT HERE
```

```
import numpy as np


class TaxiEnv_HW2:
    def __init__(self, states, actions, probabilities, rewards):
        self.possible_states = states
        self._possible_actions = {st: ac for st, ac in zip(states, actions)}
        self._ride_probabilities = {st: pr for st, pr in zip(states, probabilities)}
        self._ride_rewards = {st: rw for st, rw in zip(states, rewards)}
        self._verify()

    def _check_state(self, state):
        assert state in self.possible_states, "State %s is not a valid state" % state

    def _verify(self):
        """
        Verify that data conditions are met:
        - The probability and reward arrays of each state have shape
          (number of actions, number of states)
        - Every probability distribution adds up to 1
        """
        ns = len(self.possible_states)
        for state in self.possible_states:
            ac = self._possible_actions[state]
            na = len(ac)
            rp = self._ride_probabilities[state]
            assert rp.shape == (na, ns), "Probabilities shape mismatch"
            rr = self._ride_rewards[state]
            assert rr.shape == (na, ns), "Rewards shape mismatch"
            assert np.allclose(rp.sum(axis=1), 1), "Probabilities don't add up to 1"

    def possible_actions(self, state):
        """Return all possible actions from a given state."""
        self._check_state(state)
        return self._possible_actions[state]

    def ride_probabilities(self, state, action):
        """
        Return the ride probabilities from a state for a given action,
        with values in the same order as self.possible_states.
        """
        actions = self.possible_actions(state)
        ac_idx = actions.index(action)
        return self._ride_probabilities[state][ac_idx]

    def ride_rewards(self, state, action):
        """
        Return the ride rewards from a state for a given action,
        with values in the same order as self.possible_states.
        """
        actions = self.possible_actions(state)
        ac_idx = actions.index(action)
        return self._ride_rewards[state][ac_idx]
```

# Examples of using the environment functions

```
def check_taxienv():
    # These are the values as used in the pdf, but they may be changed during submission, so do not hardcode anything
    states = ['A', 'B', 'C']
    actions = [['1', '2', '3'], ['1', '2'], ['1', '2', '3']]
    probs = [np.array([[1/2, 1/4, 1/4],
                       [1/16, 3/4, 3/16],
                       [1/4, 1/8, 5/8]]),
             np.array([[1/2, 0, 1/2],
                       [1/16, 7/8, 1/16]]),
             np.array([[1/4, 1/4, 1/2],
                       [1/8, 3/4, 1/8],
                       [3/4, 1/16, 3/16]])]
    rewards = [np.array([[10, 4, 8],
                         [ 8, 2, 4],
                         [ 4, 6, 4]]),
               np.array([[14, 0, 18],
                         [ 8, 16, 8]]),
               np.array([[10, 2, 8],
                         [ 6, 4, 2],
                         [ 4, 0, 8]])]
    env = TaxiEnv_HW2(states, actions, probs, rewards)
    print("All possible states", env.possible_states)
    print("All possible actions from state B", env.possible_actions('B'))
    print("Ride probabilities from state A with action 2", env.ride_probabilities('A', '2'))
    print("Ride rewards from state C with action 3", env.ride_rewards('C', '3'))


check_taxienv()
```
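These helper methods already give the quantity every DP step needs: the expected reward of a single ride, $\sum_{s'} p(s' \mid s, a)\, r(s, a, s')$. A small sketch using the example arrays for state `A` from the cell above (only the example values; the evaluation data may differ). With an environment object, `np.dot(env.ride_probabilities(s, a), env.ride_rewards(s, a))` computes the same thing.

```python
import numpy as np

# Example data for state 'A' copied from check_taxienv:
# rows = actions '1', '2', '3'; columns = next states 'A', 'B', 'C'.
probs_A = np.array([[1/2, 1/4, 1/4],
                    [1/16, 3/4, 3/16],
                    [1/4, 1/8, 5/8]])
rewards_A = np.array([[10, 4, 8],
                      [8, 2, 4],
                      [4, 6, 4]])


def expected_immediate_reward(probs_row, rewards_row):
    """Probability-weighted average of the rewards for a single action."""
    return float(np.dot(probs_row, rewards_row))


q_A = {a: expected_immediate_reward(probs_A[i], rewards_A[i])
       for i, a in enumerate(['1', '2', '3'])}
print(q_A)  # {'1': 8.0, '2': 2.75, '3': 4.25}
```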

# Task 1 - DP Algorithm implementation

Implement your DP algorithm. For every starting state and every sequence length up to N = 10, it should compute the expected reward of the optimal policy, together with the optimal action to take.
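For reference, one common way to write the finite-horizon DP recursion for this kind of problem, assuming no discounting and zero reward after the last ride, with $p(s' \mid s, a)$ and $r(s, a, s')$ standing for `ride_probabilities` and `ride_rewards`. The symbols $V_k$, $\pi_k$ and $\mathcal{A}(s)$ are notation introduced here for illustration:

$$
V_0(s) = 0, \qquad
V_k(s) = \max_{a \in \mathcal{A}(s)} \sum_{s'} p(s' \mid s, a)\,\bigl[r(s, a, s') + V_{k-1}(s')\bigr], \quad k = 1, \dots, N,
$$

where $V_k(s)$ is the optimal expected reward with $k$ rides remaining from state $s$, and $\pi_k(s)$ is the action attaining the maximum.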

```
def dp_solve(taxienv):
    ## Implement the DP algorithm for the taxienv
    states = taxienv.possible_states
    values = {s: 0 for s in states}
    policy = {s: '0' for s in states}
    all_values = []  # Append the "values" dictionary to this after each update
    all_policies = []  # Append the "policy" dictionary to this after each update
    # Note: The sequence length is always N=10

    # ADD YOUR CODE BELOW - DO NOT EDIT ABOVE THIS LINE

    # DO NOT EDIT BELOW THIS LINE
    results = {"Expected Reward": all_values, "Polcies": all_policies}
    return results
```
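A minimal backward-induction sketch, written against plain lists and arrays laid out like the `check_taxienv` example instead of against `TaxiEnv_HW2`, so it runs on its own. The function name `finite_horizon_dp` is hypothetical. It assumes no discounting, zero reward once the sequence ends, and that entry `k-1` of each returned list corresponds to `k` rides remaining; the ordering the evaluator expects is not stated in this notebook, so check it before adapting the idea inside `dp_solve`. There, row `a` of `probs[i]` and `rewards[i]` corresponds to `taxienv.ride_probabilities(s, a)` and `taxienv.ride_rewards(s, a)`.

```python
import numpy as np


def finite_horizon_dp(states, actions, probs, rewards, horizon=10):
    """Backward induction sketch for a finite-horizon, undiscounted MDP.

    Assumed layout (as in check_taxienv): probs[i] and rewards[i] have shape
    (len(actions[i]), len(states)). Entry k-1 of each returned list holds the
    optimal expected reward / action when k rides remain.
    """
    values = {s: 0.0 for s in states}  # V_0: no rides left, so no reward
    all_values, all_policies = [], []
    for _ in range(horizon):
        # Previous-stage values ordered like `states`, matching the array columns
        v_prev = np.array([values[s] for s in states])
        new_values, policy = {}, {}
        for i, s in enumerate(states):
            # q[a] = sum_j P[a, j] * (R[a, j] + V_{k-1}(j)); v_prev broadcasts over rows
            q = (probs[i] * (rewards[i] + v_prev)).sum(axis=1)
            best = int(np.argmax(q))  # ties resolve to the first-listed action
            new_values[s] = float(q[best])
            policy[s] = actions[i][best]
        values = new_values
        all_values.append(values)
        all_policies.append(policy)
    return all_values, all_policies


# Example data from check_taxienv
states = ['A', 'B', 'C']
actions = [['1', '2', '3'], ['1', '2'], ['1', '2', '3']]
probs = [np.array([[1/2, 1/4, 1/4], [1/16, 3/4, 3/16], [1/4, 1/8, 5/8]]),
         np.array([[1/2, 0, 1/2], [1/16, 7/8, 1/16]]),
         np.array([[1/4, 1/4, 1/2], [1/8, 3/4, 1/8], [3/4, 1/16, 3/16]])]
rewards = [np.array([[10, 4, 8], [8, 2, 4], [4, 6, 4]]),
           np.array([[14, 0, 18], [8, 16, 8]]),
           np.array([[10, 2, 8], [6, 4, 2], [4, 0, 8]])]

vals, pols = finite_horizon_dp(states, actions, probs, rewards, horizon=3)
print(pols)
# [{'A': '1', 'B': '1', 'C': '1'}, {'A': '1', 'B': '2', 'C': '2'}, {'A': '2', 'B': '2', 'C': '2'}]
```

With the example data, the computed policy is action '1' in every state when one ride remains, but action '2' in every state when three rides remain, so the optimal decision depends on how many rides are left.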

## Here is an example of what the "results" output from the `dp_solve` function should look like

Of course, your values won't all be zeros.

```
{'Expected Reward': [{'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0},
                     {'A': 0, 'B': 0, 'C': 0}],
 'Polcies': [{'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'},
             {'A': '0', 'B': '0', 'C': '0'}]}
```

```
if not os.path.exists(AICROWD_RESULTS_DIR):
    os.mkdir(AICROWD_RESULTS_DIR)
```

```
# DO NOT EDIT THIS CELL, DURING EVALUATION THE DATASET DIR WILL CHANGE
input_dir = os.path.join(DATASET_DIR, 'inputs')
for params_file in os.listdir(input_dir):
    kwargs = np.load(os.path.join(input_dir, params_file), allow_pickle=True).item()
    env = TaxiEnv_HW2(**kwargs)
    results = dp_solve(env)
    idx = params_file.split('_')[-1][:-4]
    np.save(os.path.join(AICROWD_RESULTS_DIR, 'results_' + idx), results)
```
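The evaluation cell stores each `results` dictionary with `np.save`. As a sanity check of that format, here is a sketch of a round trip with a placeholder dictionary (the file name `results_0` and the values are made up). `np.save` wraps the dictionary in a 0-d object array and appends `.npy` to the file name, so reading it back needs `allow_pickle=True` and `.item()`, just like the input parameter files.

```python
import os
import tempfile

import numpy as np

# Hypothetical results in the same shape dp_solve returns (placeholder values)
results = {"Expected Reward": [{"A": 0.0, "B": 0.0, "C": 0.0}],
           "Polcies": [{"A": "1", "B": "1", "C": "1"}]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results_0")
    np.save(path, results)  # writes results_0.npy; the dict is pickled into a 0-d object array
    loaded = np.load(path + ".npy", allow_pickle=True).item()  # .item() unwraps the dict

print(loaded == results)  # True
```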

```
## Modify this code to display the optimal policies and expected rewards properly
print(results)
```
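For Task 2, one possible way to turn a `results` dictionary into a Markdown table that can be pasted into an answer cell. This is a sketch: `format_results` is a hypothetical helper, it assumes one list entry per round in the order `dp_solve` appends them, and the single-round example values (8, 16 and 7, with action '1' in every state) were worked out by hand from the `check_taxienv` data.

```python
def format_results(results):
    """Format dp_solve-style results as a Markdown table, one row per round."""
    values = results["Expected Reward"]
    policies = results["Polcies"]  # key spelled exactly as in dp_solve's return value
    states = list(values[0].keys())
    header = ["Round"] + [f"V({s})" for s in states] + [f"pi({s})" for s in states]
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    for k, (v, p) in enumerate(zip(values, policies), start=1):
        row = [str(k)] + [f"{v[s]:.4f}" for s in states] + [str(p[s]) for s in states]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


example = {"Expected Reward": [{"A": 8.0, "B": 16.0, "C": 7.0}],
           "Polcies": [{"A": "1", "B": "1", "C": "1"}]}
example_table = format_results(example)
print(example_table)
```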

# Task 2 - Tabulate the optimal policy & optimal value for each state in each round for N=10

Modify this cell and add your answer

# Question - Consider a policy that always forces the driver to go to the nearest taxi stand, irrespective of the state. Is it optimal? Justify your answer.

Modify this cell and add your answer

# Submit to AIcrowd 🚀

**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**

```
!DATASET_PATH=$AICROWD_DATASET_PATH aicrowd notebook submit -c rl-taxi -a assets
```

