Round 1: Completed #classroom Weight: 1.0

πŸ•΅οΈ Introduction

In this problem we have a  a taxi driver, who serves three cities A, B and C. On a regular workday, the taxi driver can find a new ride by choosing one of the following actions:

  1. Cruise the streets looking for a passenger.
  2. Go to the nearest taxi stand and wait in line.
  3. Wait for a call from the dispatcher (this is not possible in town B because of poor reception).

For a given town and a given action, there is a probability that the next trip will go to each of the towns A, B and C and a corresponding reward in monetary units associated with each such trip. This reward represents the income from the trip after all necessary expenses have been deducted. Please refer to the table below for the rewards and transition probabilities.

  • pkij is the probability of getting a ride to town j, by choosing an action k while the driver was in town i.
  • rkij is the immediate reward of getting a ride to town j, by choosing an action k while the driver was in town i.


  1. Implement the DP algorithm. Find the pseudocode below. Let S be the state space and A the action space.


  2. Tabulate the optimal policy and optimal value for each state in each round for N = 10.
  3. Consider a policy that always forces the driver to go to the nearest taxi stand, irrespective ofthe state. Is it optimal? Justify your answer.

You will be writing your solutions & making a submission through a notebook. You can follow the instructions in the starter notebook.

πŸ’Ύ Dataset

πŸ“ Files

Under the Resources section you will find data files that contains parameters for the environment for this problem.

πŸš€ Submission

Submissions will be made through a notebook following the instructions in the starter notebook.

πŸ“± Contact

  • RL TAs


See all
[Starter Notebook] RL - Taxi Problem
Over 3 years ago