Round 1: Completed Weight: 1.0

๐Ÿ•ต๏ธ Introduction

Consider a problem of a taxi driver, who serves six cities 0, 1, 2, 3, 4 and 5 which are located on a circular highway. The taxi driver can choose one of the following actions.

1. Cruise the streets looking for a passenger.
2. Go to the nearest taxi stand and wait in line. 3. Head back to headquarters (city 0).

Any passenger will only go to a preceding or succeeding city and the driver never cancels a ride. For a given city and any of the first two actions, there is a probability that the driver gets a passenger and goes to the succeeding or preceding city and there is also a probability that the driver doesnโ€™t get any passenger and stays in the same city. Refer Table for the transition probabilities pkij which are represented as pkc where c = j โˆ’ i.

The rewards for outcomes 1 and -1 for actions 1 and 2 are shown in Figure. When the driver doesnโ€™t get a passenger in a city (c=0), then the reward for the driver is 0. Also, if the driver decides to go to headquarters, the reward is 0.

Suppose 1 โˆ’ ฮณ is the probability that the taxi will breakdown before the next trip. The driverโ€™s goal is to maximize the total reward until his taxi breakdown


๐Ÿ“ Task

Implement the following: (2+1.5 marks)

  • Find an optimal policy using policy iteration starting with a policy that will always cruise independent of the town, and a zero value vector. Let ฮณ = 0.9.
  • Run policy iteration for discount factors ฮณ ranging from 0 to 0.95 with intervals of 0.05 and display the results.

Answer the following (based on the data given above): (1+0.5 mark)

  • How is different values of ฮณ affecting the policy iteration? Explain your findings.
  • Give alternate transition probabilities for action 2(if exists) such that optimal policy consists of action 2. Explain your answer.

You will be writing your solutions & making a submission through a notebook. You can follow the instructions in the starter notebook.

๐Ÿ’พ Dataset

Under the Resources section you will find data files that contains parameters for the environment for this problem.

๐Ÿš€ Submission

  • Submissions will be made through a notebook following the instructions in the starter notebook.
  • Each Team can make 5 successful submissions and 5 failed submissions in a day. Once the limit of failed submission is reached, the submission will be counted in the successful submission.
  • The submission limit will reset at 5:30 AM IST every day.
  • At the end of the challenge, you will have to select 1 submission as the final one. You can select that here.

๐Ÿ“ฑ Contact

  • RL TAs



See all
Starter notebook
About 1 year ago