Loading
Round 1: 35 hours left Β· Ending 25 Oct 18:29 UTC Weight: 1.0

πŸ•΅οΈ Introduction

Consider a problem of a taxi driver, who serves six cities 0, 1, 2, 3, 4 and 5 which are located on a circular highway. The taxi driver can choose one of the following actions.

1. Cruise the streets looking for a passenger.
2. Go to the nearest taxi stand and wait in line. 3. Head back to headquarters (city 0).

Any passenger will only go to a preceding or succeeding city and the driver never cancels a ride. For a given city and any of the first two actions, there is a probability that the driver gets a passenger and goes to the succeeding or preceding city and there is also a probability that the driver doesn’t get any passenger and stays in the same city. Refer Table for the transition probabilities pkij which are represented as pkc where c = j βˆ’ i.

The rewards for outcomes 1 and -1 for actions 1 and 2 are shown in Figure. When the driver doesn’t get a passenger in a city (c=0), then the reward for the driver is 0. Also, if the driver decides to go to headquarters, the reward is 0.

Suppose 1 βˆ’ Ξ³ is the probability that the taxi will breakdown before the next trip. The driver’s goal is to maximize the total reward until his taxi breakdown

 

πŸ“ Task

Implement the following: (2+1.5 marks)

  • Find an optimal policy using policy iteration starting with a policy that will always cruise independent of the town, and a zero value vector. Let Ξ³ = 0.9.
  • Run policy iteration for discount factors Ξ³ ranging from 0 to 0.95 with intervals of 0.05 and display the results.

Answer the following (based on the data given above): (1+0.5 mark)

  • How is different values of Ξ³ affecting the policy iteration? Explain your findings.
  • Give alternate transition probabilities for action 2(if exists) such that optimal policy consists of action 2. Explain your answer.

You will be writing your solutions & making a submission through a notebook. You can follow the instructions in the starter notebook.

πŸ’Ύ Dataset

Under the Resources section you will find data files that contains parameters for the environment for this problem.

πŸš€ Submission

  • Submissions will be made through a notebook following the instructions in the starter notebook.
  • Each Team can make 5 successful submissions and 5 failed submissions in a day. Once the limit of failed submission is reached, the submission will be counted in the successful submission.
  • The submission limit will reset at 5:30 AM IST every day.
  • At the end of the challenge, you will have to select 1 submission as the final one. You can select that here.

πŸ“± Contact

  • RL TAs

 

Notebooks

See all
Starter notebook
By
siddhartha
About 2 months ago
0