Round 1: 17 hours left · Ending 01 Dec 18:29 UTC

RL Project 2021

Train your RL agents


🚀 Starter kit - Everything you need to submit.

Note: This challenge accepts submissions only via a GitLab repository. Please read the submission instructions in the starter kit carefully before submitting.


In this project, your objective is to build an agent using reinforcement learning that maximises reward in different environments. The environments you will be working on are described below.


The acrobot system includes two joints and two links, where the joint between the two links is actuated. Initially, the links are hanging downwards, and the goal is to swing the end of the lower link up to a given height.
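As a rough illustration of the goal condition, the sketch below computes the height of the free end of the lower link from the two joint angles. It assumes unit-length links, angles measured from the downward vertical, and a target height of 1.0 above the pivot (the convention used by Gym's Acrobot-v1); the function names `tip_height` and `reached_goal` are hypothetical, and the environment in the starter kit may define things differently.

```python
import math

def tip_height(theta1: float, theta2: float, l1: float = 1.0, l2: float = 1.0) -> float:
    """Height of the free end of the lower link, measured upward from the fixed pivot.

    theta1: angle of the first link from the downward vertical (radians).
    theta2: angle of the second link relative to the first link (radians).
    l1, l2: link lengths (assumed to be 1.0 here).
    """
    return -l1 * math.cos(theta1) - l2 * math.cos(theta1 + theta2)

def reached_goal(theta1: float, theta2: float, target_height: float = 1.0) -> bool:
    """True once the tip is above the (assumed) target height."""
    return tip_height(theta1, theta2) > target_height

print(tip_height(0.0, 0.0))        # both links hanging straight down -> -2.0
print(reached_goal(math.pi, 0.0))  # both links pointing straight up -> True
```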



This task was introduced in [Dietterich2000] to illustrate some issues in hierarchical reinforcement learning. There are 4 locations (labeled by different letters), and your job is to pick up the passenger at one location and drop them off at another. You receive +20 points for a successful drop-off and lose 1 point for every timestep it takes. There is also a 10-point penalty for illegal pick-up and drop-off actions.
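To make the reward arithmetic concrete, here is a minimal sketch that totals the return of one episode. It assumes, as in Gym's Taxi-v3, that each step yields exactly one of the three rewards (so an illegal action costs 10 rather than 10 plus 1); the event labels and the function name `episode_return` are hypothetical.

```python
# Per-step rewards taken from the task description above.
REWARDS = {
    "step": -1,      # ordinary move or legal pick-up
    "illegal": -10,  # illegal pick-up or drop-off
    "dropoff": 20,   # successful drop-off, which ends the episode
}

def episode_return(events):
    """Sum the rewards of a sequence of step outcomes."""
    return sum(REWARDS[event] for event in events)

# 9 moves, one illegal action, 2 more moves, then a successful drop-off.
events = ["step"] * 9 + ["illegal"] + ["step"] * 2 + ["dropoff"]
print(episode_return(events))  # -9 - 10 - 2 + 20 = -1
```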


QuizBot for ABC (AI Banega Crorepati)

You have to design a bot for a popular quiz program called ABC, where you have an opportunity to win a substantial amount of money if your bot is smart enough to answer a series of questions. There are N=16 questions in the quiz. For answering the k-th question correctly you receive a hidden reward "r_k". You only learn how much total money you have earned at the end of the quiz. After every question, the bot can either stop and leave with the accumulated reward or continue to the next question. ABC has been running for 3 seasons, and the rules differ for each season as follows:

  • S1 : If you answer a question wrongly, you immediately lose all earned reward and leave the quiz with 0 money.

  • S2 : Following feedback from viewers, the producers included checkpoints for this season, i.e. if you answer "k_0" questions correctly, where "k_0" is a fixed constant unknown to you, then you are guaranteed to win at least the sum of the rewards for those questions no matter what happens subsequently. For instance, if k_0 = 6 and you answer the first 8 questions correctly but get the 9th wrong, you receive the reward of the first 6 correctly answered questions. If you get the first 5 correct and the 6th wrong, you are left with 0.

  • S3 : To spice things up, the producers gave the contestants the option to choose an easy or a hard question at every stage. The reward for both questions is the same; however, if you answer the easy question wrongly you leave the show with 0 money, whereas if you answer the hard question wrongly you get half the reward you have earned until then. Note that the probability of answering the easy question correctly is higher than that of the hard one.
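The three rule sets can be summarised as a single payout function. The sketch below is one reading of the rules, not the official environment: it assumes a single checkpoint k_0 in S2, that "half the reward" in S3 refers to the money earned before the wrong answer, and it uses made-up reward values. The function name `payout` and its parameters are hypothetical.

```python
from typing import Optional, Sequence

def payout(season: int, rewards: Sequence[float], num_correct: int, failed: bool,
           k0: Optional[int] = None, failed_on_hard: bool = False) -> float:
    """Money taken home after `num_correct` correct answers in a row.

    failed=False means the bot stopped voluntarily; failed=True means the
    next question was answered wrongly.
    """
    earned = sum(rewards[:num_correct])
    if not failed:
        return earned  # walked away with everything earned so far
    if season == 1:
        return 0.0     # S1: a wrong answer loses everything
    if season == 2:
        if k0 is None:
            raise ValueError("season 2 needs the checkpoint k0")
        # S2: keep the first k0 rewards only if the checkpoint was reached
        return sum(rewards[:k0]) if num_correct >= k0 else 0.0
    if season == 3:
        # S3: wrong on a hard question keeps half, wrong on an easy one keeps nothing
        return earned / 2 if failed_on_hard else 0.0
    raise ValueError("season must be 1, 2 or 3")

rewards = [float(k) for k in range(1, 17)]  # hypothetical values; the real r_k are hidden
print(payout(2, rewards, 8, True, k0=6))               # 21.0 (first 6 rewards kept)
print(payout(2, rewards, 5, True, k0=6))               # 0.0  (checkpoint not reached)
print(payout(3, rewards, 8, True, failed_on_hard=True))  # 18.0 (half of 36.0)
```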

At each stage in the quiz, the questions get progressively tougher, i.e. p_{i+1} < p_{i}, where p_k is the probability of answering the k-th question correctly, whereas the reward r_k is monotonically increasing. However, to reiterate, you are not told the amount you have won until the quiz terminates. You are required to create 3 different bots that work well for the 3 different seasons.
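The tension between falling p_k and rising r_k can be seen in a small calculation. The sketch below evaluates, under S1 rules, the expected payout of the simple policy "attempt the first n questions, then stop". The probabilities and rewards are invented for illustration; in the actual quiz they are hidden, so a bot would have to estimate the best stopping point from end-of-quiz totals. The function name `expected_payout_s1` is hypothetical.

```python
def expected_payout_s1(p, r, n):
    """Expected money for attempting exactly the first n questions under S1.

    The bot keeps sum(r[:n]) only if all n answers are correct;
    any wrong answer leaves it with 0.
    """
    prob_all_correct = 1.0
    for pk in p[:n]:
        prob_all_correct *= pk
    return prob_all_correct * sum(r[:n])

p = [0.9, 0.7, 0.5, 0.3]  # hypothetical, strictly decreasing
r = [1.0, 2.0, 4.0, 5.0]  # hypothetical, increasing

values = [expected_payout_s1(p, r, n) for n in range(len(p) + 1)]
best_n = max(range(len(values)), key=values.__getitem__)
print(best_n)  # 3: attempting the 4th question lowers the expected payout
```

With these numbers, stopping after the third question is best: the larger reward on the fourth question does not compensate for the low chance of answering it correctly.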


Head on to the Starter kit and begin submitting. 🚀