
Workshop Summary: DroneRL

By debalb

The following blog post provides a summary of the DroneRL Reinforcement Learning workshop conducted at AMLD Lausanne and later at IIIT Hyderabad.

INTRODUCTION 👨‍🏫

Reinforcement Learning is a subdomain of Machine Learning used to train models without any labeled dataset (as required in Supervised Learning). Instead, an RL agent learns from the rewards it receives while interacting with its environment. And unlike Unsupervised Learning, where the underlying goal is to find patterns in the data, Reinforcement Learning is used to develop autonomous agents that can interact with and operate in a given environment without human intervention (think autonomous cars).

THE TASK 📋

Mastering RL usually requires going through heaps of theory, not to mention hours of execution time spent running code. The organizers of the DroneRL workshop took on the challenge of simplifying RL for beginners through a short problem-solving workshop in which participants would not only grasp the basics of the topic but also solve a fun problem using the concepts they learned.

Florian Laurent, a Machine Learning Engineer at AIcrowd and one of the organizers of the workshop at AMLD, explains the task:

“... so the problem itself is a multi agent problem. The idea is that you are given a drone and its control and this can move around in a grid world in a 2D space. The drone needs to pick up packages and perform deliveries. It can collide with other drones, which is something to avoid. It can collide in walls or in the skyscrapers. So the idea is to perform as many deliveries as possible without crashing.”
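To make that setup concrete, here is a minimal sketch of a grid world in the spirit of Florian's description: several drones move on a 2D grid, pick up packages, deliver them, and crash if they leave the grid, hit a skyscraper, or end up in the same cell as another drone. This is not the actual DroneRL environment; the class `ToyDeliveryGrid`, the `MOVES` action encoding, and the reward constants are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass

# Hypothetical action encoding as (row, column) offsets; the real DroneRL action space may differ.
MOVES = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0), 4: (0, 0)}  # left, right, up, down, stay

# Hypothetical reward values, chosen only for illustration.
PICKUP_REWARD = 0.5
DELIVERY_REWARD = 1.0
CRASH_REWARD = -1.0


@dataclass
class Drone:
    pos: tuple
    carrying: bool = False
    alive: bool = True


class ToyDeliveryGrid:
    """Minimal multi-drone delivery grid world; NOT the actual DroneRL environment."""

    def __init__(self, size, skyscrapers, packages, dropzones, drone_starts):
        self.size = size
        self.skyscrapers = set(skyscrapers)
        self.packages = set(packages)
        self.dropzones = set(dropzones)
        self.drones = [Drone(pos) for pos in drone_starts]

    def step(self, actions):
        """Apply one action per drone and return the list of per-drone rewards."""
        rewards = [0.0] * len(self.drones)
        # Cell each live drone tries to move to (None for crashed drones).
        targets = []
        for drone, action in zip(self.drones, actions):
            if drone.alive:
                dr, dc = MOVES[action]
                targets.append((drone.pos[0] + dr, drone.pos[1] + dc))
            else:
                targets.append(None)
        # Count how many drones target each cell to detect drone-drone collisions.
        # (Simplification: two drones swapping cells are not treated as a collision.)
        counts = {}
        for cell in targets:
            if cell is not None:
                counts[cell] = counts.get(cell, 0) + 1
        for i, (drone, cell) in enumerate(zip(self.drones, targets)):
            if cell is None:
                continue
            r, c = cell
            off_grid = not (0 <= r < self.size and 0 <= c < self.size)
            if off_grid or cell in self.skyscrapers or counts[cell] > 1:
                drone.alive = False  # crashed into a wall, a skyscraper or another drone
                rewards[i] = CRASH_REWARD
                continue
            drone.pos = cell
            if not drone.carrying and cell in self.packages:
                self.packages.discard(cell)
                drone.carrying = True
                rewards[i] = PICKUP_REWARD
            elif drone.carrying and cell in self.dropzones:
                drone.carrying = False
                rewards[i] = DELIVERY_REWARD
        return rewards
```

For example, on a 3×3 grid with a package at `(0, 1)` and two drones starting at `(0, 0)` and `(2, 2)`, calling `env.step([1, 4])` moves the first drone right onto the package while the second stays put, and returns `[0.5, 0.0]`. An RL agent's job is then to pick actions that maximize the sum of these rewards over an episode.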

HOW IT WENT 🧐

The workshop at AMLD was organized in collaboration with the EPFL Extension School as a full-day event, divided into two 4-hour sessions (one in the morning and one in the evening), whereas the IIIT-Hyderabad version had to fit into roughly half that time. While the AMLD participants were a varied mix, from experienced machine learning professionals to curious computer science engineers, the IIIT-Hyderabad workshop was conducted specifically for engineering students.

Right from the start, the organizers faced the uphill task of introducing advanced concepts to an audience, many of whom were completely new to the domain. The content therefore had to be designed and presented so that it covered the necessary basics without becoming so overwhelming or complicated that participants lost interest.

As aptly summarized by Florian:

“... we needed a problem which was interesting but also simple enough, so that people who have some knowledge of machine learning in the beginning but no knowledge about reinforcement learning are able to actually get some results in.”

The organizers introduced the concepts in phases, starting with classical methods and gradually moving on to more advanced topics such as the DQN approach, Experience Prioritization, and Curiosity. For the implementation, the pre-prepared notebooks on Google Colab were very popular among the participants. We appreciate the time the organizers spent preparing these resources in advance.

As explained by Florian:

“... the way that submissions were done is that people were training on Colab, so even if they had no notion of how to write code, they are able to just run through the notebook, download the model and upload it on AIcrowd and get the evaluation running … I think that Colab is very good for that because it's a good way to explain things on the screen, a good way for people to have access to compute without having to set up anything on their machines.”

How did the participants do? Were they able to get on board easily?
Here is what the organizers have to say.

Florian (AMLD):

“... they were definitely able to get on board in the sense that they were able to train good agents and get intuition about how to do that, how to train efficiently, how the different hyperparameters interplay between them.”

Pulkit Gera (organizer, IIIT-Hyderabad):

“... the scores initially were negative and once people started getting into positive scores, again the number of submissions started increasing like anything and it was really competitive at that stage. As I remember, people were running around, people were making educated guesses and saying okay let's change the rewards, let's change something like that … Once a positive score came about then people realized that this is how to approach this problem and this is what is needed to fix it”

LEADERBOARD RANKINGS 🥇

Submissions in both editions of the challenge were quite impressive. In fact, the winning team from IIIT-Hyderabad (mean reward 38.38) beat the top score set by the AMLD winners (mean reward 12.45) by a huge margin, scoring more than three times as high. Check out the final leaderboard here.

WINNING STRATEGIES 🎯

So what was the secret sauce and which approach worked best?
We put this question to the organizers.

Florian:

“… we were really impressed with the results the participants achieved. At the first workshop the top participants actually trained using curiosity and then at the second workshop in Hyderabad they beat the previous score but using only prioritized experience replay … In both iterations of the workshop, the top participants actually had a strategy, where they started to train early in the workshop for one solution and they kept training it for a very long period of time. For me this was the biggest surprise, that spending more time with a potentially simpler approach gave better results than what typically works in other reinforcement learning problems”
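Since prioritized experience replay carried the Hyderabad winners, here is a minimal sketch of its proportional variant: transitions with a larger TD error are replayed more often, and importance-sampling weights correct for the resulting bias. It is not the workshop notebooks' code; the class name `PrioritizedReplayBuffer` and its methods are hypothetical, the `alpha=0.6` and `beta=0.4` defaults are assumed common starting values, and sampling is a simple O(n) scan, whereas practical implementations usually rely on a sum-tree.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay with O(n) sampling (a sum-tree is the usual faster choice)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.next_idx = 0   # slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are likely replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^-beta, normalized by the batch maximum.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priority = |TD error| + eps for each replayed transition.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DQN training loop, one would call `sample`, scale each transition's loss by its weight, and then feed the new TD errors back through `update_priorities`.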

Pulkit:

“... as far as I remember, they (the winning team) briefly explained their approach. Mostly what they did was that they changed the reward system, they penalized delivery more. So what they were doing was, they changed the reward system in a very intelligent way such that delivery was more penalized than picking up or something such they anyway picked up ... they did some intelligent hyperparameter tuning and were able to really get the best out of it”
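Changing "the reward system" usually means re-weighting the reward attached to each event the agent can trigger. The exact values the winning team used are not reported, so the sketch below only shows the mechanics. `RewardWeights`, `shaped_reward`, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical per-event rewards; the winning team's actual values are unknown."""
    pickup: float = 0.5
    delivery: float = 1.0
    crash: float = -1.0
    step: float = 0.0  # constant added every step (a negative value acts as a time penalty)


def shaped_reward(events, weights):
    """Combine one step's event counts, e.g. {"pickup": 1}, into a single scalar reward."""
    return (
        events.get("pickup", 0) * weights.pickup
        + events.get("delivery", 0) * weights.delivery
        + events.get("crash", 0) * weights.crash
        + weights.step
    )


baseline = RewardWeights()
# A hypothetical "tuned" variant: much harsher crashes and a small per-step cost.
tuned = RewardWeights(crash=-5.0, step=-0.25)

for name, w in [("baseline", baseline), ("tuned", tuned)]:
    print(name, shaped_reward({"delivery": 1}, w), shaped_reward({"crash": 1}, w))
```

Because the agent only ever sees these scalars, even small changes to the relative weights can noticeably change what behavior it learns. That helps explain why participants spent so much time experimenting with them.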

HIGHLIGHTS 👏

We asked the organizers about some of the most memorable moments from the event.

Florian:

“... it was an extremely challenging task. I mean we set up the whole grading system a day before with Mohanty. So we were able to set up the whole grading infrastructure and to scale it up to support literally hundreds of participants and submissions in a very short period of time. It would have been fully impossible to do that without AIcrowd. Given the time frame we had, it was very impressive to get everything up and running so fast.”

Pulkit:

“... the trend was that the solutions were gradually getting better and people were still figuring out what to do and what not to do ... suddenly this first year participant got this unbelievable score, I don’t know what he did, I really want to ask him, but yeah he did get this score which really motivated everyone … Even a couple of days later, people were still submitting solutions. So that way it was a huge success, people were really into it and really got excited”

FUTURE PLANS 🔮

If you found this workshop exciting, wait till you find out what the organizers are planning for future events.

Florian:

“... have a full, like a real multi agent reinforcement learning problem, … so each participant only controls one of the drones whereas the other drones in the environment would actually be controlled by the submissions from other participants, … this I think could really bring the challenge to the next level … Another option that we considered initially is called Neural MMO developed by OpenAI and the idea is that it's more targeted at studying populations … there is this whole notion that each agent needs to find resources and develop territory, such kinds of things, … but the problem we had with that was to get it to run on Colab was actually very complicated … so we abandoned it at some point. But if this environment becomes more mature it would be interesting.”

Pulkit:

“... we are going very big on this one, these educational challenges, ... basically we want to create a ladder or a knowledge graph where people who have no idea in machine learning can also go about these challenges, make submissions, understand the nuances of machine learning and how to use it better as a tool. We are planning on having a lot of challenges posted there for learning and educational purposes.”

The future does look promising, and we encourage the machine learning community to stay tuned for upcoming events at AIcrowd.
