Winner's Perspective: Interview with Mugurel (Flatland)
Tell us a bit about yourself and how you came across the Flatland challenge?👨💼
I'm working as a software engineer. At work, I do coding, people management, also some ML. Competing in this kind of optimization challenges, this is just a hobby. I do it in the evening when I come back from work. I like competitive programming, particularly this kind of optimization challenge. I have been doing this already for many years and I became reasonably good at it.
At some point last year, I thought to check what other optimization challenges exist on the web. So, I just Googled and came across the SBB challenge from 2018 which was still on crowdAI, if I remember correctly. It seemed like a really nice challenge, and I wanted to see if they are organizing one this year.
So, I came across the Flatland Challenge on AIcrowd and it seemed very different from what the initial challenge was. Also, I saw that it's very targeted to reinforcement learning, agents and stuff. I was a bit skeptical if I should participate or not but I thought I'll give it a try.
Can you tell us a bit about your winning solution?🥇
I wrote all of it from scratch. I didn't really use any existing libraries. I coded the initial algorithm fully in Python. It was so excruciatingly slow that I said no. Finally, I just had an entry point to interact with the simulator in Python and I used this trick to call C++ code from python. So all of my solution except the thin wrapper that was in C++. So this made it easy, I just had a thin Python wrapper and it was passing some status data to the C++ part and the C++ part was passing back the actions that were sent back to the simulator.
Basically, the core of my solution is, you can plan over a state and the state is a location which is basically one of these cells on the board and a time stamp. It works well because even though the board can be 150 by 150, most of it is actually empty. So, it worked like that to literally do this kind of shortest path over this kind of time expanded graph.
I started with an idea for this, we should perform planning over time. But in this case I can't make all the planning in the beginning because there will be random malfunctions throughout the simulation. So then okay let's see if I can make something fast enough so that when a malfunction occurs, I can recompute this path.
Then I came across, I can do this, but once malfunctions occur, I could run into deadlocks. And this was the hardest part, because now I make a decision but then something changes and then this thing doesn't arrive at the right place at the right time and it's very hard to fix it afterwards.
So, this is the part where I thought, okay let's see if there is anything in the literature about this. There was this very basic idea that if there is some incident or malfunction somewhere, if you could delay all the agents to keep the same visiting order in every cell then you're guaranteed to avoid deadlock in the future. So, I said okay this was a good idea in the sense that if I could avoid deadlock now, then if I have a deadlock free set of paths, I can also improve them later.
Why do you think Reinforcement Learning based solutions did not work as well as Operations Research based approaches in this challenge?🧐
Given the space or the size of the problems or the test cases that exist for this challenge, I think it would have been very hard for Reinforcement Learning or any kind of learning to actually perform much better than other Operations Research type of algorithms because there are some specificities of this kind of challenge that made it hard if you made a mistake at some point.
In this simulator, you have to make choices at some point, for instance, if you want to get the train from one track to the next one, once you make the decision you can't stop the train. And if some slow train, let's say takes four turns to get there but the situation changes within these four turns, like there's some incident or some malfunction basically, that's very bad. It is very easy to get into the bugs if you don't do any kind of planning.
My guess is that with Reinforcement Learning, it would be very hard to avoid these kinds of bugs. But I didn't try any kind of Reinforcement Learning, so I can't tell for sure if they can be made much better or not.
Do you think a better solution would be a combination of Operations Research and Reinforcement Learning?🤔
I think that can be done. I guess some of the parameters of the optimization algorithms I could have learned by running many simulations and many test cases and those could have been learned. I couldn't do that because of the limited resources on my side. So, some of them were a bit manually tuned probably but not optimal.
What were the key things which worked well for you?💯
I guess what I learnt is that it's better to prioritize fast trains to get them out of the environment because then any malfunction on the fast train that's out doesn't impact anyone. Basically, malfunction seems to be random across all trains but if a train already reaches its destination, it doesn't matter. It cannot hurt anyone basically.
As long as the train is inside the environment we can always be hit by a malfunction and then this can cause all sorts of delays for others. If I can get as many trains out as early as possible that also makes it better for the other trains because there are fewer to block each other.
What kind of bugs did you encounter during your development?🐛
Towards the end of the competition I realized that I had a bug in my very initial solution, in my shortest path. My heap implementation that I wrote myself had a stupid bug when I was just removing things from the heap and you duplicate the things, you move some things out.
I only discovered it because at some point I decided to try to optimize that part a bit and I changed the way I use that heap a little. Then I realized that things were not working the way when I made these changes and I finally tracked this to this bug.
It was a very simple off-by-one error, in the sense that I had indexes from 1 to something in the heap but somehow somewhere in one place in the code, I assumed they were from zero to something. So then this really made everything very weird because I was having things that were duplicate. I was ending up with some things being removed without even being considered.
I didn't see it because it was still finding paths just that they were not as short and they were not as good. So, once I fixed this, the paths became shorter and that means more agents exited the environment sooner.
During the competition, would you prefer to have more communication between the team members?🗣
Personally I am all for discussing every idea that I had and making my code public, but only at the end of the competition. I don't want to do it during the competition. There is always this tradeoff between do you give up some good ideas that were working for you at the expense of maybe losing position on the leaderboard and then losing one of the actual prizes.
But after the competition, to me this is actually one of the motivating reasons to participate in some contests. I would like, if I don't do reasonably well, that the winners describe what their main idea is. So, I will learn something out of it.
Having stronger competitors motivates you at the top because if you don't keep trying things and improving then it's easy to just lose the spot, lose the prize money. Honestly, I didn't do it for the prize money. I participate because I like the fun of these optimization challenges. Money is just an added bonus but it's not the reason why I participate in these competitions.
Are you planning to continue working on that to further improve upon your solution?📈
I'll be honest, no not for this competition. For the next competition whenever that is, yes. The problem is that these ideas, they are not straightforward. They are a bit more complex at idea level so they need some amount of work. I need to come up with ideas so that they are not super hard to code, but they have to be smart enough to make a difference. So, I have to choose what I implement. I can't go crazy with everything if it takes a lot of time.
They will need, maybe, a lot of work, some important amount of work and I might have the time if I really wanted to but I don't have the motivation anymore. If there were many other competitors very close to my score, probably I would have forced myself and tried to implement some of these ideas. If there is ever a follow-up to this competition with some new rules, new simulator, then yeah maybe I can try them out in future contests.
Can you tell us some things that could be improved in the platform?😇
If I can summarize, there's some ramping up to be done, one on the AIcrowd infrastructure platform, for people like me who have never used Docker or Git basically. Also, the competition itself, with the actual simulator is not something easy that can just start. But once everything worked, I think it was really great.
I like the flexibility, this is about AIcrowd, that you can more or less take your own Docker container with whatever dependencies you want, whatever libraries you might want.So I think what's nice about AIcrowd is that you submit the code, you run the actual code.
Sometimes, it's not easy to get into a challenge. So the easier you make it to enter, the more people are going to participate.
Also some kind of rating system would help to show that I'm gaining some points. If people think they participate and not only do they get prizes but also get some extra points, they grow on the leaderboard of AIcrowd, this can become a kind of status symbol, like okay I am on the top something on AIcrowd. People would still think about it, it means that it's important.