Challenge Summary: Flatland

By debalb

The following is a summary of the Flatland challenge 2019, which was organized by the Swiss Federal Railways (SBB) from Jul’19 to Jan’20.


Facing a need to increase the transportation capacity of its railway network by approximately 30% to meet future passenger and freight demand, the research group at SBB developed a high-performance simulator that models the dynamics of train traffic and the railway infrastructure. The simulator is used to study different approaches to automated traffic management systems (TMS), with the goal of optimizing traffic flow across the network.

At the core of this challenge lies the general vehicle re-scheduling problem (VRSP), which SBB aims to address by providing a simple grid world environment that allows for diverse solution approaches. SBB is looking to experts in Machine Learning as well as more traditional Operations Research for ideas that may shape how modern TMS are implemented, not only in railways but also in other areas of transportation and logistics.


If the challenge itself wasn’t exciting enough, the prizes announced by SBB, with a total value of 30k CHF (approx. 30k USD), surely provided the motivation one needed.

Cash prizes awarded to the top three submissions were as follows:

  • CHF 7’500 for 1st prize
  • CHF 5’000 for 2nd prize
  • CHF 2’500 for 3rd prize

In addition, a Community Contributions Prize of CHF 5’000 was offered for the biggest contribution to the community, to be earned by generating new observations and sharing them with other participants.

To further encourage participation, SBB also announced that they would hand-pick and award up to 5 travel grants to the Applied Machine Learning Days 2020 in Lausanne, Switzerland, and that participants with promising solutions might be invited to present them at SBB in Bern, Switzerland.


The challenge, hosted on AIcrowd, consisted of 3 rounds spanning more than 5 months. It officially kicked off on 1 Jul'19 with a month-long round-0, or beta round, meant to give participants insight into the problem and the environment without any awards or rankings. SBB also provided an introductory tutorial to make on-boarding easier and faster for beginners.

The real fun started in round-1, with added complexity coming from multi-agent and conflict avoidance requirements. This round began at the end of Jul'19 and ran until mid Oct'19. Around Aug'19, the organizers modified the rules and announced that the results of this round would not count towards the final scores. Hence this round also turned out to be an exercise in exploring the problems posed by the challenge and finding ways around them.

The organizers clearly saved the toughest challenge for round-2, which started in mid Oct'19 and ended on 5 Jan'20. This round added further complexity in the form of varying agent speeds and malfunctions (breakdowns) randomly affecting the agents running in the grid. Scoring was the same as in the previous rounds: it was based on the percentage of agents (trains) that successfully reached their destinations.
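As a rough illustration of this kind of metric, the snippet below computes the fraction of agents that reached their destination in one episode and averages it over several episodes. The function names and the plain averaging across episodes are assumptions made for this sketch; the official evaluator may have aggregated results differently.

```python
from typing import Sequence


def completion_rate(reached: Sequence[bool]) -> float:
    """Fraction of agents that arrived at their destination in one episode."""
    if not reached:
        return 0.0  # no agents: define the rate as zero to avoid dividing by zero
    return sum(reached) / len(reached)


def mean_completion_rate(episodes: Sequence[Sequence[bool]]) -> float:
    """Plain average of per-episode rates (an assumed aggregation)."""
    if not episodes:
        return 0.0
    return sum(completion_rate(ep) for ep in episodes) / len(episodes)


if __name__ == "__main__":
    # Hypothetical results: one flag per train, True if it arrived.
    episodes = [
        [True, True, False, True],
        [True, True],
    ]
    print(mean_completion_rate(episodes))
```

Under this reading, a leaderboard score of 0.96 would mean that roughly 96% of the trains arrived.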


It was a test of creativity and savviness, and most participants had a tough time in the last round trying to get all the agents to their respective destinations. An optimized rescheduling algorithm seemed to be the key to acing this round.

In the words of team CkUa, who finished in 2nd place with a best score of 0.96:

“Very important was to solve rescheduling. It was the hardest challenge in this problem. After malfunction all your plans are broken. So we made scheduling using a variant of train passing order for every cell. Already with this solution we were in 2nd place and we remained in 2nd place. In fact, on the last day, we even designed and implemented partial rescheduling. Although it added some points but not enough for 1st place.”
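One possible reading of "train passing order for every cell" can be sketched as follows: from the original plan, record for each cell the order in which trains enter it, and at run time let a train enter a cell only when it is next in that cell's queue. A delayed train then holds back the trains planned behind it instead of invalidating the whole plan. This is an interpretation for illustration only, not team CkUa's actual code; all names and the data layout are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, List, Tuple

Cell = Tuple[int, int]


def build_passing_order(paths: Dict[Hashable, List[Cell]]) -> Dict[Cell, Deque]:
    """paths maps a train id to its planned cell at each time step."""
    visits = []
    for train, path in paths.items():
        previous = None
        for step, cell in enumerate(path):
            if cell != previous:  # record entries only, not waiting in place
                visits.append((step, train, cell))
            previous = cell
    visits.sort()  # earliest planned entry first
    order: Dict[Cell, Deque] = defaultdict(deque)
    for _, train, cell in visits:
        order[cell].append(train)
    return order


def may_enter(order: Dict[Cell, Deque], train: Hashable, cell: Cell) -> bool:
    """A train may enter a cell only if it is next in that cell's queue."""
    queue = order.get(cell)
    return bool(queue) and queue[0] == train


def leave(order: Dict[Cell, Deque], train: Hashable, cell: Cell) -> None:
    """Call when a train has left a cell, releasing it to the next train."""
    if not may_enter(order, train, cell):
        raise ValueError("train left a cell out of the planned order")
    order[cell].popleft()


if __name__ == "__main__":
    plan = {
        "A": [(0, 0), (0, 1), (0, 2)],
        "B": [(1, 1), (1, 1), (0, 1), (0, 1)],
    }
    order = build_passing_order(plan)
    print(list(order[(0, 1)]))  # A is planned through (0, 1) before B
```

If train A breaks down before reaching (0, 1), train B simply waits, because `may_enter` stays false for B until A has passed and `leave` has been called.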

As for the submission process, some participants might have found the Docker-based method difficult to grasp at first, but many were very appreciative of it.

According to Mugurel, the winner with a best score of 0.99:

“I think what's nice about AIcrowd is that you submit the code. So you run the actual code. Development wise that's one issue but it's like, I can run my models on a big cluster of machines and someone else can't afford for whatever reason to have the same computational resources. These are somewhat equal when you submit the submission on AIcrowd.”

Also from team CkUa:

“A very nice thing that only AIcrowd has is the submission by Git commit. It's really an interesting idea that you just push a tag and everything gets started somewhere, calculated and you receive a score. That was really nice. You are sure that your code works well on the organizer site.”


Comparing the leaderboards of round-1 and round-2 is instructive: in the more complex round-2, only the top 5 participants scored 50% or more, whereas in round-1 more than 3 times as many teams did so.

This is something even the organizers expected to see. In the words of the organizer Erik Nygren (SBB):

“In my opinion the problem required a mix of traditional OR based approach and machine learning algorithms like Reinforcement Learning. Any one method may not give the best solution and this was evident from the results that we saw in the competition. Some participants had prior experience solving such problems so naturally they scored better than others who relied on pure RL based approach. Naturally the latter group were left puzzled by how their leaderboard scores couldn't reach as high as the top ones.”

The Reinforcement Learning approaches struggled to compete with more traditional Operations Research, rescheduling and heuristic search algorithms.

As confirmed by Mugurel:

“Given the space or the size of the problems or the test cases that exist for this challenge, I think it would have been very hard for reinforcement learning or any kind of learning to actually perform much better than other OR type of algorithms because there are some specificities of this kind of challenge that made it hard if you made a mistake at some point.”

However, such results do not rule out the applicability of Reinforcement Learning to problems of this kind. In fact, both of the top two teams agreed that a combination of ML and OR approaches could well be the key to the best solution, though not without a lot of effort.

In the words of team CkUa:

“It's (ML and OR) not so easily combinable in fact. I could imagine that there can be possibilities to optimize, to cut with reinforcement learning some search process for example, and speed up with this. Or use reinforcement learning for rescheduling that we needed to order trains. I don't know, it's not so easy. What I would say is that parts of our A* search with rollbacks, can be useful when you start to make reinforcement learning. We had one idea on how to implement reinforcement learning in our A* search. We have some very easy heuristic function for determining which new position we will test. Maybe we can improve it with reinforcement learning. But it's just an idea, we didn’t have time to check it.”
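To make the idea of a pluggable heuristic more concrete, here is a minimal textbook A* search on a small grid, where the `heuristic` argument is the place a hand-written or a learned scoring function could be supplied. It is a generic sketch under simple assumptions (4-connected grid, unit step cost, `#` marks a blocked cell) and does not reproduce the team's search with rollbacks or their actual heuristic.

```python
import heapq
from typing import Callable, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    """Simple admissible heuristic on a 4-connected grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(
    grid: Sequence[str],
    start: Cell,
    goal: Cell,
    heuristic: Callable[[Cell, Cell], float] = manhattan,
) -> Optional[List[Cell]]:
    """Return a shortest path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(heuristic(start, goal), 0, start)]  # (f, g, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                priority = new_cost + heuristic(nxt, goal)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    return None


if __name__ == "__main__":
    grid = ["...", ".#.", "..."]
    print(a_star(grid, (0, 0), (2, 2)))
```

Swapping `manhattan` for a function backed by a trained model would be one way to try the combination the team describes, with the caveat that a learned heuristic that overestimates costs can make A* return paths that are not the shortest.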

While choosing the best performing submissions was straightforward, the organizers struggled to decide on a winner for the Community Contributions Prize and finally settled on not awarding this prize to any team.

When asked about this, Erik said:

“From the beginning of the challenge, we encouraged community contribution among participants through discussions, sharing of ideas/implementation by providing a separate category of awards for Community Contributions Prize. Unfortunately we did not see much of that happen during the competition. Maybe it was due to the competitive nature of the task itself and/or the pressure of timelines. I also realize that such an award is difficult to judge as there are no easy metrics for evaluation.”

The participants had similar views on this.

Mugurel:

“It's always like this trade off, between do you give up some good ideas that were working for you, at the expense of maybe losing position on the leaderboard and then losing one of the actual prizes. Personally I am all for discussing every idea that I had and making my code public, I don't care. But only at the end of the competition. I don't want to do it during the competition, that's my take. I prefer to give up on any potential community awards that may come out of it because it's unclear how they are awarded.”

Team CkUa:

“We never targeted it. Our solution, if you explain it even in high-level, good software developers will be able to repeat it. As far as we understand from talks at AMLD, we and the first place have very close solutions. So it was maybe an obvious idea with some tricks, he made one trick, we made some other trick. So I don't know what we can share. This is a hard trade-off here. On one hand you want these community goals and on the other hand nobody wants to share because they want to win.”


Community awards or not, tackling the challenge itself was a grueling task, and it seemed the organizers’ choice of Python for the framework did not make it any easier for the participants.

As team CkUa shared with us:

“Really it was a lesson for us not to use Python for such competitions. Because in the end it was a bottleneck for us as it was not fast enough. We understand that if we start on C++, we would still be able to check different possibilities and find better ways. We spent a lot of time on how to optimize our Python code. We tried a lot of different libraries, interpreters, but no chance to improve our code.”

Mugurel agrees, and he even considered re-implementing parts of the Python framework in C++:

“And at least in round-1, I coded the initial algorithm that I had, fully in Python. It was so excruciatingly slow that I said no. For me it didn't make sense at some point, because what I saw is that, just the simulator running and executing my actions was slower than me running my solution for basically every time step and so on. So on my machine, Python itself was the bottleneck. I even considered rewriting the simulator in C++ just to make it fast for me and be able to run more test cases. I thought maybe I'll miss something or maybe I’ll just spend a lot of time and then I do not know the exact same rules. So eventually I decided against it but I did think about it at some point.”

No doubt the debate between Python and C++ will rage on, with Reinforcement Learning practitioners possibly favoring Python for its widely available RL libraries.

Another improvement suggested by both the organizers and the participants was the need for better discussion forums and platforms that encourage the exchange of ideas during such challenges.

Erik:

“Another aspect to improve upon would be the way participants engage in discussions during the competition. I think the Discourse channel that we used lacked certain features such as direct chat between participants. Also the query-response interface can be made more intuitive and appealing. In general we would like to see more activity and exchange of ideas around the challenge happening in forums such as code sharing on platforms like Colab or Binder.”

Mugurel:

“I don't know if there is too much activity on the forums after the competition to discuss such approaches, I mean on AIcrowd as a whole. But I think that would be something that again would make it more attractive, also potentially to more beginners.”

Team CkUa:

“What was not really working, I think, is the forum. It's working yeah, but nobody uses it and there is no possibility to answer to the exact person. It's a little bit messy and then nobody uses it.”


For now, the organizers seem satisfied with the outcome of this challenge and the prospects it might bring. As promised by the organizers, the winning teams were invited to present their ideas at the recently concluded AMLD conference at EPFL, Lausanne.

As Erik shared with us:

“There is a lot of positive feedback received internally around what we have achieved through these competitions. More departments are expressing interest to explore crowdsourcing as a means to solve some of their business problems. I am hopeful to see more such competitions being organized in the first half of 2021 which might even be related to a much broader transportation network and not just limited to railway traffic.”

This is surely good news for the participating community at large and also for the industry as it confirms the benefits of utilizing crowdsourcing in solving certain everyday business problems.

While it is true that the top-ranking teams in this challenge had prior experience in solving such problems, they were full of encouragement in their advice to new participants:

Team CkUa:

“You just need to take part in competitions. It's always good to do something by your hand even if you are a newbie. If you start to do, you learn a lot and then you know what to search and to try. And next time you don't start from zero, you start from some position and it's much easier. The best idea what other competitors can do is just to make the first submission. And then they will try to improve and improve. So making a first submission is the main thing to do.”

We are hopeful that the next version of the challenge will be announced soon and that we will see even more exciting solutions next time, perhaps even from first-time participants.
