# MasterScrat

Florian Laurent

CH


#### Challenges Entered

##### NeurIPS 2020: Procgen Competition
By OpenAI

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments

#### Latest submissions

• graded 67944 (Fri, 5 Jun 2020 03:09:27)
• graded 67942 (Fri, 5 Jun 2020 02:20:13)
##### NeurIPS 2020: Flatland Challenge
By SNCF, Deutsche Bahn & SBB

Multi-Agent Reinforcement Learning on Trains

#### Latest submissions

• graded 88329 (Wed, 14 Oct 2020 06:24:52)
• failed 88184 (Tue, 13 Oct 2020 16:41:51)
• graded 88182 (Tue, 13 Oct 2020 16:39:10)
##### NeurIPS 2020: MineRL Competition
By MineRL Labs - Carnegie Mellon University

Sample-efficient reinforcement learning in Minecraft

#### Latest submissions

• graded 85625 (Wed, 30 Sep 2020 12:23:51)
##### NeurIPS 2019 : MineRL Competition
By MineRL Labs - Carnegie Mellon University

Sample-efficient reinforcement learning in Minecraft

#### Latest submissions

• graded 25410 (Tue, 26 Nov 2019 09:05:50)
• graded 25409 (Tue, 26 Nov 2019 09:02:37)
• failed 25399 (Tue, 26 Nov 2019 08:02:34)
##### Flatland Challenge
By SBB

Multi Agent Reinforcement Learning on Trains.

#### Latest submissions

• failed 67801 (Wed, 3 Jun 2020 09:40:43)
• failed 67786 (Tue, 2 Jun 2020 22:53:33)
• failed 67761 (Tue, 2 Jun 2020 17:42:30)
#### Badges

• Gold: 0
• Silver: 1 (Boltzmann's Favourite, May 16, 2020)
• Bronze: 2 (Trustable, May 16, 2020; Newtonian, May 16, 2020)

• Has filled their profile page (May 16, 2020)
• Kudos! You've won a bronze badge in this challenge. Keep up the great work! Challenge: NeurIPS 2019 : MineRL Competition (May 16, 2020)
• Kudos! You've been awarded a silver badge for this challenge. Keep up the great work! Challenge: droneRL (May 16, 2020)

### Submissions that failed with a broken pipeline error

6 minutes ago

Recent submissions that were counted have issue numbers: 88, 89, 90, 101, 102, 103

So from the list above, I can see that these were counted but had the “broken pipe” problem:

I have removed those from the counted submissions.

8 minutes ago

### Come back to Flatland for the final round!

Hello Hector,

We were sorry not to see more of you after the Warm-Up Round! You were one of the few participants taking a “pure” RL approach, rather than OR with a touch of RL on top.

You should really submit it to Round 2 - there are still 17 days left! We’d also be really interested to hear more about your research directions.

Cheers,
Florian


### RE: Round 2 evaluation details

5 days ago

Hello, check this thread with similar question: How can I create a rail network, which is just a simple circle?

Your agent has control over when the trains start, that’s not part of the environment!

### Broken Pipeline error

32 minutes ago

@harshadkhadilkar, or anyone else who was affected by this outage: DM me the URLs of all your submissions that failed in this way and we will remove them from your count.

### Submit both RL and OR method

It’s now live! You can filter the leaderboard to see either RL or non-RL submissions.

### Fix environment setup

Yesterday

This looks good!

To keep the code as simple as possible, there’s no rendering code in single_agent_training.py.

In the multi-agent training tutorial, we introduce the --render command-line flag, which renders 1 episode out of every 100 in multi_agent_training.py, as well as utilities to save the trained policy.

After training a multi-agent policy, you can also render it in new environments using the evaluate_agent.py file with the --render flag.

### Fix environment setup

3 days ago

Did that fix it?

Otherwise, you need to reinstall the conda environment.

You first need to delete it:

conda info --envs # this will show you the paths of all your conda environments
rm -rf <path of flatland-rl env>


Then install it again:

conda env create -f environment.yml


Installing it afterwards with pip can work too, but some other requirements may also have failed to install the first time, so if you see more problems down the line it may be better to recreate the environment from scratch!

### Fix environment setup

3 days ago

Hey @marko, it looks like Flatland is not correctly installed in your conda env.

What do you see if you run:

conda activate flatland-rl
pip list | grep flatland


Did you see any error messages after running the previous command, conda env create -f environment.yml?

### Current status of imitation agent in baseline repository

5 days ago

Hey @milva,

So, we have some beautiful imitation learning machinery, with the ability to generate and persist expert demonstrations from top OR submissions, and also with the ability to figure out expert demonstrations on-the-fly (ie no need to create an expert demonstration dataset, you can just compute the best action dynamically). And there’s also a script to convert all that to RLlib format so you can scale up training.

Sadly, though, all this went through multiple versions and is very poorly documented as of right now! We’re aware of it and will try to improve this aspect as soon as we can…

You can maybe get some help from here: Recreating Malfunctions

Also if you tell us more precisely what you are trying to do (pure IL with RLlib?) we may be able to nudge you in the right direction in the meantime.

### AI Tags - how to correctly indicate the methods you use in a submission?

6 days ago

Repeating the above that newcomers may have missed:

You can select multiple tags out of “RL”, “OR” and “other” (case-sensitive!) in your aicrowd.json file.

If your submission combines RL with OR, you should tag it "tags": ["RL", "OR"] and it will show up on the leaderboard as “RL + OR”.

A solution that combines RL and OR, using RL in a meaningful way, will be considered for RL prizes.

You could use “other”, for example, if you use evolution strategies or some other method not generally considered to be either OR or RL.
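As a minimal sketch, an aicrowd.json for a combined submission could look like this (the challenge_id and grader_id values are placeholders, not the actual identifiers; the tags field is the relevant part):

```json
{
  "challenge_id": "neurips-2020-flatland-challenge",
  "grader_id": "neurips-2020-flatland-challenge",
  "tags": ["RL", "OR"]
}
```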

### Submit both RL and OR method

8 days ago

There is ongoing work in this direction. @shivam can tell you more.

### How can I create a rail network, which is just a simple circle?

8 days ago

Hello @fhohnstein,

This is not a stupid question at all; actually, creating such an environment wouldn’t be so easy!

So, one solution would be, as you mentioned, to use the rail_from_manual_specifications_generator generator and to “draw” the environment yourself. But that would be complicated! Building such maps by hand is not easy.

@hagrid67 developed an editor system, but I don’t know if it still works with recent Flatland releases.

In any case, you probably don’t need to create the environments by yourself. An easier way to create simple environments is to use the sparse railway generator and to play with its parameters.

If you want to run simple experiments, you can set the number of agents to 1, make the environment 25x25, and use a low number of cities.

Check out this Colab notebook for a concrete example:

Notebook

Also, I am curious, you mention the old documentation (https://flatlandrl-docs.aicrowd.com/03_tutorials.html#simple-example-1-basic-usage). How did you end up on that page?

We have moved everything to the new doc, which contains the same information but more up to date (eg for that page: https://flatland.aicrowd.com/getting-started/env/level_generation.html#sparse-rail-generator). Is there an outdated link to the old doc somewhere?

### Round 2 evaluation details

8 days ago

Hey @beibei,

the different levels in one test have the same railway settings and same number of agents. The difference is the malfunction rate.

Correct!

Does it mean within one test,

• the railway networks (maps) are the same
• the initial positions and target positions for the agents are the same

No, the railway networks and initial positions and targets are different for every level, even within the same test.

The parameters within one test are fixed (except for the malfunction rate), but each environment is still procedurally generated from these parameters, which results in different maps for each environment.

• agents will be in malfunction in different time and with different time range

The rate of malfunction changes between the different environments within the same test. The maximum rate of malfunction (per agent) is max_mf_rate = 1.0 / min_malfunction_interval = 1.0 / 250.

You can see in more detail how the malfunction rate changes within a test here: https://flatland.aicrowd.com/getting-started/environment-configurations.html#round-2

The malfunction time range is malfunction_duration = [20,50] for all the environments in all the tests (sampled uniformly).

Another concern of mine is about the timesteps. When I evaluate locally, there is “Evaluation finished in *** timesteps…”. Does each environment (level) still have a timestep limit? Or is the score calculated based on the done agents and the timesteps? Besides, how do you calculate the total reward on the leaderboard? Is it the sum of the normalized reward in each environment?

Each environment does have its own timestep limit as in Round 1, which you can get from self.env._max_episode_steps. It is defined as int(4 * 2 * (env.width + env.height + num_agents / num_cities)) (see https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/schedule_generators.py#L188).

The score is calculated based on the done agents and the timesteps. We use the same normalized reward as in Round 1, but add 1.0 to make it between 0.0 and 1.0:

normalized_reward = 1.0 + sum_of_rewards / (self.env._max_episode_steps * self.env.get_num_agents())


And then indeed the total reward that counts for the leaderboard is the sum of the normalized reward for each environment.
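Putting the two formulas together, here is a quick sketch of how a single environment’s score is computed. The environment parameters and the raw return are made-up values for illustration only:

```python
# Illustrative environment parameters (not from an actual Round 2 test)
width, height = 25, 25
num_agents, num_cities = 5, 2

# Per-environment timestep limit, as defined in schedule_generators.py
max_episode_steps = int(4 * 2 * (width + height + num_agents / num_cities))

# Raw return: the sum of the (negative) per-step rewards over the episode
sum_of_rewards = -1000.0  # made-up value

# Normalized reward, shifted to lie between 0.0 and 1.0
normalized_reward = 1.0 + sum_of_rewards / (max_episode_steps * num_agents)
```

With these numbers, max_episode_steps comes out to 420 and the normalized reward to roughly 0.52; the leaderboard score is then the sum of these normalized rewards over all evaluated environments.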

You have more details here: https://flatland.aicrowd.com/getting-started/prize-and-metrics.html

And in the Round 2 announcement post: 🚂 Here comes Round 2!

And in the Round 2 environment configuration page: https://flatland.aicrowd.com/getting-started/environment-configurations.html#round-2

### Which tests are you reaching?

10 days ago

As a quick lower bound: each test has 10 environments, and each environment awards between 0.0 and 1.0 points. So the top submissions went through at least ~270 environments, or 27 tests!
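That lower-bound arithmetic, spelled out (the top score here is an approximation taken from the paragraph above):

```python
top_score = 270.0          # approximate score of the top submissions
max_points_per_env = 1.0   # each environment awards between 0.0 and 1.0
envs_per_test = 10

# Since no environment can award more than 1.0 point, a score of 270
# implies at least 270 evaluated environments, i.e. at least 27 tests.
min_environments = top_score / max_points_per_env
min_tests = min_environments / envs_per_test
```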

Then the harder question is: what is the average score per environment?

If you want to get an idea of how well OR solutions perform you can experiment with last year’s 2nd place solution: https://flatland.aicrowd.com/research/top-challenge-solutions.html#second-place. It is pure Python and super fast.

See here for an explanation of their approach: https://www.youtube.com/watch?v=rGzXsOC7qXg&feature=youtu.be&t=724


### 🧰 Submitting RLlib baselines

13 days ago

We provide two types of baselines for this challenge:

• A simple DQN-based approach, which is now part of the starter kit: repo, doc
• Advanced methods such as Ape-X, CCPPO and IL using RLlib: repo, doc

Until now, it took some work to submit solutions trained with RLlib.

Thanks to the efforts of our partners Deutsche Bahn and Instadeep, you can now submit the CCPPO baseline out of the box: https://gitlab.aicrowd.com/GereonVienken/db_flatland_example

This RL method reaches a score of 76.232 on the leaderboard!

You should be able to use the same approach with the other RLlib baselines as well. Thanks to our partners and especially to @GereonVienken who contributed this baseline and submission repository!


### Round 2 evaluation details

16 days ago

Hey @slopez!

• Why are environments grouped into tests?

All the environments in the same test have the same parameters: height, width, number of agents… the only difference is the malfunction rate. The performance of your submission is evaluated one Test at a time, and you need to have on average 25% of the trains reaching their destination to move on to the next Test. We’re interested in seeing up to what size a submission can perform well enough.

• From here, I infer that the only difference between levels in a same test is the malfunction interval, and tests differ by all other parameters, is this correct?

Correct! Well, not all the other parameters change; e.g. max_rails_in_city, malfunction_duration etc. are constant across all the levels.

• While evaluating locally, it seems that the evaluator service picks a random environment from all the environment tests. How does that work in the online evaluation, if the list of tests is infinite?

Note: Now that the environments are evaluated in order (from small to large), you should test your submissions locally in the same conditions. You can use the --shuffle flag when calling the evaluator to get a consistent behavior:

flatland-evaluator --shuffle False

• Is the score computed on all evaluated environments, independent from which test they come from?

The final score is the sum of the normalized return across all the evaluated environments, yes. So, 0.5 point in an environment in Test_0 is worth as much as 0.5 point in an environment in Test_30.


### 🚂 Here comes Round 2!

22 days ago

Round 2 is starting!

Many thanks to the participants who have experimented with Round 2 for the past weeks and helped us iron bugs out. Flatland is an active research project, and as such it is always a challenge to come up with a good problem definition and a stable evaluation setup.

Problem Statement in Round 2

In Round 1, your submissions had to solve a fixed number of environments within an 8-hour time limit.

Timeouts were a major challenge in Round 1: submissions would fail if the agents wouldn’t act fast enough, and it was hard to complete all the evaluation episodes within the time limit (especially for RL!).

In Round 2, we have done everything to make the experience smoother. Your submission needs to solve as many environments as possible in 8 hours. There are enough environments so that even the fastest of solutions couldn’t solve them all in 8 hours.

This removes a lot of problems:

• If your submission is slow, it will solve fewer episodes, but will still show up on the leaderboard.
• If your submission crashes at some point, you will still receive the points accumulated until the crash.
• If your submission takes too long for a given timestep, you won’t receive any reward for that episode but the evaluation will keep going (this was already the case at the end of Round 1).

The environments start very small, and have increasingly larger sizes. The evaluation stops if the percentage of agents reaching their targets drops below 25% (averaged over 10 episodes), or after 8h, whichever comes first. Each solved environment awards you points, and the goal is to get as many points as possible.

This means that the challenge is not only to find the best solutions possible, but also to find solutions quickly. This is consistent with the business requirements of railway companies: it’s important for them to be able to re-route trains as fast as possible when a malfunction occurs.

As in Round 1, the environment specifications are publicly accessible.

Note: Now that the environments are evaluated in order (from small to large), you should test your submissions locally in the same conditions. You can use the --shuffle flag when calling the evaluator to get a consistent behavior:

flatland-evaluator --shuffle False

Here’s what changed from Round 1:

• Submissions now have to solve as many environments as possible in 8 hours (see above).
• The time limits are now 10 seconds per timestep and 10 minutes for pre-planning (double the Round 1 limits).
• Evaluations will be interrupted after 10 consecutive timeouts (same as Round 1).
• The submission limits are now: 10 debug & 5 non-debug submissions per day (24h sliding window).
• Round 2 is starting late; as a result, we moved the Round 2 end date to November 6th.

New Starter Kit - Submittable out of the box!

Writing your first submission can be a bit of a challenge: you need to get used to the AIcrowd submission system, list the correct software dependencies, make sure your code respects the time limits…

We have updated the starter kit: instead of a random agent, it now contains a fully functional PyTorch DQN RL agent that you can submit right away!

New Flatland Release

We have published a new release of flatland-rl: version 2.2.2.

It includes the improvements we have mentioned in previous posts:

Thanks to our partner NVIDIA, we are happy to announce some prizes for this challenge!

RL solutions:

• 1st prize: GeForce RTX 2080 Graphics Card
• 2nd prize: NVIDIA Jetson Nano Developer Kit

Other solutions:

• 1st prize: NVIDIA Jetson Nano Developer Kit
• 2nd prize: NVIDIA Jetson Nano Developer Kit

The original 4 travel grants to NeurIPS are replaced by travel grants to visit us at EPFL (Lausanne, Switzerland).

Known problems

• Since a few days, many previously working submissions appear to fail with build problems. This seems to be due to an update from an external dependency. See this thread: Build problem with the current environment.yml file. The new starter kit doesn’t have this issue.
• One known problem with the new evaluation setting: if your submission crashes and never calls remote_client.submit(), your score won’t appear on the leaderboard. It’s something we are investigating. Tag @aicrowd-bot on your submission if this happens to you and we’ll requeue it when the problem is fixed (requeues don’t count as additional submissions).
• More generally, if you made a Round 2 submission before today and you think it failed due to an evaluation bug, tag @aicrowd-bot and we will investigate/requeue it (if we haven’t already commented on it that it’ll be requeued).


### ⚡ Updated Starter Kit: a full DQN baseline you can submit out of the box!

23 days ago

TL;DR: We have updated the starter kit: it is now a full RL baseline that you can easily train and submit out of the box! https://gitlab.aicrowd.com/flatland/neurips2020-flatland-starter-kit

Making your first submission can be challenging: you need to get familiar with the Flatland APIs, discover the submission process, write out all your dependencies in the apt.txt and environment.yml files…
This can take a few tries to get right, and is not particularly satisfying!

To help with this process, we have updated the starter kit. It is now a full PyTorch-powered DQN baseline that you can submit right away.

You can train a full agent using the starter kit, running everything on Colab, a free notebook service that lets you run code in the cloud. You can even use GPUs if you want to experiment with larger networks!

Easy to tweak and extend

The training script exposes many parameters to quickly test hypotheses:

• Epsilon decay (start, end, and decay rate)
• Buffer size and min size before training starts
• Learning rate, gamma, tau…

See here for the full list of command line parameters: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-starter-kit#sample-training-usage

Easy hyper-parameter tuning

You can use the (free) Weights & Biases service to log experiment results and to automate hyperparameter sweeps:

Documentation: https://docs.wandb.com/sweeps

Provided checkpoint

The starter kit comes with a sample checkpoint which should allow you to reach ~50 points on the leaderboard straight away. As of right now, this would put you in the top 15!

Old starter kit

The previous version is available in the old-starter-kit branch as a reference:


### Build problem with the current environment.yml file

23 days ago

Some recent package update seems to have broken the environment.yml file we provide in the starter kit.

We’re still investigating, but right now this minimal environment file is working (flatland-rl is part of the base image and doesn’t need to be added):

name: flatland-rl
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - psutil==5.7.2
  - pytorch==1.6.0
  - pip==20.2.3
  - python==3.6.8
  - pip:
    - tensorboard==2.3.0
    - tensorboardx==2.1


### Round 2 evaluator service

You can set the --shuffle parameter to False when calling the evaluator to process the environments in order.

To get the full Round 2 setup (infinite wave, new timeouts…), you should first install the infinite_wave branch with pip:

pip install -qq git+https://gitlab.aicrowd.com/flatland/flatland.git@infinite_wave


We’ll release it on PyPI soon.

### 🏁 Round 1 has finished, Round 2 is starting soon!

Hey @wullli, we generated those from a Google Sheet, here are the original formulas:

It’s possible that we messed up the Google Sheet to Latex conversion, we’ll check it out.

### 🏁 Round 1 has finished, Round 2 is starting soon!

You should use the master version right now; a new pip release is coming soon:

pip install git+https://gitlab.aicrowd.com/flatland/flatland.git

### 🏁 Round 1 has finished, Round 2 is starting soon!

I doubt this will be a problem, as the 8 hours time limit takes into account not only the time the agent takes to select its actions, but also the environment stepping time. So I think we will quickly reach a point where even a perfect agent that acts instantly would not have enough time to go through all the environments, due only to the stepping time.

But this is a good point, we should check that we have generated enough environments a few weeks before the deadline so we don’t have to add to them anymore!

FYI, currently 43 tests of 10 environments each are available during evaluation.

### 🚉 Questions about the Flatland Environment

That’s correct, yes!

The bad thing you really want to avoid is a deadlock: if you end up with two trains facing each other with no alternate paths, then they’re stuck for good (trains can’t go backward). They won’t be able to move again and will block the way.

### 🚉 Questions about the Flatland Environment

A cell can never contain more than one agent!

### 🏁 Round 1 has finished, Round 2 is starting soon!

The new test environments are now available for download from the Resource section!

The file test-neurips2020-round2-v0.tar.gz contains two environments per test for the first 41 Tests of Round 2.

### 5 seconds per timestep

Hey @beibei,

There are 3 things that can take time:

• figuring out your next actions from the current observations (inference)
• executing these actions in the environment (stepping)
• building the observations from the new environment state (observation building)

The timeout per timestep limits the time taken by inference and observation building, but not stepping.

So the counted time is time_taken_by_controller + time_taken_per_step - stepping_time.

To measure the stepping time separately, you can create the environment with a dummy observation builder and build the observations separately from the call to step(). See eg here: https://gitlab.aicrowd.com/flatland/flatland-examples/blob/make-submittable/run.py#L67 (observations are built explicitly line 138).
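As an illustration of this decomposition, here is a minimal timing harness. The controller, stepping, and observation-building functions below are trivial stand-ins, not the Flatland API:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-ins for the three phases of one timestep (placeholders)
controller = lambda obs: {0: 2}    # inference: observations -> actions
env_step = lambda actions: None    # stepping, with a dummy observation builder
build_obs = lambda: {0: [0.0]}     # explicit observation building

actions, t_inference = timed(controller, None)
_, t_stepping = timed(env_step, actions)
obs, t_obs = timed(build_obs)

# Only inference + observation building count against the per-timestep timeout
counted_time = t_inference + t_obs
```

With the real environment, you would wrap your policy call, env.step(), and the explicit observation-builder call in the same way to see where the budget goes.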

### 🚉 Questions about the Flatland Environment

Hey @beibei,

do the test_envs files contain the malfucntion and agents with different speed

Yes. Although, in this challenge, all the trains have a speed of 1.0.

I use rail_from_file , schedule_from_file and malfunction_from_file to load the grid map, schedule and malfunction then pass them to get the RailEnv().

Sounds good, you shouldn’t need anything else.

But during navigation, I can see some train with status ACTIVE doesn’t require action (malfunction == 0, speed == 1.0). Do you know what could be the issue?

Can you render some frames in such situations to see what may be happening?

### 🏁 Round 1 has finished, Round 2 is starting soon!

Sorry for the delay, setting up Round 2 and deciding on the right parameters for the new evaluation format was more challenging than we had anticipated!

We are almost there, and plan to launch Round 2 this week

### 🏁 Round 1 has finished, Round 2 is starting soon!

Hey @harshadkhadilkar, no the submissions are still open, I gave more details in the issue!

### Running a shell script as part of the image build process

Some participants are building parts of their solution (eg compiling C++ files) from their run.sh file.

While this works and is perfectly allowed, it is not optimal as this build step will take place during the evaluation, taking up precious time!

Instead, you can move these steps to a postBuild file. If you want this to be a shell script, make sure the first line is #!/bin/bash. It will be run as part of the image building process, after everything else. (The time spent building the image doesn’t count against the evaluation timeouts.)

See here for more details about the postBuild file: https://repo2docker.readthedocs.io/en/latest/config_files.html#postbuild-run-code-after-installing-the-environment

And see here to read more general documentation about AIcrowd’s image building process: How to specify runtime environment for your submission


### 🏁 Round 1 has finished, Round 2 is starting soon!

2 months ago

Round 1 has finished! Here are the winners of this first round:
RL solutions:

• Team MARMot-Lab-NUS with -0.611
• Team JBR_HSE with -0.635
• Team BlueSky with -0.852

Other solutions:

• Team An_Old_Driver with -0.104
• Team MasterFlatland with -0.107
• Participant Zain with -0.116

Congratulations to all of them!

The competition is only getting started: anyone can still join the competition (Round 1 was not qualifying), and the prizes will be granted based on the results of Round 2.

When will Round 2 start? Can I still submit right now?
We are still hard at work on Round 2, which is expected to start sometime this week. In the meantime, you can keep submitting to Round 1 to try out new ideas.

Now that Round 1 has officially finished, the leaderboard is “frozen”, and the winners listed above will keep their Round 1 positions whatever happens. But you can still see how your new submissions would rank by enabling the “Show post-challenge submissions” filter on the leaderboard:

Problem Statement in Round 2
In Round 1, your submissions had to solve a fixed number of environments within an 8-hour time limit.

In Round 2, things are a bit different: your submission will have to solve as many environments as possible in 8 hours. There are enough environments so that even the fastest of solutions couldn’t solve them all in 8 hours (and if that would happen, we’d just generate more).

The environments start very small, and have increasingly larger sizes. The evaluation stops if the percentage of agents reaching their targets drops below 25% (averaged over 10 episodes), or after 8h, whichever comes first. Each solved environment awards you points, and the goal is to get as many points as possible.

As in Round 1, the environment specifications will be publicly accessible.

This means that the challenge will not only be to find the best solutions possible, but also to find solutions quickly. This is consistent with the business requirements of railway companies: it’s very important for them to be able to re-route trains as fast as possible when a malfunction occurs!

Optimized Flatland environment
One of the most common frustrations in Round 1 was the speed of the environment.

We have implemented a number of performance improvements. The pip package will be updated soon. You can already try them out by installing Flatland from source (master branch):

pip install git+https://gitlab.aicrowd.com/flatland/flatland.git

The improvements are especially noticeable in smaller environments. Here’s for example the time per episode while training a DQN agent in Test_0, using pip release 2.2.1 vs the current master branch:

(using DQN training code from here: https://gitlab.aicrowd.com/flatland/flatland-examples)

Train Close Following
As some of you noticed during Round 1, the current version of Flatland makes it hard to move trains close to one another. You usually need to keep an empty cell between two trains, or to take their IDs into account to make sure they can follow each other closely.

This limitation has been lifted. The new motion system is also available in the master branch. See here for a detailed explanation of what it means, how it can help you, and how it was implemented: https://discourse.aicrowd.com/t/train-close-following

More coming soon…


### 🚃🚃 Train Close Following

2 months ago

TL;DR: We have improved the way agent actions are resolved in Flatland, fixing corner cases where trains had to leave an empty cell between each other. This new way of handling actions is the new standard, and will be used for Round 2.

Many of you are aware that Flatland agents cannot follow each other close behind, unless they are in agent index order, ie Agent 1 can follow Agent 0, but Agent 0 cannot follow Agent 1, unless it leaves a gap of one cell.

We have now provided an update which removes this restriction. It’s currently in the master branch of the Flatland repository. It means that agents (moving at the same speed) can now always follow each other without leaving a gap.

Why is this a big deal? Or even a deal?
Many of the OR solutions took advantage of it to send agents in the “correct” index order so that they could make better use of the available space, but we believe it’s harder for RL solutions to do the same.

Think of a chain of agents, in random order, moving in the same direction. For any adjacent pair of agents, there’s a 0.5 chance that it is in index order, ie index(A) < index(B) where A is in front of B. So roughly half the adjacent pairs will need to leave a gap and half won’t, and the chain of agents will typically be one-third empty space. By removing the restriction, we can keep the agents close together and so move up to 50% more agents through a junction or segment of rail in the same number of steps.
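A quick Monte Carlo check of that back-of-the-envelope argument: a chain of agents in random index order, with one forced gap for each adjacent pair that is out of index order. The chain length and trial count are arbitrary:

```python
import random

def chain_cells(order):
    """Cells spanned by a chain of agents under the old movement rules.

    Walking from the front of the chain to the back, a follower must leave
    a gap whenever the agent in front of it has a higher index.
    """
    gaps = sum(1 for front, back in zip(order, order[1:]) if front > back)
    return len(order) + gaps

random.seed(0)
n_agents, trials = 20, 2000
avg_cells = sum(chain_cells(random.sample(range(n_agents), n_agents))
                for _ in range(trials)) / trials
empty_fraction = (avg_cells - n_agents) / avg_cells  # close to 1/3, as argued
```

On average (n - 1) / 2 of the adjacent pairs need a gap, so the chain spans about 1.5n cells of which roughly a third are empty.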

What difference does it make in practice?
We have run a few tests and it does seem to slightly increase the training performance of existing RL models.

Does the order not matter at all now?
Well, yes, a bit. We are still using index order to resolve conflicts between two agents trying to move into the same spot, for example, head-on collisions, or agents “merging” at junctions.

This sounds boring. Is there anything interesting about it at all?
Thanks for reading this far… It was quite interesting to implement. Think of a chain of moving agents in reverse index order. The env.step() iterates them from the back of the chain (lowest index) to the front, so when it gets to the front agent, it’s already processed all the others. Now suppose the front agent has decided to stop, or is blocked. The env needs to propagate that back through the chain of agents, and none of them can in fact move. You can see how this might get a bit more complicated with “trees” of merging agents etc. And how do we identify a chain at all?

We did it by storing an agent’s position as a graph node, and a movement as a directed edge, using the NetworkX graph library. We create an empty graph for each step, and add the agents into the graph in order, using their (row, column) location for the node. Stationary agents get a self-loop. Agents in an adjacent chain naturally get “connected up”. We then use some NetworkX algorithms:

• weakly_connected_components to find the chains
• selfloop_edges to find the stopped agents
• dfs_postorder_nodes to traverse a chain
• simple_cycles to find agents colliding head-on

We can also display a NetworkX graph very simply, but neatly, using GraphViz (see below).
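A minimal sketch of that idea on a toy motion graph (this is an illustration, not the actual Flatland implementation): agents 0-2 form a chain moving right along row 0, and agent 3 is stationary, so it gets a self-loop.

```python
import networkx as nx

# Each agent contributes a directed edge: current cell -> intended next cell
moves = {
    0: ((0, 0), (0, 1)),
    1: ((0, 1), (0, 2)),
    2: ((0, 2), (0, 3)),
    3: ((5, 5), (5, 5)),  # stationary agent: self-loop on its own cell
}

G = nx.DiGraph()
for agent_id, (pos, next_pos) in moves.items():
    G.add_edge(pos, next_pos, agent=agent_id)

chains = list(nx.weakly_connected_components(G))  # adjacent agents connect up
stopped = list(nx.selfloop_edges(G))              # self-loops = stopped agents

# Postorder DFS from the tail cell visits the front cell first, so a stop
# at the front of the chain can be propagated back through the followers.
order = list(nx.dfs_postorder_nodes(G, source=(0, 0)))
```

Here the moving chain and the stationary agent show up as two weakly connected components, and the traversal order is front-to-back: (0, 3), (0, 2), (0, 1), (0, 0).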

Does it run faster / slower?
It seems to make almost no difference to the speed.

How do you handle agents entering the env / spawning?
For an agent in state READY_TO_DEPART we use a dummy cell of (-1, agent_id). This means that if several agents try to start in the same step, the agent with the lowest index will get to start first.
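Since Python tuples compare element-wise, ordering these dummy cells directly yields the departure order (a trivial sketch with made-up agent ids):

```python
# Agents 3, 0 and 7 all waiting to depart:
waiting = [(-1, 3), (-1, 0), (-1, 7)]
first = min(waiting)  # tuples compare element-wise, so the lowest agent_id wins
```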

Thanks to @hagrid67 for implementing this improved movement handling!

### Submission Error run.sh with Baselines Repo included

2 months ago

Hey @fabianpieroth, could you point us to a Gitlab issue where this problem occurs?

2 months ago

Yes, you can keep your submission “closed”, in which case the organizers will review your code, but you won’t have to make it public. In that case, however, you won’t be eligible for prizes. You will still keep your rank on the leaderboard, but your prize will be given to the next best team (and their prize will be given to the following best team, etc).

### 🚑 Addressing Round 1 pain points

3 months ago

@junjie_li I have edited the original message yesterday, it may not have been visible enough, sorry about that! We have extended Round 1 by one week:

### 🚑 Addressing Round 1 pain points

3 months ago

Since the problem with debug submissions counting as much as full submissions is still awaiting a fix, we have updated the submission quota to 7 per day until the end of the round.

3 months ago

Hey @kirill_ershov, yes we will announce a team merging deadline when we start Round 2.

### 🚑 Addressing Round 1 pain points

3 months ago

Hey @junjie_li,

Quoting from your post here 🧞 Pain points in Round 1 and wishes for Round 2? :

My wishes for Round 2 are:

• Use only a few large test cases(for example, # of test cases <= 10), while keep same overall running time. It may be even better to test with same grid size.
• Use same speed for different agents. I personally prefer to focus more on RL related things, instead of dealing with dead-lock from different speeds.

I think one of OR’s shortage is that it’s not straightforward to optimize for global reward.
My understanding: RL’s advantage is finding a better solution(combining with OR), but not acting in a shorter time.
If we want to see RL performan better than OR, we should give RL enough time for planning/inference on large grid env. (both 5 min and 5s may not be enough for RL to do planning and inference. )

I think I understand your point of view. Indeed, by focusing on a few large environments with RL, the global reward could be better than using OR, as RL can explicitly optimize the reward.
Did I understand your point correctly?

However, the business problem is different. In the real world, OR methods are already very good at finding optimal solutions. The problem is that they take too long to calculate these solutions, and the calculation time explodes with the size of the railway network. This is especially a problem when a train breaks down: people are waiting, so a solution should really be found as fast as possible, even if it’s not completely optimal.

This is why we are introducing this “time vs score” trade-off: in practice it may be more useful to have a sub-optimal solution that allows the trains to start moving after a few minutes of calculations, rather than having to wait an hour before finding a perfect solution. Similarly, in Round 2, a faster solution that produces imperfect plans can still potentially accumulate more points.

We are hoping that RL can help move the needle here, as the agents could potentially keep moving without having to calculate a full planning until the end, therefore finding an approximate solution faster!

### 🚑 Addressing Round 1 pain points

3 months ago

Thank you everyone for your feedback on Round 1! Here’s a summary of the problems encountered so far, and how we plan to address them.

TL;DR: Round 2 will be similar to Round 1 but with many more environments. The 8-hour overall time limit won’t cause submissions to fail anymore. Prizes will be announced soon. Reported bugs are being fixed. Round 2 is pushed back by one week while we address all the feedback.

EDIT: We are still hard at work addressing issues from Round 1 and preparing Round 2. To make sure everything goes well when we start the next round, we are pushing Round 2 back by an extra week (to August 14th).

The 8-hour overall time limit is too strict!
This is the most common problem: it’s very hard to get an RL solution to finish in time.

To fix this, we will make this time limit a “soft timeout”: if your submission takes more than 8 hours, it won’t be cancelled anymore; instead, all the remaining episodes that it didn’t have time to solve will receive a score of -1.0.

To make this process fair, the order of the evaluation environments will be fixed. The environments will also be ordered in increasing order of size.

The environment is too slow
The Flatland environment does get slow when running larger environments!

This is a problem in two situations. First, for submissions: in this case it could push solutions over the 8 hours overall time limit. Now that this time limit will be “soft”, this won’t be such a big problem anymore. Yes, the environment will still take a large chunk of the time during the evaluation process. But your submission will be valid even if it takes too long, and the environment takes the same amount of time for all participants, so things are fair.

Still, the speed of the environment limits how fast you can train new agents and experiment with new ideas. We will release a new version that includes a number of performance improvements to alleviate this issue for Round 2.

I don’t want people to see videos of my submissions
Some participants have expressed the wish to hide their submission videos.

This is not something we plan to provide. Our goal is to foster open and transparent competition, and showing videos is part of the game: participants can glean some information from them to get new ideas.

One strategy would be to wait for the last minute to “hide your hand”. This is possible, but can be risky, as the number of submissions per day is limited, so it is generally better to secure a good position on the leaderboard as soon as possible!

We still don’t know what the prizes will be!
The original prizes were travel grants to NeurIPS - but sadly the conference will be fully virtual this year.

This forced us to look again for new sponsors for the prizes. While we can’t announce anything yet, things are progressing, and we’re hoping to announce exciting prizes by the time Round 2 starts.

The margin of progression for OR is too small 💇
OR solutions reached a 100% completion rate in a matter of days in Round 1, and are now fighting over thousandths of a point. Since the overall time limit is now “soft”, we will simply add many more evaluation episodes, including much larger environments, to allow a larger margin of progression for all solutions.

Documentation is still lacking
Flatland is a complex project that has been developed by dozens of people over the last few years. We have invested a lot of energy to gather all the relevant information at flatland.aicrowd.com, but we realise there is still a lot of work ahead.

We will keep working on this, but this is a large task where your contribution is more than welcome. Contributing to the documentation would make you an official Flatland Contributor! Check out https://flatland.aicrowd.com/misc/contributing.html to see how you can help.

Various bugs are making our lives harder
Here’s a list of known bugs we plan to squash before Round 2 starts:

• Debug submissions count the same as full submissions

• When a submission is done, the percentages and other metrics reported in the Gitlab issues are nonsensical (“-11.36% of agents done”)

• Rendering bug showing agents in places where they shouldn’t be

We’re hard at work to address all these issues. We have moved the starting date of Round 2 one week back to give us time to implement and deploy all the necessary changes.

We’re still open to comments, complaints and requests! Please fill up the survey if you haven’t done so:

### I'm getting "git@gitlab.aicrowd.com: Permission denied (publickey)"

3 months ago

Great! I’ll edit the documentation to clarify this point.

### I'm getting "git@gitlab.aicrowd.com: Permission denied (publickey)"

3 months ago

Oh, you need to add the key to gitlab.aicrowd.com, not to gitlab.com! gitlab.aicrowd.com is our own instance of Gitlab.

So, in the instructions, you should replace “gitlab.com” with “gitlab.aicrowd.com”.

### I'm getting "git@gitlab.aicrowd.com: Permission denied (publickey)"

3 months ago

Did you successfully add an SSH key to your account, as described here?

### 🧞 Pain points in Round 1 and wishes for Round 2?

3 months ago

Regarding the 8 hour time limit, would it solve the issue if this time limit would not cancel the submission when it takes too long, but would instead give a score of -1.0 to all the environments that have not been solved in time?

Did you have problems with the 5 min and 5 seconds time limits? What do you think would be reasonable time limits to use instead?

@junjie_li I understand that these two points are making things harder:

• large variety of environments
• potentially different train speeds in Round 2

However, these are part of the business problem SBB and Deutsche Bahn are facing and that we are trying to solve. We need to strike a balance between making the challenge feasible and interesting, and keeping it close enough to the real-world problem so the results are useful!

### 🧞 Pain points in Round 1 and wishes for Round 2?

3 months ago

The trick is to use a dummy observation builder, which takes no time, and to build the observations yourself when needed by calling the actual observation builder’s get_many() method.
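Sketched with stand-in classes (the real classes would be Flatland’s DummyObservationBuilder and e.g. TreeObsForRailEnv; the pattern, not the exact API, is the point here):

```python
class DummyObs:
    """Stand-in for a dummy observation builder: returns instantly."""
    def get_many(self, handles):
        return {h: True for h in handles}

class ExpensiveObs:
    """Stand-in for the real builder, e.g. tree observations."""
    def __init__(self):
        self.calls = 0
    def get_many(self, handles):
        self.calls += 1                      # pretend this is costly
        return {h: [0.0] * 100 for h in handles}

# The env would be created with obs_builder_object=DummyObs(), so each
# env.step() pays no observation cost; we query the real builder ourselves.
real_builder = ExpensiveObs()
for step in range(10):
    if step % 5 == 0:                        # only when we actually need obs
        obs = real_builder.get_many([0, 1])
```

The env steps cheaply every timestep, while the expensive observations are only computed on the steps where the policy actually needs them.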

### How to use TreeObsForRailEnv in remote client?

3 months ago

It is still not clear to me what this error is about, we could look more into it if we had a code sample or link to a gitlab issue where it occurs.

### Timeout in submission

3 months ago

Hey @jiaxun_cui, this is not currently possible.

### Question about round 1 -> round 2

3 months ago

This is correct!

The goal of Round 1 is to fine-tune the problem definition. Only Round 2 will matter for the prize.

### 🧞 Pain points in Round 1 and wishes for Round 2?

3 months ago

With 7 days to go in Round 1, what have been the major pain points so far? What would you want to see improved in Round 2?

Edit: fill up the survey to help us understand what we can improve!

### Timeout in submission

3 months ago

Hello @antoinep, indeed the environment is slow which is a problem for many submissions, especially the RL ones.

We are working on different solutions and will make sure this is handled better in Round 2.

For now, the most efficient solution would be to “pick your battles”. If your solution is too slow to solve all 400 episodes, you can choose to only solve some of them.

While there’s no way to “skip” episodes, what you can do is perform “no-ops” during some of them. If you perform steps with no actions for the whole episode (i.e. env.step({})), you will very quickly reach the end of that episode. Of course you will get a score of -1.0 for it, but this will allow you to finish the evaluation in time.

For example, you could start by only using your RL policy for environments with 50 agents or fewer (you can see the environment configurations here). For all other environments, you just perform no-ops until they’re over. If your solution is fast enough this way, you can then tackle more environments, e.g. up to 80 agents.

There are other ways to speed up your policy, e.g. running the inference in parallel, keeping a cache of {state -> action}, etc., but skipping some episodes will let you make a successful submission more easily in any case.
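The “pick your battles” loop could look roughly like this (a runnable sketch: StubEnv and the 50-agent cutoff are made up for illustration; in a real submission the env would be the remote Flatland client):

```python
MAX_AGENTS = 50  # hypothetical cutoff: only run the policy below this size

def run_episode(env, policy):
    """No-op through oversized episodes; run the policy otherwise."""
    obs = env.reset()
    done = {"__all__": False}
    while not done["__all__"]:
        if env.n_agents > MAX_AGENTS:
            obs, rewards, done, info = env.step({})        # no-op, ends quickly
        else:
            obs, rewards, done, info = env.step(policy(obs))

class StubEnv:
    """Tiny stand-in for RailEnv, just to exercise the loop."""
    def __init__(self, n_agents, length=3):
        self.n_agents, self.length, self.actions = n_agents, length, []
    def reset(self):
        self.t = 0
        return {}
    def step(self, action_dict):
        self.actions.append(action_dict)
        self.t += 1
        return {}, {}, {"__all__": self.t >= self.length}, {}

big = StubEnv(n_agents=100)
run_episode(big, policy=lambda obs: {0: 2})    # no-ops all the way
small = StubEnv(n_agents=5)
run_episode(small, policy=lambda obs: {0: 2})  # policy actually runs
```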

### 🚉 Questions about the Flatland Environment

3 months ago

That’s weird! How are you parallelising it? We use dozens of environments in parallel in the RLlib baselines: https://flatland.aicrowd.com/research/baselines.html

### 🚉 Questions about the Flatland Environment

3 months ago

Hey @shining_spring,

Indeed, malfunction_duration = [20,50] specifies the min/max of the malfunction_duration. This value is the same for all Round 1 environments.

min_malfunction_interval is the minimal interval between malfunctions.

The malfunction_rate is the inverse of the malfunction interval. So the malfunction rate will be at most 1.0 / min_malfunction_interval.

### Baseline for other algorithms

3 months ago

I’m not familiar with these methods. Beyond the RL baselines, one OR method has been documented: https://flatland.aicrowd.com/getting-started/or.html

And then there are also the top solutions from last year: https://flatland.aicrowd.com/research/top-challenge-solutions.html

### 🚉 Questions about the Flatland Environment

3 months ago

@seungjae_ryan_lee yes, remove_agents_at_target is True during evaluation!

### Get n_city from RailEnv

3 months ago

Hey @kirill_ershov, no, annoyingly you can’t get that number from the environment with the current version.

This is an open bug in Flatland: https://gitlab.aicrowd.com/flatland/flatland/issues/324

A dirty way may be to use the formula for the max number of timesteps?

# flatland/envs/schedule_generators.py:174
timedelay_factor = 4
alpha = 2
max_episode_steps = int(timedelay_factor * alpha * (rail.width + rail.height + num_agents / len(city_positions)))


You know the values of rail.width, rail.height and num_agents, so you could recover len(city_positions).
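Concretely, inverting that formula could look like this (a sketch; `estimate_n_cities` is an illustrative name, and since int() truncates, the recovered value is approximate, hence the rounding):

```python
def estimate_n_cities(max_episode_steps, width, height, num_agents,
                      timedelay_factor=4, alpha=2):
    # Inverts: max_episode_steps = int(timedelay_factor * alpha
    #                                  * (width + height + num_agents / n_cities))
    inner = max_episode_steps / (timedelay_factor * alpha) - width - height
    return round(num_agents / inner)
```

E.g. a 30x30 env with 10 agents and 2 cities gives max_episode_steps = int(8 * (30 + 30 + 5)) = 520, and the estimate recovers 2.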

### Optimization opportunities in the Flatland environment

3 months ago

Here are some potential optimizations in the Flatland environment discovered by Adrian Egli from SBB. They will eventually be integrated in the Flatland codebase, but you are already welcome to take advantage of them.

If you do test and integrate them, you are encouraged to submit PRs to the Flatland repository, which would make you a Flatland contributor!

#---- SpeedUp ~7x -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#109161    0.131    0.000    0.131    0.000 grid4_utils.py:29(get_new_position)
MOVEMENT_ARRAY = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def get_new_position(position, movement):
    return (position[0] + MOVEMENT_ARRAY[movement][0], position[1] + MOVEMENT_ARRAY[movement][1])

#---- ORIGINAL -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#112703    0.893    0.000    1.355    0.000 grid4_utils.py:32(get_new_position)
def get_new_position(position, movement):
    """ Utility function that converts a compass movement over a 2D grid to new positions (r, c). """
    if movement == Grid4TransitionsEnum.NORTH:
        return (position[0] - 1, position[1])
    elif movement == Grid4TransitionsEnum.EAST:
        return (position[0], position[1] + 1)
    elif movement == Grid4TransitionsEnum.SOUTH:
        return (position[0] + 1, position[1])
    elif movement == Grid4TransitionsEnum.WEST:
        return (position[0], position[1] - 1)

#---- SpeedUp ~3x ...............................................................
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#27121    0.041    0.000    0.273    0.000 grid4.py:66(get_transitions)
from numba import njit, jit

def get_transitions(self, cell_transition, orientation):
    return opt_get_transitions(cell_transition, orientation)

@jit()
def opt_get_transitions(cell_transition, orientation):
    """
    Get the 4 possible transitions ((N,E,S,W), 4 elements tuple
    if no diagonal transitions allowed) available for an agent oriented
    in direction orientation and inside a cell with
    transitions cell_transition.

    Parameters
    ----------
    cell_transition : int
        16 bits used to encode the valid transitions for a cell.
    orientation : int
        Orientation of the agent inside the cell.

    Returns
    -------
    tuple
        List of the validity of transitions in the cell.
    """
    bits = (cell_transition >> ((3 - orientation) * 4))
    return ((bits >> 3) & 1, (bits >> 2) & 1, (bits >> 1) & 1, bits & 1)

#---- ORIGINAL -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#25399    0.146    0.000    0.146    0.000 grid4.py:66(get_transitions)
def get_transitions(self, cell_transition, orientation):


I think we could use numba to increase the performance, especially for all pure numpy and Python methods which can be made “static”.

### 🚉 Questions about the Flatland Environment

3 months ago

The purpose of this thread is to gather questions about details of the Flatland environment (RailEnv). Ask here if you have any doubts about what happens at intersections, the precise way malfunctions occur, etc.

### How to use TreeObsForRailEnv in remote client?

3 months ago

Hey @seungjaeryanlee,

It’s hard to say seeing only this part of the code. Could you point me to a (potentially private) repo with the full code?

I suspect this is a bug in the current pip version of Flatland which happens if a timeout occurs during the first timestep of an episode.

Are you maybe taking too long to do the first step after creating the env (timeout of 5min)? or to take the first step afterward (timeout of 5sec)?

### Dynamic grid size required?

3 months ago

Thanks for the link. Indeed during training there are multiple strategies: either focus on a single configuration at a time, or make some sort of “curriculum” to make your agent more general!

### Dynamic grid size required?

3 months ago

Hello @tim_resink,

I am curious what examples you are referring to?

The detailed configurations of the environments used for evaluation, including their dimensions, are publicly known in Round 1:
https://flatland.aicrowd.com/getting-started/environment-configurations.html

As there are 14 different configurations, it would make sense for your algorithm to handle arbitrary grid sizes (at least up to 150x150)!

### 🚂 Here comes Round 1!

3 months ago

Some details about how the new timeouts work:

• During evaluation, your submission should catch the StopAsyncIteration exception when calling remote_client.env_step(action), in case the step times out. If this exception is raised, you should create a new environment by calling remote_client.env_create() before going further.

• The submission will still fully fail after 10 consecutive timeouts. This is to prevent submissions from running for 8 hours after the agent has crashed.
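The handling described above could look roughly like this (a runnable sketch where FakeClient stands in for the starter kit’s remote_client; env_step raising StopAsyncIteration simulates a timed-out step, and env_create returning False signals that all episodes are done):

```python
class FakeClient:
    """Tiny stand-in for the remote evaluation client."""
    def __init__(self):
        self.steps = 0
        self.episodes = 0
    def env_create(self):
        self.episodes += 1
        return self.episodes <= 2          # False once all episodes are done
    def env_step(self, action):
        self.steps += 1
        if self.steps == 3:
            raise StopAsyncIteration       # simulated step timeout
        return {}, 0.0, {"__all__": self.steps % 5 == 0}, {}

client = FakeClient()
while client.env_create():
    done = {"__all__": False}
    while not done["__all__"]:
        try:
            obs, reward, done, info = client.env_step({})
        except StopAsyncIteration:
            break  # episode timed out: move on to a fresh environment
```

The key point is that the exception ends only the current episode; the outer loop immediately asks for the next environment instead of crashing the whole submission.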

### 🚂 Here comes Round 1!

3 months ago

Thank you everyone for your participation and enthusiasm during the Warm-up Round!
We have been very impressed by the quality of the submissions so far, and by the activity around this challenge, both on AIcrowd and on other platforms.

Here are the changes in Round 1:

• The 400 evaluation environments will remain the same as during the Warm-up Round. However, the full specifications of these environments are now public: width, height, number of agents, malfunction interval… The only thing we are not disclosing is the seeds. This will make it easier to optimize agents to be as efficient as possible within the evaluation time limit (8 hours).

• We have made the time limit of 5 seconds per timestep less harsh. Previously, an agent that took too long to act would cause the whole submission to fail. From now on, only the current episode will be affected: it will receive a score of -1.0 and the evaluation will proceed. The same thing will happen if you go beyond the 5-minute time limit for initial planning. The overall 8-hour time limit, on the other hand, stays a “hard limit” that will still cause the submission to fully fail.

• Debug submissions are now limited to 48 minutes. They were previously limited to 8 hours, the same as full submissions. The idea is that submitting in debug mode will now give you an idea of whether your submission would complete a full evaluation in time or not.

Besides these changes, we are happy to release the Flatland RLlib baselines!

You will now be able to train agents using advanced methods such as Ape-X and PPO, and using many “tricks” such as action masking and action skipping. We also provide imitation learning baselines such as MARWIL and DQfD, which leverage expert demonstrations generated using last year’s top solutions to train RL agents.

RLlib allows you to scale up training to large machines or even to multiple machines. It also makes it trivial to run hyperparameter search. We are still actively working on these baselines and encourage you to take part in their development!

### RL Based Top Solution Missing?

3 months ago

Hey @student!

I am pretty sure that there was a solution tagged as ‘RL’ that made it to the top with leaderboard score < -0.1. I no longer see it. Just curious, what might have happened to it? Not that I am complaining just want to understand how high the score in pure RL based approach can go.

There was a bug in the evaluator which allowed participants to “skip” to the next episode without finishing the current one. That submission used that bug (without ill intent, I believe), and as a result got a very high score (because it incurred very few penalties!), but it had a very low done percentage.

The bug has been fixed and the submission re-evaluated.

Does it mean a pure multi-agent reinforcement learning approach or a hybrid approach, like mix of OR and RL (need to give it some thought on how to do it) be acceptable too?

A hybrid OR + RL approach does count as a reinforcement learning approach.

See here for more details: AI Tags - how to correctly indicate the methods you use in a submission?

### AI Tags - how to correctly indicate the methods you use in a submission?

4 months ago

Hey @AntiSquid, sorry for the delay.

Yes, a hybrid solution using reinforcement learning and some form of heuristics would still fit in the RL category. If the heuristics involve some heavy planning, it should be tagged as RL + OR, which still makes the submission eligible for the RL prizes.

Any approach which includes RL in a meaningful way will be considered for RL prizes.

The final decision will be taken by the organizers. If you are not sure if a specific method would be considered as RL or not, feel free to reach out to us using a private channel of communication with a small description of your approach.

### Some little support to C++ programmers

4 months ago

Hello @Zain,

The winner from last year used C++, you can check out his submission: https://flatland.aicrowd.com/research/top-challenge-solutions.html#first-place

We are not currently planning to make a C++ starter kit as most participants are using Python, however if many participants were to express interest that’s something we could reconsider.

### Evaluation Error

4 months ago

Hey @hyn0801, @shivam gave details about the problem directly in the issue.

### Number of test cases and video of each submission

4 months ago

Hello @junjie_li,

In the current round, 28 environments are used in debug mode and 400 for full submissions.

You can see this number in the issue corresponding to the submission eg “Simulations Complete : 400/400”

The video only shows a small subset of the environments the agents are evaluated in (typically 3 to 5 environments).

### Help; How to submit?

4 months ago

Hello @Zain!

You will probably need to get familiar with git to take part in this challenge. As a programmer, no matter which type, learning about version control is a stellar time investment! Github has some good resources to get started: https://try.github.io/

Another option is to make a team, so you could focus on your area of expertise. Post in this thread to introduce yourself and find teammates: Looking for team member?

Finally, yes you can absolutely use C++ to write your solution. The winner from last year used C++: https://flatland.aicrowd.com/research/top-challenge-solutions.html

### Communication between agents?

4 months ago

Hello @sumedh_pendurkar, yes this is possible and allowed!

### Example repo environment file error

4 months ago

Hey @tianqi_li, can you give us more details: what OS? python version?

### Warm up round eliminaton

4 months ago

Hey @hyn0801, there is no qualification between the rounds. Participants can join the challenge at any point until the final deadline.

I have updated the Overview: https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge#timeline

### Getting a timeout when running the flatland-evaluator

4 months ago

Hey, what agent are you running? Is this using the default random agent from the starter kit? What are the logs on the agent’s side?

You can try flushing redis data, that may be the problem
https://flatland.aicrowd.com/getting-started/first-submission.html#env-client-step-called-before-env-client-env-create-call

### Working on the examples given (flatland-examples)

4 months ago

During evaluation, you can use remote_client.env, which behaves like a normal environment, so you can access its width or height attributes as usual.

I am not sure what you mean by state size?

While evaluating using run.py the environment variables will change from what I understand according to the environment which was created for the agent to be evaluated in. How should I approach this to be able to test the example Multi-agent?

In general, you would proceed in two steps:

• First, you train your agent locally. For this you can use multi_agent_training.py, but it’s just an example, you can implement your own training method.

• Second, you submit your agent. In this challenge, no training happens during submission. Your agent needs to be fully pre-trained when you submit it (as opposed to e.g. the ProcGen challenge).

If you use multi_agent_training.py, then you don’t have to care about the dimensions of the evaluation environment, because it uses tree observations. The good thing about tree observations is that they are always the same size, no matter the size of the environment, so you can just use a neural network with a fixed size and it’ll work in all situations!

When I run run.py with redis as a local test. I get the following error

It looks like you are giving the policy an observation from all the agents, when it expects an observation from only one of the agents.

4 months ago

Hey @compscifan2019, that doesn’t look right, we will look into it…

### Config of simulation environment during training and evaluation

4 months ago

There will be small grids in Round 1, so people can see progress even if they can’t solve the largest environments.

In Round 2, the smallest grids will be much larger, so they will potentially become problematic for pure OR approaches.

An idea could be to combine OR and RL in a smart way, e.g. plan with OR as much as possible during the 5-minute initial planning phase, then use RL for the parts you didn’t have time to fully plan and when you have malfunctions. This way you use each method for what it is best at.

### Config of simulation environment during training and evaluation

4 months ago

Yes, this is a good point. Let’s look at the big picture.

The goal of this challenge is to find efficient solutions to deal with very large environments.

For example, for 150x150 environments, operations research solutions can easily solve the problem perfectly. But they will take hours to find a solution when the environments get larger. This is a real-world problem for logistics companies: when a train breaks down, it takes too long to find an updated schedule.

So, the goal is to find a solution which can solve environments of any size within a short computing time. We don’t necessarily want to find an optimal plan, but we want to find one that is good enough quickly! As long as you don’t have a new schedule, none of the trains can move.

So, the problems in Round 2 will be larger than in Round 1. It is also possible that we will make the Round 1 environments larger at the end of the current Warm-Up Round (i.e. at the end of the month).

Your solutions should not assume that the environments have a given maximum size, as we will make them as large as we can!

### Working on the examples given (flatland-examples)

4 months ago

Indeed you need to load the .pth file corresponding to the checkpoint you want to use.

# evaluation is faster on CPU, except if you have huge networks
parameters = {
    'use_gpu': False
}

policy = DDDQNPolicy(state_size, action_size, Namespace(**parameters), evaluation_mode=True)


Then you can do policy.act(observation, eps=0.0) to get the action from your policy!

### Step by step: How I setup everything for the Flatland 2020 challenge

4 months ago

Correct! Well, the trains need to move at least from the starting point to the target, so that’s at least one timestep.

Yes, there’s currently a bug in what is displayed in the issue after the evaluation is complete, we’re looking into it! You can get the correct numbers in the issue while the evaluation is running, and then on the leaderboard and individual submission pages.

### Config of simulation environment during training and evaluation

4 months ago

Hello @junjie_li,

The goal of this challenge is to design a policy that is able to generalize to any kind of environment. For this reason, we don’t disclose all the details about the evaluation environments.

However, you can get some details about them:

The environments vary in size and number of agents as well as malfunction parameters.

For Round 1 of the NeurIPS 2020 challenge, the upper limit of these variables for submissions are:

• (x_dim, y_dim) <= (150, 150)
• n_agents <= 400
• malfunction_rate <= 1/50

These parameters are subject to change during the challenge.

This gives you an idea of the distribution of evaluation environments you will have to solve when you do a submission.

• From the doc:

Speed profiles are not used in the first round of the NeurIPS 2020 challenge.

So you can just set all the trains to a speed of 1.0.

### Step by step: How I setup everything for the Flatland 2020 challenge

4 months ago

Generally, we refer to the whole grid world as the grid, and to each position in this grid as a “cell”.

I’ve added that episodes finish when either the max time step is reached or all trains have reached their target, good catch!

### Step by step: How I setup everything for the Flatland 2020 challenge

4 months ago

So:

• Each agent individually gets a local reward (at each step: -1 if not at its target, 0 if at its target) plus a global reward (at each step: 1 if all agents are at their targets, 0 otherwise)
• The competition score is the sum of agent rewards. So indeed the global reward adds n_agents * 1 to the score, since each agent gets it
• The episode stops once all the agents have reached their destination, so effectively you only get the global reward once
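The scoring above can be sketched as a per-step reward function (illustrative names only, not Flatland’s actual API):

```python
def step_rewards(at_target):
    # local reward: -1 until the agent reaches its target, 0 afterwards;
    # global reward: +1 to every agent on the step where all are done.
    all_done = all(at_target)
    return [(0 if done else -1) + (1 if all_done else 0) for done in at_target]
```

With two agents, step_rewards([False, True]) gives [-1, 0], and on the step where both have arrived, step_rewards([True, True]) gives [1, 1], i.e. the global bonus adds n_agents points in total.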

### Working on the examples given (flatland-examples)

4 months ago

Hey, the best way is to start from the start kit repo: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-starter-kit

Follow the getting started to see how to submit: https://flatland.aicrowd.com/getting-started/first-submission.html

Then integrate your own solution by copying over the code from flatland-examples.

You’ll have to:

• Add any dependency you need to the environment.yml file (torch…).
• Load the trained agent for your solution. In this competition, you submit pre-trained agents; no training happens on the evaluation side.
• Use your own agent in the run.py file instead of the random my_controller one used by default. Basically, call your model using the obs instead of calling randint here.

You generally don’t have to touch the run.sh file if you write your solution in Python.
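The controller swap can be sketched as follows. The starter kit’s default controller just draws random actions; the `policy.act` interface below is hypothetical, standing in for however your own trained model is queried:

```python
import random

def my_controller(obs, number_of_agents):
    # Default starter-kit behaviour: one random action (0-4) per agent.
    return {agent: random.randint(0, 4) for agent in range(number_of_agents)}

def trained_controller(policy, obs):
    # Replacement sketch: query your pre-trained policy per agent instead
    # of calling randint. `policy.act` is a hypothetical interface.
    return {agent: policy.act(agent_obs) for agent, agent_obs in obs.items()}
```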

### Setting up the environment on Google Colab

4 months ago

Hey, here’s a simple example:

### Conda env creation errors...UPDATED: later EOF error when running evaluator

4 months ago

That might help indeed! They also just announced GPU acceleration support!
https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/

I think rendering with Pyglet from WSL is problematic though; let us know if you find a workaround for that

### Adjusting values in default config file?

4 months ago

Is this repo unfinished and I’m digging into it too soon?

Pretty much yes

But you are free to start experimenting with it anyway! The basic idea is that you point train.py to an experiment file.

So in the following example: python ./train.py -f experiments/flatland_random_sparse_small/global_obs_conv_net/ppo.yaml

In there you have num_gpus: 1 and num_workers: 7, so to run that you’ll need at least one GPU and at least 8 cores (7 workers + 1 main thread). Just tweak these values to match your hardware!

But yeah this is still mostly undocumented and very experimental so expect rough edges
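The core-count requirement above can be sketched with a small helper (the function name is made up; the rule is simply rollout workers plus one driver process):

```python
import os

def enough_cores(num_workers, available=None):
    """RLlib-style setups need num_workers rollout workers plus one
    main/driver process, so num_workers + 1 cores in total."""
    if available is None:
        available = os.cpu_count() or 1
    return available >= num_workers + 1
```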

### How is this challenge different from last year?

4 months ago

For Round 1 from the FAQ:

• (x_dim, y_dim) <= (150, 150)
• n_agents <= 400
• malfunction_rate <= 1/50

These parameters are subject to change during the challenge.

https://flatland.aicrowd.com/faq/challenge.html#what-are-the-evaluation-parameters
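For local testing, those announced bounds could be checked with a small helper (the names below are illustrative only; the actual evaluator configuration is not public and the limits may change between rounds):

```python
# Hypothetical sanity check against the announced Round 1 limits.
ROUND1_LIMITS = {
    "x_dim": 150,
    "y_dim": 150,
    "n_agents": 400,
    "malfunction_rate": 1 / 50,
}

def within_round1_limits(x_dim, y_dim, n_agents, malfunction_rate):
    """Return True if an environment respects the announced Round 1 bounds."""
    return (
        x_dim <= ROUND1_LIMITS["x_dim"]
        and y_dim <= ROUND1_LIMITS["y_dim"]
        and n_agents <= ROUND1_LIMITS["n_agents"]
        and malfunction_rate <= ROUND1_LIMITS["malfunction_rate"]
    )
```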

### How is this challenge different from last year?

4 months ago

Indeed 5 minutes should be enough to pre-compute a perfect path in most cases (although… don’t underestimate how large the test environments might get…)

But then trains will hit malfunctions, forcing you to recompute the routes. 5 seconds will make it harder to re-compute everything!

Finally, the timing constraints as well as the environment sizes may be adjusted from round to round. So you should design your solution taking into account that time per timestep will be scarce, and environments will be huge.

### How is this challenge different from last year?

4 months ago

The top three solutions to last year’s challenge obtained very good results, is there still a significant room for improvement?
Wello on Discord

Good question!

First, if you want to check out the top solutions from last year, they are available here:
https://flatland.aicrowd.com/research/top-challenge-solutions.html

The difference from last year is that the agents now need to act within strict time limits:

• agents have up to 5 minutes to perform initial planning (i.e. before performing any action)
• agents have up to 5 seconds to act per timestep (5 seconds in total for all the agents)

This comes from a real-life problem: if a train breaks down somewhere in the railway network, you need to re-schedule all the other trains as fast as possible to minimize delays.

Last year, most solutions used operations research approaches. These methods are very good at finding optimal train schedules, but the problem is that they don’t scale well to large environments: they quickly take too long to run.

This is why we are encouraging people to use reinforcement learning solutions this year, as we believe this will allow faster scheduling. The idea is that in the real world, it would be better to have a fast planning method that provides an approximate solution than one that produces a perfect plan but takes hours to compute it.

TL;DR: This year, we added more aggressive time limits to make the problem more realistic. This will give an edge to RL solutions.
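One way to think about acting under the per-timestep budget is a re-plan-with-fallback loop: try to re-compute routes within the allotted time, and fall back to the previously cached plan if re-planning overruns. A minimal sketch (all names are illustrative, not part of the Flatland API):

```python
import time

def act_with_budget(replan, cached_actions, budget_s=5.0):
    """Per-timestep time budgeting sketch.

    `replan` is a callable that either returns a fresh action dict before
    the deadline or raises TimeoutError; on timeout we fall back to the
    stale cached plan rather than missing the step entirely.
    """
    start = time.monotonic()
    try:
        actions = replan(deadline=start + budget_s)
    except TimeoutError:
        actions = cached_actions  # fall back to the stale plan
    if time.monotonic() - start > budget_s:
        actions = cached_actions  # overran anyway: use the stale plan
    return actions
```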

### Conda env creation errors...UPDATED: later EOF error when running evaluator

4 months ago

It also took very long for me on Windows but eventually worked; not sure why.

### Conda env creation errors...UPDATED: later EOF error when running evaluator

4 months ago

Actually, the new Flatland release has much fewer dependencies, so you can ignore the environment.yml file.

You can simply create a new conda environment, then install Flatland with pip:
pip install flatland-rl

You can even skip conda altogether. However, conda makes it easier to package your solution if you want to use specific dependencies, and you need to keep the environment.yml file in your submission repository in any case.

### Checkpoint Error when Training

4 months ago

Good point, this folder is missing due to an over-eager .gitignore. You can just create it for now; I’ll push a fix for it.

Cheers

### Error while evaluation

4 months ago

Hello @manavsinghal157,

This looks like a version mismatch between

• the environment files you use (the .pkl), and
• the flatland-rl release

Which version are you using for each?

The environment files should be the latest ones coming from: https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/dataset_files

The flatland-rl version should be >=2.2.0. You can check it by running:
pip list|grep flatland
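Programmatically, the same check could be sketched with a minimal version comparison (pure-Python illustration; `packaging.version` would be the more robust choice for real version strings):

```python
def parse_version(v):
    """Turn '2.2.0' into (2, 2, 0) for tuple comparison.
    Minimal sketch: assumes plain numeric dotted versions."""
    return tuple(int(part) for part in v.split("."))

def flatland_version_ok(installed, minimum="2.2.0"):
    """True if the installed flatland-rl version meets the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```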

Cheers

### Error in Flatland environment installation

4 months ago

As a quick-fix, try changing line 11 of setup.py to:

with open('README.md', 'r', encoding='utf8') as readme_file:

### Error in Flatland environment installation

4 months ago

Hello @rafid_abyaad, we are aware of this and a fix is on the way!

Cheers

### Start of the competition

5 months ago

Hello @RomanChernenko, you didn’t waste any time

The competition will start in the next few days, stay tuned!

Cheers,
Florian

### Publishing the Solutions

4 months ago

https://flatland.aicrowd.com/research/top-challenge-solutions.html

### Setting up the environment on Google Colab

4 months ago

Let’s continue this discussion in the new category made for the NeurIPS 2020 challenge:

The current category is for last year’s challenge.

### Setting up the environment on Google Colab

4 months ago

Hello @Mnkq,

There are two problems that we’re actively working on before the challenge launches:

• the latest release of importlib-resources is causing us some problems,
• there have been a couple of breaking changes in the latest release of Flatland, and the Colab notebook hasn’t been updated yet.

We’re on it!
Cheers

### Publishing the Solutions

5 months ago

Hello @fabianpieroth,

A recording of the presentations from top participants at the AMLD conference has recently been released: https://www.youtube.com/watch?v=rGzXsOC7qXg

The winning submissions as well as exciting news about the future of this competition will be released this month!

Cheers,
Florian

### Problems running in docker

Almost 1 year ago

I want to run my training code on AWS so I can make sure everything runs fine from start to finish on a machine slower than the official one. I am using a p2.xlarge instance with the “Deep Learning AMI (Ubuntu 16.04)”.

I am trying to run the code from the repo competition_submission_starter_template, without adding my own code for now. When I run ./utility/docker_train_locally.sh, I am faced with this error:

2019-10-22 02:01:29 ip-172-30-0-174 minerl.env.malmo.instance.868e96[39] INFO Minecraft process ready
2019-10-22 02:01:29 ip-172-30-0-174 minerl.env.malmo[39] INFO Logging output of Minecraft to ./logs/mc_1.log
2019-10-22 02:01:29 ip-172-30-0-174 root[62] INFO Progress : 1
2019-10-22 02:01:29 ip-172-30-0-174 crowdai_api.events[62] DEBUG Registering crowdAI API Event : CROWDAI_EVENT_INFO register_progress {'event_type': 'minerl_challenge:register_progress', 'training_progress': 1} # with_oracle? : False
Traceback (most recent call last):
File "run.py", line 13, in <module>
train.main()
File "/home/aicrowd/train.py", line 75, in main
env.close()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/gym/core.py", line 236, in close
return self.env.close()
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/minerl/env/core.py", line 627, in close
if self.instance and self.instance.running:
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 280, in __getattr__
raise AttributeError("remote object '%s' has no exposed attribute or method '%s'" % (self._pyroUri, name))
AttributeError: remote object 'PYRO:obj_3ec8abe8c48c4b4e9dd7f7b1ac4706b1@localhost:33872' has no exposed attribute or method 'running'
Exception ignored in: <function Proxy.__del__ at 0x7f4585d4f158>
Traceback (most recent call last):
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 266, in __del__
File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 400, in _pyroRelease
File "/srv/conda/envs/notebook/lib/python3.7/logging/__init__.py", line 1370, in debug
File "/srv/conda/envs/notebook/lib/python3.7/logging/__init__.py", line 1626, in isEnabledFor
TypeError: 'NoneType' object is not callable
2019-10-22 02:01:30 ip-172-30-0-174 minerl.env.malmo.instance.868e96[39] DEBUG [02:01:30] [EnvServerSocketHandler/INFO]: Java has been asked to exit (code 0) by net.minecraftforge.fml.common.FMLCommonHandler.exitJava(FMLCommonHandler.java:659).


Where can I find more details? If I run ./utility/docker_run.sh --no-build to check inside the container, I see no trace of the logs.

Also, how would the trained model be saved in this situation? Is the train folder mounted as a volume so that the model is persisted outside of the container?

Finally, the expression $(PWD) in the bash files throws an error for me.

### Partially rendered env in MineRLObtainDiamondDense-v0

Just happened again, seems to be related to large bodies of water.

### Partially rendered env in MineRLObtainDiamondDense-v0

I’ve just witnessed my agent interacting in an environment which looked partially rendered, i.e. large areas appeared as transparent:

This is in MineRLObtainDiamondDense-v0. I am using minerl==0.2.7.

mc_1.log output around these times:

[10:51:00] [Client thread/INFO]: [CHAT] §l804...
[10:51:00] [Client thread/ERROR]: Null returned as 'hitResult', this shouldn't happen!


I don’t see anything else suspicious in this log file. The following episodes seem to be running correctly.

### Can't train in MineRLObtainIronPickaxeDense-v0 since 0.2.7

Great, thanks for the swift fix!

### Can't train in MineRLObtainIronPickaxeDense-v0 since 0.2.7

I just updated to 0.2.7, when trying to train in MineRLObtainIronPickaxeDense-v0 I now get the following errors:

ERROR    - 2019-10-18 04:52:00,768 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:00] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
ERROR    - 2019-10-18 04:52:01,867 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:01] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
ERROR    - 2019-10-18 04:52:02,950 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:02] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
...


This environment was working fine before, but I was using the package version from before the reward loop was fixed, so this problem may already have been present since 0.2.5.

### Tutorial Deep Reinforcement Learning to try with PyTorch

Over 1 year ago

Incremental PyTorch implementations of the main algorithms:
RL-Adventure DQN / DDQN / Prioritized replay/ noisy networks/ distributional values/ Rainbow/ hierarchical RL
RL-Adventure-2 actor critic / proximal policy optimization / acer / ddpg / twin dueling ddpg / soft actor critic / generative adversarial imitation learning / HER

Good implementations of A2C/PPO/ACKTR: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

BTW The repo for the Udacity course is open source: https://github.com/udacity/deep-reinforcement-learning
