
MasterScrat

Name: Florian Laurent
Location: CH

Badges: 0 Gold, 1 Silver, 2 Bronze

Activity: contribution calendar (not reproduced here)

Ratings Progression (chart not captured)

Challenge Categories (chart not captured)

Challenges Entered

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments
Latest submissions: graded 67944, graded 67942

Multi Agent Reinforcement Learning on Trains
Latest submissions: failed 74631, failed 74630, failed 74627

Sample-efficient reinforcement learning in Minecraft
Latest submissions: graded 25410, graded 25409, failed 25399

Multi Agent Reinforcement Learning on Trains
Latest submissions: failed 67801, failed 67786, failed 67761
Gold: 0
Silver: 1 (Boltzmann's Favourite, May 16, 2020)
Bronze: 2 (Trustable, May 16, 2020; Newtonian, May 16, 2020)

Badges

  • Has filled their profile page
    May 16, 2020

  • Kudos! You've won a bronze badge in this challenge. Keep up the great work!
    Challenge: NeurIPS 2019 : MineRL Competition
    May 16, 2020

  • Kudos! You've been awarded a silver badge for this challenge. Keep up the great work!
    Challenge: droneRL
    May 16, 2020
Participant Rating
  • shivam

Participant Rating
  • anssi 225
  • junjie_li 265

NeurIPS 2020: Flatland Challenge

Question about submission confidentiality

2 days ago

Hello @harshadkhadilkar,

Yes, you can keep your submission “closed”, in which case the organizers will review your code, but you won’t have to make it public. In that case, however, you won’t be eligible for prizes. You will still keep your rank on the leaderboard, but your prize will be given to the next best team (and their prize will be given to the following best team, etc.).

🚑 Addressing Round 1 pain points

6 days ago

@junjie_li I edited the original message yesterday; it may not have been visible enough, sorry about that! We have extended Round 1 by one week:

🚑 Addressing Round 1 pain points

6 days ago

Since the problem with debug submissions counting as much as full submissions is still awaiting a fix, we have updated the submission quota to 7 per day until the end of the round.

Team merging deadline

13 days ago

Hey @kirill_ershov, yes we will announce a team merging deadline when we start Round 2.

🚑 Addressing Round 1 pain points

13 days ago

Hey @junjie_li,

Quoting from your post here 🧞 Pain points in Round 1 and wishes for Round 2? :

My wishes for Round 2 are:

  • Use only a few large test cases (for example, # of test cases <= 10), while keeping the same overall running time. It may be even better to test with the same grid size.
  • Use the same speed for different agents. I personally prefer to focus more on RL-related things, instead of dealing with deadlocks from different speeds.

I think one of OR’s shortcomings is that it’s not straightforward to optimize for the global reward.
My understanding: RL’s advantage is finding a better solution (combined with OR), not acting in a shorter time.
If we want to see RL perform better than OR, we should give RL enough time for planning/inference on large grid environments (both 5 min and 5 s may not be enough for RL to do planning and inference).

I think I understand your point of view. Indeed, by focusing on a few large environments with RL, the global reward could be better than using OR, as RL can explicitly optimize the reward.
Did I understand your point correctly?

However, the business problem is different. In the real world, OR methods are already very good at finding optimal solutions. The problem is that they take too long to calculate these solutions, and the calculation time explodes with the size of the railway network. This is especially a problem when a train breaks down: people are waiting, so a solution should really be found as fast as possible, even if it’s not completely optimal.

This is why we are introducing this “time vs score” trade-off: in practice it may be more useful to have a sub-optimal solution that allows the trains to start moving after a few minutes of calculations, rather than having to wait an hour before finding a perfect solution. Similarly, in Round 2 your solution can be faster and produce imperfect schedules, yet still potentially accumulate more points.

We are hoping that RL can help move the needle here, as the agents could potentially keep moving without having to calculate a full planning until the end, therefore finding an approximate solution faster!

🚑 Addressing Round 1 pain points

14 days ago

Thank you everyone for your feedback on Round 1! Here’s a summary of the problems encountered so far, and how we plan to address them.

:memo: TL;DR: Round 2 will be similar to Round 1 but with many more environments. The 8-hour overall time limit won’t cause submissions to fail anymore. Prizes will be announced soon. Reported bugs are being fixed. Round 2 is pushed back by one week while we address all the feedback.

:memo:EDIT: We are still hard at work addressing issues from Round 1 and preparing Round 2. To make sure everything goes well when we start the next round, we are pushing Round 2 back by an extra week (to August 14th).

The 8-hour overall time limit is too strict! :timer_clock:
This is the most common problem: it’s very hard to get an RL solution to finish in time.

To fix this, we will make this time limit a “soft timeout”: if your submission takes more than 8 hours, it won’t be cancelled anymore, but instead all the remaining episodes that it didn’t have time to solve will receive a score of -1.0.

To make this process fair, the order of the evaluation environments will be fixed. The environments will also be ordered by increasing size.

The environment is too slow :snail:
The Flatland environment does get slow when running larger environments!

This is a problem in two situations. First, for submissions: in this case it could push solutions over the 8-hour overall time limit. Now that this time limit will be “soft”, this won’t be such a big problem anymore. Yes, the environment will still take a large chunk of the time during the evaluation process. But your submission will be valid even if it takes too long, and the environment takes the same amount of time for all participants, so things are fair.

Still, the speed of the environment limits how fast you can train new agents and experiment with new ideas. We will release a new version that includes a number of performance improvements to alleviate this issue for Round 2.

I don’t want people to see videos of my submissions :see_no_evil:
Some participants have expressed the wish to hide their submission videos.

This is not something we plan to provide. Our goal is to foster open and transparent competition, and showing videos is part of the game: participants can glean some information from them to get new ideas.

One strategy would be to wait until the last minute to “hide your hand”. This is possible, but can be risky, as the number of submissions per day is limited, so it is generally better to secure a good position on the leaderboard as soon as possible!

We still don’t know what the prizes will be! :gift:
The original prizes were travel grants to NeurIPS - but sadly the conference will be fully virtual this year.

This forced us to look again for new sponsors for the prizes. While we can’t announce anything yet, things are progressing, and we’re hoping to announce exciting prizes by the time Round 2 starts.

The margin of progression for OR is too small 💇
OR solutions reached a 100% completion rate in a matter of days in Round 1, and are now fighting over thousandths of a point. Since the overall time limit is now “soft”, we will simply add many more evaluation episodes, including much larger environments, to allow a larger margin of progression for all solutions.

Documentation is still lacking :books:
Flatland is a complex project that has been developed by dozens of people over the last few years. We have invested a lot of energy to gather all the relevant information at flatland.aicrowd.com, but we realise there is still a lot of work ahead.

We will keep working on this, but this is a large task where your contribution is more than welcome. Contributing to the documentation would make you an official Flatland Contributor! Check out https://flatland.aicrowd.com/misc/contributing.html to see how you can help.

Various bugs are making our lives harder :bug:
Here’s a list of known bugs we plan to squash before Round 2 starts:

  • Debug submissions count the same as full submissions :scream:

  • When a submission is done, the percentages and other metrics reported in the GitLab issues are nonsensical (“-11.36% of agents done”)

  • Rendering bug showing agents in places where they shouldn’t be

We’re hard at work to address all these issues. We have moved the starting date of Round 2 one week back to give us time to implement and deploy all the necessary changes.

We’re still open to comments, complaints and requests! Please fill out the survey if you haven’t done so yet:

I'm getting "git@gitlab.aicrowd.com: Permission denied (publickey)"

15 days ago

Great! I’ll edit the documentation to clarify this point.

I'm getting "git@gitlab.aicrowd.com: Permission denied (publickey)"

15 days ago

Oh, you need to add the key to gitlab.aicrowd.com, not to gitlab.com! gitlab.aicrowd.com is our own instance of Gitlab.

So, in the instructions, you should replace “gitlab.com” with “gitlab.aicrowd.com”.

🧞 Pain points in Round 1 and wishes for Round 2?

16 days ago

Thank you @akleban and @junjie_li for your answers!

Regarding the 8-hour time limit, would it solve the issue if this time limit did not cancel the submission when it takes too long, but instead gave a score of -1.0 to all the environments that have not been solved in time?

Did you have problems with the 5-minute and 5-second time limits? What do you think would be reasonable time limits to use instead?

@junjie_li I understand that these two points are making things harder:

  • large variety of environments
  • potentially different train speeds in Round 2

However, these are part of the business problem SBB and Deutsche Bahn are facing and that we are trying to solve. We need to strike a balance between making the challenge feasible/interesting, and keeping it close enough to the real-world problem so that the results are useful!

🧞 Pain points in Round 1 and wishes for Round 2?

16 days ago

About this:

The trick is to use a dummy observation builder, which takes no time, and then build the observations yourself when needed by calling the actual observation builder’s get_many() method.
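A minimal sketch of this trick, assuming the flatland-rl 2.2.x API (DummyObservationBuilder, TreeObsForRailEnv and get_many(); exact module paths may differ between versions) and an already-created RailEnv called env:

from flatland.core.env_observation_builder import DummyObservationBuilder
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv

# The env is created/reset with the dummy builder, so reset() and step()
# spend no time building observations.
dummy_builder = DummyObservationBuilder()

# The real builder is kept on the side and wired to the env manually.
tree_obs = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv(30))
tree_obs.set_env(env)
tree_obs.reset()

# Only when you actually need observations, build them yourself:
observations = tree_obs.get_many(list(range(env.get_num_agents())))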

How to use TreeObsForRailEnv in remote client?

16 days ago

It is still not clear to me what this error is about; we could look into it further if we had a code sample or a link to a GitLab issue where it occurs.

Timeout in submission

17 days ago

Hey @jiaxun_cui, this is not currently possible.

Question about round 1 -> round 2

17 days ago

This is correct!

The goal of Round 1 is to fine-tune the problem definition. Only Round 2 will matter for the prize.

🧞 Pain points in Round 1 and wishes for Round 2?

19 days ago

With 7 days to go in Round 1, what have been the major pain points so far? What would you want to see improved in Round 2?

Edit: fill out the survey to help us understand what we can improve!

Timeout in submission

19 days ago

Hello @antoinep, indeed the environment is slow, which is a problem for many submissions, especially the RL ones.

We are working on different solutions and will make sure this is handled better in Round 2.

For now, the most efficient solution would be to “pick your battles”. If your solution is too slow to solve all 400 episodes, you can choose to only solve some of them.

While there’s no way to “skip” episodes, what you can do is perform “no-ops” during some of the episodes. If you perform steps with no actions for the whole episode (i.e. env.step({})), you will very quickly reach the end of that episode. Of course you will get a score of -1.0 for this episode, but this will allow you to finish the evaluation in time.

For example, you could start by only using your RL policy for environments with 50 agents or fewer (you can see the environment configurations here). For all other environments, you just perform no-ops until they’re over. If you see your solution is fast enough this way, then you can tackle more environments, e.g. up to 80 agents.

There are other ways to speed up your policy, e.g. running inference in parallel, keeping a cache of {state -> action}, etc., but skipping some episodes will let you make a successful submission more easily in any case. A rough sketch of this episode-skipping strategy follows.
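As an illustration only, here is a rough sketch of that episode-skipping loop. It assumes the remote client API from the starter kit (env_create, env_step, remote_client.env); the 50-agent threshold and the my_policy object are hypothetical:

AGENT_LIMIT = 50  # hypothetical threshold: only run the RL policy below this

observation, info = remote_client.env_create(obs_builder_object=obs_builder)
while observation:  # env_create returns False once all episodes are done
    env = remote_client.env
    use_policy = env.get_num_agents() <= AGENT_LIMIT

    done = {"__all__": False}
    while not done["__all__"]:
        if use_policy:
            action = {h: my_policy.act(observation[h]) for h in range(env.get_num_agents())}
        else:
            action = {}  # no-op: the episode finishes quickly and scores -1.0
        observation, all_rewards, done, info = remote_client.env_step(action)

    observation, info = remote_client.env_create(obs_builder_object=obs_builder)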

🚉 Questions about the Flatland Environment

19 days ago

That’s weird! How are you parallelising it? We use dozens of environments in parallel in the RLlib baselines: https://flatland.aicrowd.com/research/baselines.html

🚉 Questions about the Flatland Environment

20 days ago

Hey @shining_spring,

Indeed, malfunction_duration = [20,50] specifies the min/max of the malfunction_duration. This value is the same for all Round 1 environments.

min_malfunction_interval is the minimal interval between malfunctions.

The malfunction_rate is the inverse of the malfunction interval, so the malfunction rate will be at most 1.0 / min_malfunction_interval.
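For reference, a small sketch of how these parameters map onto the malfunction API in flatland-rl 2.2.x (treat the MalfunctionParameters field names as an assumption if your version differs; the interval value below is made up):

from flatland.envs.malfunction_generators import MalfunctionParameters, malfunction_from_params

min_malfunction_interval = 250                      # hypothetical value
malfunction_rate = 1.0 / min_malfunction_interval   # at most this rate

stochastic_data = MalfunctionParameters(
    malfunction_rate=malfunction_rate,  # rate of malfunctions per timestep
    min_duration=20,                    # malfunction_duration = [20, 50]
    max_duration=50,
)
# Pass malfunction_generator_and_process_data=malfunction_from_params(stochastic_data) to RailEnv.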

Baseline for other algorithms

21 days ago

I’m not familiar with these methods. Beyond the RL baselines, one OR method has been documented: https://flatland.aicrowd.com/getting-started/or.html

And then there are also the top solutions from last year: https://flatland.aicrowd.com/research/top-challenge-solutions.html

🚉 Questions about the Flatland Environment

22 days ago

@seungjae_ryan_lee yes remove_agents_at_target is True during evaluation!

Get n_city from RailEnv

26 days ago

Hey @kirill_ershov, no, annoyingly you can’t get that number from the environment with the current version.

This is an open bug in Flatland: https://gitlab.aicrowd.com/flatland/flatland/issues/324

A dirty workaround may be to invert the formula for the max number of timesteps:

# flatland/envs/schedule_generators.py:174
timedelay_factor = 4
alpha = 2
max_episode_steps = int(timedelay_factor * alpha * (rail.width + rail.height + num_agents / len(city_positions)))

You know the values of rail.width, rail.height and num_agents, so you could recover len(city_positions) :grimacing:
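A rough sketch of that inversion (the _max_episode_steps attribute name is an assumption, and the int() in the formula makes the estimate approximate):

def estimate_num_cities(env):
    # Inverts: max_episode_steps = int(8 * (width + height + n_agents / n_cities))
    timedelay_factor, alpha = 4, 2
    width, height = env.width, env.height
    n_agents = env.get_num_agents()
    max_steps = env._max_episode_steps  # attribute name may differ between versions
    denom = max_steps / (timedelay_factor * alpha) - width - height
    return max(1, round(n_agents / denom))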

Optimization opportunities in the Flatland environment

About 1 month ago

Here are some potential optimizations in the Flatland environment discovered by Adrian Egli from SBB. They will eventually be integrated into the Flatland codebase, but you are already welcome to take advantage of them.

If you do test and integrate them, you are encouraged to submit PRs to the Flatland repository, which would make you a Flatland contributor!

#---- SpeedUp ~7x -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#109161    0.131    0.000    0.131    0.000 grid4_utils.py:29(get_new_position)
MOVEMENT_ARRAY = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def get_new_position(position, movement):
    return (position[0] + MOVEMENT_ARRAY[movement][0], position[1] + MOVEMENT_ARRAY[movement][1])

#---- ORIGINAL -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#112703    0.893    0.000    1.355    0.000 grid4_utils.py:32(get_new_position)
def get_new_position(position, movement):
    """Utility function that converts a compass movement over a 2D grid to new positions (r, c)."""
    if movement == Grid4TransitionsEnum.NORTH:
        return (position[0] - 1, position[1])
    elif movement == Grid4TransitionsEnum.EAST:
        return (position[0], position[1] + 1)
    elif movement == Grid4TransitionsEnum.SOUTH:
        return (position[0] + 1, position[1])
    elif movement == Grid4TransitionsEnum.WEST:
        return (position[0], position[1] - 1)
#---- SpeedUp ~3x ...............................................................
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#27121    0.041    0.000    0.273    0.000 grid4.py:66(get_transitions)
from numba import njit, jit

def get_transitions(self, cell_transition, orientation):
    return opt_get_transitions(cell_transition, orientation)

@jit()
def opt_get_transitions(cell_transition, orientation):
    """
    Get the 4 possible transitions ((N, E, S, W), a 4-element tuple
    if no diagonal transitions are allowed) available for an agent oriented
    in direction `orientation` and inside a cell with
    transitions `cell_transition`.

    Parameters
    ----------
    cell_transition : int
        16 bits used to encode the valid transitions for a cell.
    orientation : int
        Orientation of the agent inside the cell.

    Returns
    -------
    tuple
        List of the validity of transitions in the cell.
    """
    bits = (cell_transition >> ((3 - orientation) * 4))
    return ((bits >> 3) & 1, (bits >> 2) & 1, (bits >> 1) & 1, bits & 1)

#---- ORIGINAL -----------------------------------------------------------------------------------------------------
#ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#25399    0.146    0.000    0.146    0.000 grid4.py:66(get_transitions)
def get_transitions(self, cell_transition, orientation):
    # Same bit-shifting logic as above, but called as a regular (non-jitted) method.
    bits = (cell_transition >> ((3 - orientation) * 4))
    return ((bits >> 3) & 1, (bits >> 2) & 1, (bits >> 1) & 1, bits & 1)

I think we could use numba to increase the performance, especially for all the pure numpy and Python methods which can be made “static”.
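For illustration, a hedged sketch of the same idea applied to get_new_position; this is not part of the patch above, and the integer-based signature is my own variation to keep the function trivial for numba to compile in nopython mode:

from numba import njit

@njit(cache=True)
def njit_get_new_position(row, col, movement):
    # Offsets for N, E, S, W, matching MOVEMENT_ARRAY above.
    if movement == 0:
        return row - 1, col
    elif movement == 1:
        return row, col + 1
    elif movement == 2:
        return row + 1, col
    return row, col - 1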

🚉 Questions about the Flatland Environment

About 1 month ago

The purpose of this thread is to gather questions about details of the Flatland environment (RailEnv). Ask here if you have any doubts about what happens at intersections, how exactly malfunctions occur, etc.

How to use TreeObsForRailEnv in remote client?

About 1 month ago

Hey @seungjaeryanlee,

It’s hard to say seeing only this part of the code. Could you point me to a (potentially private) repo with the full code?

I suspect this is a bug in the current pip version of Flatland which happens if a timeout occurs during the first timestep of an episode.

Are you maybe taking too long to do the first step after creating the env (timeout of 5 min)? Or to take the first step afterwards (timeout of 5 sec)?

Dynamic grid size required?

About 1 month ago

Thanks for the link. Indeed during training there are multiple strategies: either focus on a single configuration at a time, or make some sort of “curriculum” to make your agent more general!

Dynamic grid size required?

About 1 month ago

Hello @tim_resink,

I am curious what examples you are referring to?

The detailed configurations of the environments used for evaluation, including their dimensions, are publicly known in Round 1:
https://flatland.aicrowd.com/getting-started/environment-configurations.html

As there are 14 different configurations, it would make sense for your algorithm to handle arbitrary grid sizes (at least up to 150x150)!

🚂 Here comes Round 1!

About 1 month ago

Some details about how the new timeouts work:

  • During evaluation, your submission should catch the StopAsyncIteration exception when calling remote_client.env_step(action), in case the step times out. If this exception is raised, you should create a new environment by calling remote_client.env_create() before going further (see the sketch after this list).

  • The submission will still fully fail after 10 consecutive timeouts. This is to prevent submissions from running for 8 hours after the agent has crashed.
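A minimal sketch of that exception handling, assuming the remote client API from the starter kit (obs_builder stands for whatever observation builder your submission uses):

try:
    observation, all_rewards, done, info = remote_client.env_step(action)
except StopAsyncIteration:
    # The step timed out: the current episode is lost (it scores -1.0),
    # so move on to the next evaluation environment.
    observation, info = remote_client.env_create(obs_builder_object=obs_builder)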

🚂 Here comes Round 1!

About 1 month ago

Thank you everyone for your participation and enthusiasm during the Warm-up Round!
We have been very impressed by the quality of the submissions so far, and by the activity around this challenge both on AIcrowd and on other platforms :star_struck:

Here are the changes in Round 1:

  • The 400 evaluation environments will remain the same as during the Warm-up Round. However, the full specifications of these environments are now public: width, height, number of agents, malfunction interval… The only thing we are not disclosing is the seeds. This will make it easier to optimize agents to be as efficient as possible within the evaluation time limit (8 hours).

  • We have made the time limit of 5 seconds per timestep less harsh. Previously, an agent that took too long to act would cause the whole submission to fail. From now on, only the current episode will be affected: it will receive a score of -1.0 and the evaluation will proceed. The same thing will happen if you go beyond the 5-minute time limit for initial planning. The overall 8-hour time limit, on the other hand, stays a “hard limit” that will still cause the submission to fail entirely.

  • Debug submissions are now limited to 48 minutes. They were previously limited to 8 hours, the same as for full submissions. The idea is that submitting in debug mode will now give you an idea of whether your submission would complete a full evaluation in time.

Besides these changes, we are happy to release the Flatland RLlib baselines!

:blue_book:Doc: https://flatland.aicrowd.com/research/baselines.html
:card_index:Repo: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines

You will now be able to train agents using advanced methods such as Ape-X and PPO, and using many “tricks” such as action masking and action skipping. We also provide imitation learning baselines such as MARWIL and DQfD, which leverage expert demonstrations generated using last year’s top solutions to train RL agents.

RLlib allows you to scale up training to large machines or even to multiple machines. It also makes it trivial to run hyperparameter search. We are still actively working on these baselines and encourage you to take part in their development! :toolbox::wrench:

RL Based Top Solution Missing?

About 1 month ago

Hey @student!

I am pretty sure that there was a solution tagged as ‘RL’ that made it to the top with leaderboard score < -0.1. I no longer see it. Just curious, what might have happened to it? Not that I am complaining :slight_smile: just want to understand how high the score in pure RL based approach can go.

There was a bug in the evaluator which was allowing participants to “skip” to the next episode without finishing the current one :sweat_smile: That submission used that bug (without ill intent, I believe), and as a result got a very high score (because it accumulated very few penalties!), but it had a very low done percentage.

The bug has been fixed and the submission re-evaluated.

Does it mean a pure multi-agent reinforcement learning approach or a hybrid approach, like mix of OR and RL (need to give it some thought on how to do it) be acceptable too?

A hybrid OR + RL approach does count as a reinforcement learning approach.

See here for more details: AI Tags - how to correctly indicate the methods you use in a submission?

AI Tags - how to correctly indicate the methods you use in a submission?

About 1 month ago

Hey @AntiSquid, sorry for the delay.

Yes, a hybrid solution using reinforcement learning and some form of heuristics would still fit in the RL category. If the heuristics involve some heavy planning, it should be tagged as RL + OR, which still makes the submission eligible for the RL prizes.

Any approach which includes RL in a meaningful way will be considered for RL prizes.

The final decision will be taken by the organizers. If you are not sure if a specific method would be considered as RL or not, feel free to reach out to us using a private channel of communication with a small description of your approach.

Some little support to C++ programmers

About 1 month ago

Hello @Zain,

The winner from last year used C++; you can check out his submission: https://flatland.aicrowd.com/research/top-challenge-solutions.html#first-place

We are not currently planning to make a C++ starter kit as most participants are using Python; however, if many participants were to express interest, that’s something we could reconsider.

Evaluation Error

About 1 month ago

Hey @hyn0801, @shivam gave details about the problem directly in the issue.

Number of test cases and video of each submission

About 1 month ago

Hello @junjie_li,

In the current round, 28 environments are used in debug mode and 400 for full submissions.

You can see this number in the issue corresponding to the submission, e.g. “Simulations Complete: 400/400”.

The video only shows a small subset of the environments the agents are evaluated in (typically 3 to 5 environments).

Help; How to submit?

About 1 month ago

Hello @Zain!

You will probably need to get familiar with git to take part in this challenge. As a programmer, no matter which type, learning about version control is a stellar time investment! Github has some good resources to get started: https://try.github.io/

Another option is to make a team, so you could focus on your area of expertise. Post in this thread to introduce yourself and find teammates: Looking for team member?

Finally, yes you can absolutely use C++ to write your solution. The winner from last year used C++: https://flatland.aicrowd.com/research/top-challenge-solutions.html

Communication between agents?

About 1 month ago

Hello @sumedh_pendurkar, yes this is possible and allowed!

Example repo environment file error

About 1 month ago

Hey @tianqi_li, can you give us more details: what OS? python version?

Warm up round eliminaton

About 2 months ago

Hey @hyn0801, there is no qualification between the rounds. Participants can join the challenge at any point until the final deadline.

I have updated the Overview: https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge#timeline

Getting a timeout when running the flatland-evaluator

About 2 months ago

Hey, what agent are you running? Is this using the default random agent from the starter kit? What are the logs on the agent’s side?

You can try flushing the Redis data; that may be the problem :thinking:
https://flatland.aicrowd.com/getting-started/first-submission.html#env-client-step-called-before-env-client-env-create-call

Working on the examples given (flatland-examples)

About 2 months ago

During evaluation, you can use remote_client.env, which behaves like a normal environment, so you can access its width or height attributes as usual.

I am not sure what you mean by state size?

While evaluating using run.py the environment variables will change from what I understand according to the environment which was created for the agent to be evaluated in. How should I approach this to be able to test the example Multi-agent?

In general, you would proceed in two steps:

  • First, you train your agent locally. For this you can use multi_agent_training.py, but it’s just an example; you can implement your own training method.

  • Second, you submit your agent. In this challenge, no training happens during submission. Your agent needs to be fully pre-trained when you submit it (as opposed to, e.g., the ProcGen challenge).

If you use multi_agent_training.py, then you don’t have to worry about the dimensions of the evaluation environments, because it uses tree observations. The good thing about tree observations is that they are always the same size, no matter the size of the environment, so you can just use a neural network with a fixed input size and it’ll work in all situations! A quick sketch of why the size is fixed follows.
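A back-of-the-envelope sketch, assuming the flatland-examples convention of 11 features per tree node and a branching factor of 4:

def tree_obs_state_size(max_depth: int, features_per_node: int = 11) -> int:
    # A tree of depth d has 4^0 + 4^1 + ... + 4^d nodes, each with a fixed
    # number of features, independently of the grid size.
    n_nodes = sum(4 ** level for level in range(max_depth + 1))
    return n_nodes * features_per_node

print(tree_obs_state_size(max_depth=2))  # 21 nodes * 11 features = 231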

When I run run.py with redis as a local test. I get the following error

It looks like you are giving the policy an observation from all the agents, when it expects an observation from only one of the agents.

Flatland challenge website "My Team" button leading to wrong link

About 2 months ago

Hey @compscifan2019, that doesn’t look right, we will look into it…

Config of simulation environment during training and evaluation

About 2 months ago

There will be small grids in Round 1, so people can see progress even if they can’t solve the largest environments.

In Round 2, the smallest grids will be much larger, so they will potentially become problematic for pure OR approaches.

An idea could be to combine OR and RL in a smart way, e.g. plan with OR as much as possible during the 5-minute initial planning phase, then use RL for the parts you didn’t have time to fully plan and for when malfunctions occur. This way you use each method for what it is best at.

Config of simulation environment during training and evaluation

About 2 months ago

Yes, this is a good point. Let’s look at the big picture.

The goal of this challenge is to find efficient solutions to deal with very large environments.

For example, for 150x150 environments, operations research solutions could easily solve the problem perfectly. But they will take hours to find a solution when the environments get larger. This is a real-world problem for logistics companies: when a train breaks down, it takes too long to find an updated schedule.

So, the goal is to find a solution which can solve environments of any size within a short computing time. We don’t necessarily want to find an optimal plan, but we want to find one that is good enough quickly! As long as you don’t have a new schedule, none of the trains can move.

So, the problems in Round 2 will be larger than in Round 1. It is also possible that we make the Round 1 environments larger at the end of the current Warm-Up Round (= at the end of the month).

Your solutions should not assume that the environments have a given maximum size, as we will make them as large as we can!

Working on the examples given (flatland-examples)

About 2 months ago

Indeed you need to load the .pth file corresponding to the checkpoint you want to use.

You can see an example of loading a checkpoint here: https://gitlab.aicrowd.com/flatland/flatland-examples/blob/master/reinforcement_learning/evaluate_agent.py#L28

import torch
from argparse import Namespace

# DDDQNPolicy comes from the flatland-examples repository (reinforcement_learning/dddqn_policy.py)
from reinforcement_learning.dddqn_policy import DDDQNPolicy

# evaluation is faster on CPU, except if you have huge networks
parameters = {
    'use_gpu': False
}

policy = DDDQNPolicy(state_size, action_size, Namespace(**parameters), evaluation_mode=True)
policy.qnetwork_local = torch.load(checkpoint)

Then you can do policy.act(observation, eps=0.0) to get the action from your policy!

Step by step: How I setup everything for the Flatland 2020 challenge

About 2 months ago

Correct! Well, the trains need to move at least from the starting point to the target, so that’s at least one timestep.

Yes, there’s currently a bug in what is displayed in the issue after the evaluation is complete; we’re looking into it! You can get the correct numbers in the issue during training, and then on the leaderboard and individual submission pages.

Config of simulation environment during training and evaluation

About 2 months ago

Hello @junjie_li,

The goal of this challenge is to design a policy that is able to generalize to any kind of environment. For this reason, we don’t disclose all the details about the evaluation environments.

However, you can get some details about them:

The environments vary in size and number of agents as well as malfunction parameters.

For Round 1 of the NeurIPS 2020 challenge, the upper limits of these variables for submissions are:

  • (x_dim, y_dim) <= (150, 150)
  • n_agents <= 400
  • malfunction_rate <= 1/50

These parameters are subject to change during the challenge.

This gives you an idea of the distribution of evaluation environments you will have to solve when you do a submission.

From the doc:

Speed profiles are not used in the first round of the NeurIPS 2020 challenge.

So you can just set all the trains to a speed of 1.0; a minimal sketch of such a schedule generator follows.
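As an illustration, a minimal sketch assuming the sparse_schedule_generator API from flatland-rl 2.x, where speed_ratio_map maps a speed to the fraction of agents using it:

from flatland.envs.schedule_generators import sparse_schedule_generator

# All agents travel at full speed, matching the Round 1 setting.
schedule_generator = sparse_schedule_generator(speed_ratio_map={1.0: 1.0})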

Step by step: How I setup everything for the Flatland 2020 challenge

About 2 months ago

Generally, we refer to the whole grid world as the grid, and to each position in this grid as a “cell”.

I’ve added that episodes finish when either the max time step is reached or all trains have reached their target. Good catch!

Step by step: How I setup everything for the Flatland 2020 challenge

About 2 months ago

Thanks for your detailed review :smiley:

So:

  • Each agent individually gets a local reward (at each step: -1 if not at its target, 0 if at its target) plus a global reward (at each step: 1 if all agents are at their targets, 0 otherwise)
  • The competition score is the sum of the agent rewards, so indeed the global reward adds n_agents * 1 to the score, since each agent receives it
  • The episode stops once all the agents have reached their destination, so effectively you only get the global reward once (see the toy example below)
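A toy example of this accounting, with made-up numbers (2 agents, episode ending at step 3 when both have arrived):

n_agents = 2
# Local rewards per step: -1 while not at target, 0 once arrived.
local_rewards = {
    "agent_0": [-1, 0, 0],   # arrives at step 1
    "agent_1": [-1, -1, 0],  # arrives at step 3, which ends the episode
}
# Global reward: +1 to every agent on the step where all agents are done.
global_bonus = n_agents * 1
score = sum(sum(r) for r in local_rewards.values()) + global_bonus
print(score)  # (-1) + (-2) + 2 = -1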

Working on the examples given (flatland-examples)

About 2 months ago

Hey, the best way is to start from the starter kit repo: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-starter-kit

Follow the getting started guide to see how to submit: https://flatland.aicrowd.com/getting-started/first-submission.html

Then integrate your own solution by copying over the code from flatland-examples.

You’ll have to:

  • Add any dependency you need to the environment.yml file (torch…).
  • Load the trained agent for your solution. In this competition, you submit pre-trained agents; no training happens on the evaluation side.
  • Use your own agent in the run.py file instead of the random my_controller one used by default. Basically, call your model using the obs instead of calling randint here (see the sketch below).
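A hedged sketch of that change in run.py, following the starter kit's my_controller signature; my_policy stands for whatever trained agent you load and is hypothetical:

def my_controller(observation, number_of_agents):
    # Ask the trained policy for one action per agent, instead of random actions.
    actions = {}
    for handle in range(number_of_agents):
        actions[handle] = my_policy.act(observation[handle])
    return actions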

You generally don’t have to touch the run.sh file if you write your solution in Python.

Conda env creation errors...UPDATED: later EOF error when running evaluator

About 2 months ago

That might help indeed! They also just announced GPU acceleration support!
https://blogs.windows.com/windowsdeveloper/2020/06/17/gpu-accelerated-ml-training-inside-the-windows-subsystem-for-linux/

I think rendering with Pyglet from WSL is problematic though; let us know if you find a way around that.

Adjusting values in default config file?

About 2 months ago

Is this repo unfinished and i’m digging too soon into it?

Pretty much yes :wink:

But you are free to start experimenting with it anyway! The basic idea is that you point train.py to an experiment file.

So in the following example: python ./train.py -f experiments/flatland_random_sparse_small/global_obs_conv_net/ppo.yaml

you are using this file: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines/blob/master/experiments/flatland_random_sparse_small/global_obs_conv_net/ppo.yaml

In there you have num_gpus: 1 and num_workers: 7, so to run that you’ll need at least one GPU and at least 8 cores (7 workers + 1 main thread). Just tweak these values to match your hardware!

But yeah this is still mostly undocumented and very experimental so expect rough edges :skull_and_crossbones::zap:

How is this challenge different from last year?

About 2 months ago

For Round 1 from the FAQ:

  • (x_dim, y_dim) <= (150, 150)
  • n_agents <= 400
  • malfunction_rate <= 1/50

These parameters are subject to change during the challenge.

https://flatland.aicrowd.com/faq/challenge.html#what-are-the-evaluation-parameters

How is this challenge different from last year?

About 2 months ago

Indeed 5 minutes should be enough to pre-compute a perfect path in most cases (although… don’t underestimate how large the test environments might get…)

But then trains will hit malfunctions, forcing you to recompute the routes. 5 seconds will make it harder to re-compute everything!

Finally, the timing constraints as well as the environment sizes may be adjusted from round to round. So you should design your solution taking into account that time per timestep will be scarce, and environments will be huge.

How is this challenge different from last year?

About 2 months ago

The top three solutions to last year’s challenge obtained very good results, is there still a significant room for improvement?
Wello on Discord

Good question!

First, if you want to check out the top solutions from last year, they are available here:
https://flatland.aicrowd.com/research/top-challenge-solutions.html

The difference from last year is that the agents now need to act within strict time limits:

  • agents have up to 5 minutes to perform initial planning (ie before performing any action)
  • agents have up to 5 seconds to act per timestep (5 seconds in total for all the agents)

This comes from a real-life problem: if a train breaks down somewhere in the railway network, you need to re-schedule all the other trains as fast as possible to minimize delays.

Last year, most solutions used operations research approaches. These methods are very good at finding optimal train schedules, but the problem is that they don’t scale well to large environments: they quickly take too long to run.

This is why we are encouraging people to use reinforcement learning solutions this year, as we believe this will allow faster scheduling. The idea is that in the real world, it would be better to have a fast planning method that provides an approximate solution, rather than a method that can provide a perfect plan but takes hours to calculate it.

TL;DR: This year, we added more aggressive time limits to make the problem more realistic. This will give an edge to RL solutions.

Conda env creation errors...UPDATED: later EOF error when running evaluator

About 2 months ago

It also took very long for me on Windows but eventually worked; I'm not sure why.

Conda env creation errors...UPDATED: later EOF error when running evaluator

About 2 months ago

Actually, the new Flatland release has far fewer dependencies, so you can ignore the environment.yml file.

You can simply create a new conda environment, then install Flatland with pip:
pip install flatland-rl

You can even skip conda altogether. However conda makes it easier to package your solution if you want to use specific dependencies, and you need to keep the environment.yml in your submission repository in any case.

Checkpoint Error when Training

2 months ago

Good point: this folder is missing due to an over-eager .gitignore. You can just create it for now; I’ll push a fix for it.

Cheers

Error while evaluation

2 months ago

Hello @manavsinghal157,

This looks like a version mismatch between

  • the environment files you use (the .pkl), and
  • the flatland-rl release

Which version are you using for each?

The environment files should be the latest one coming from: https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/dataset_files

The flatland-rl version should be >=2.2.0. You can check it by running:
pip list | grep flatland

Cheers

Error in Flatland environment installation

2 months ago

As a quick-fix, try changing line 11 of setup.py to:

with open('README.md', 'r', encoding='utf8') as readme_file:

Error in Flatland environment installation

2 months ago

Hello @rafid_abyaad, we are aware of this and a fix is on the way!

Cheers

Start of the competition

2 months ago

Hello @RomanChernenko, you didn’t waste any time :smiley:

The competition will start in the next few days, stay tuned!

Cheers,
Florian

Flatland Challenge

Publishing the Solutions

About 2 months ago

The top 3 winning solutions are now available from this page:
https://flatland.aicrowd.com/research/top-challenge-solutions.html

Setting up the environment on Google Colab

2 months ago

Let’s continue this discussion in the new category made for the NeurIPS 2020 challenge:

The current category is for last year’s challenge.

Setting up the environment on Google Colab

2 months ago

Hello @Mnkq,

There are two problems that we’re actively working on before the challenge launches:

  • the latest release of importlib-resources is causing us some problems,
  • there have been a couple of breaking changes in the latest release of Flatland, and the Colab notebook hasn’t been updated yet.

We’re on it!
Cheers

Publishing the Solutions

3 months ago

Hello @fabianpieroth,

A recording of the presentations from top participants at the AMLD conference has recently been released: https://www.youtube.com/watch?v=rGzXsOC7qXg

The winning submissions as well as exciting news about the future of this competition will be released this month!

Cheers,
Florian

NeurIPS 2019 : MineRL Competition

Problems running in docker

10 months ago

I want to run my training code on AWS so I can make sure everything runs fine from start to finish on a machine slower than the official one. I am using a p2.xlarge instance with the “Deep Learning AMI (Ubuntu 16.04)”.

I am trying to run the code from the repo competition_submission_starter_template, without adding my own code for now. When I run ./utility/docker_train_locally.sh, I am faced with this error:

2019-10-22 02:01:29 ip-172-30-0-174 minerl.env.malmo.instance.868e96[39] INFO Minecraft process ready
2019-10-22 02:01:29 ip-172-30-0-174 minerl.env.malmo[39] INFO Logging output of Minecraft to ./logs/mc_1.log
2019-10-22 02:01:29 ip-172-30-0-174 root[62] INFO Progress : 1
2019-10-22 02:01:29 ip-172-30-0-174 crowdai_api.events[62] DEBUG Registering crowdAI API Event : CROWDAI_EVENT_INFO register_progress {'event_type': 'minerl_challenge:register_progress', 'training_progress': 1} # with_oracle? : False
Traceback (most recent call last):
  File "run.py", line 13, in <module>
    train.main()
  File "/home/aicrowd/train.py", line 75, in main
    env.close()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/gym/core.py", line 236, in close
    return self.env.close()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/minerl/env/core.py", line 627, in close
    if self.instance and self.instance.running:
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 280, in __getattr__
    raise AttributeError("remote object '%s' has no exposed attribute or method '%s'" % (self._pyroUri, name))
AttributeError: remote object 'PYRO:obj_3ec8abe8c48c4b4e9dd7f7b1ac4706b1@localhost:33872' has no exposed attribute or method 'running'
Exception ignored in: <function Proxy.__del__ at 0x7f4585d4f158>
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 266, in __del__
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/Pyro4/core.py", line 400, in _pyroRelease
  File "/srv/conda/envs/notebook/lib/python3.7/logging/__init__.py", line 1370, in debug
  File "/srv/conda/envs/notebook/lib/python3.7/logging/__init__.py", line 1626, in isEnabledFor
TypeError: 'NoneType' object is not callable
2019-10-22 02:01:30 ip-172-30-0-174 minerl.env.malmo.instance.868e96[39] DEBUG [02:01:30] [EnvServerSocketHandler/INFO]: Java has been asked to exit (code 0) by net.minecraftforge.fml.common.FMLCommonHandler.exitJava(FMLCommonHandler.java:659).

Where can I find more details? If I run ./utility/docker_run.sh --no-build to check inside the container, I see no trace of the logs.

Also, how would the trained model be saved in this situation? Is the train folder mounted as a volume so that the model would be persisted outside of the container?

Finally, the expression $(PWD) in the bash files throws an error for me.

Partially rendered env in MineRLObtainDiamondDense-v0

10 months ago

It just happened again; it seems to be related to large bodies of water.

Partially rendered env in MineRLObtainDiamondDense-v0

10 months ago

I’ve just witnessed my agent interacting in an environment which looked partially rendered, i.e. large pieces appeared transparent:

This is in MineRLObtainDiamondDense-v0. I am using minerl==0.2.7.

mc_1.log output around these times:

[10:51:00] [Client thread/INFO]: [CHAT] §l804...
[10:51:00] [Client thread/INFO]: [CHAT] §l803...
[10:51:00] [Client thread/ERROR]: Null returned as 'hitResult', this shouldn't happen!
[10:51:00] [Client thread/INFO]: [CHAT] §l802...
[10:51:01] [Client thread/INFO]: [CHAT] §l801...
[10:51:01] [Client thread/INFO]: [CHAT] §l800...

I don’t see anything else suspicious in this log file. The following episodes seem to be running correctly.

Can't train in MineRLObtainIronPickaxeDense-v0 since 0.2.7

10 months ago

Great, thanks for the swift fix! :+1:

Can't train in MineRLObtainIronPickaxeDense-v0 since 0.2.7

10 months ago

I just updated to 0.2.7; when trying to train in MineRLObtainIronPickaxeDense-v0 I now get the following errors:

ERROR    - 2019-10-18 04:52:00,768 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:00] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
ERROR    - 2019-10-18 04:52:01,867 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:01] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
ERROR    - 2019-10-18 04:52:02,950 - [minerl.env.malmo.instance.2edcf5 log_to_file 535] [04:52:02] [EnvServerSocketHandler/INFO]: [STDOUT]: REPLYING WITH: MALMOERRORcvc-complex-type.3.2.2: Attribute 'avoidLoops' is not allowed to appear in element 'RewardForPossessingItem'.
...

This environment was working fine before, but I was using the package version from before the reward loop was fixed, so maybe this problem has been present since 0.2.5.

Unity Obstacle Tower Challenge

Tutorial Deep Reinforcement Learning to try with PyTorch

Over 1 year ago

Incremental PyTorch implementations of the main algorithms:
RL-Adventure: DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL
RL-Adventure-2: actor-critic / proximal policy optimization / ACER / DDPG / twin dueling DDPG / soft actor-critic / generative adversarial imitation learning / HER

Good implementations of A2C/PPO/ACKTR: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr

BTW The repo for the Udacity course is open source: https://github.com/udacity/deep-reinforcement-learning
