How can we use the latest Flatland environment: the master branch's version, or pip release 2.x.x?
(Round 1 was using flatland-rl==2.2.1)
Yes, I feel the current agent number is large enough…
It seems that generating a large env is very slow; this may be a problem for offline RL training on large envs…
In the case that some teams can solve all environments within 8 hours, is there a deadline for environment changes in Round 2?
I think it may be helpful to keep the env unchanged for the last 3+ weeks, so that we have time to finetune our algorithms instead of searching in different directions…
Hey, is the current Round 1 using the master branch's version, or pip release 2.2.1?
What happens if we have an OR submission and then an RL submission?
Which result will be shown on the leaderboard? Or both?
I may be wrong, but below is my feedback about adding many more evaluation episodes:
Currently, RL's completion rate is low even under the current env settings. Adding many more episodes may further narrow RL's applicability when competing with the OR method.
It may push us to focus more on the OR method.
As I commented before, I think a larger env is good, but it's better to have far fewer test cases.
Thanks for the discussion thread.
As a participant who is really interested in using RL to solve this problem, my concerns are:
- Timing. When we use RL, we likely need a GPU for inference. Unfortunately, our GPU utilization will be low, as it only serves one or a few states per batch. So I expect that on larger grid sizes, RL with a GPU is likely to be less efficient than the OR method.
- Diversity of envs. When we have 14 different grid sizes, RL training becomes harder. If we further consider different speeds, deadlock-free planning may require even more effort.
My wishes for Round 2 are:
- Use only a few large test cases (for example, # of test cases <= 10), while keeping the same overall running time. It may be even better to test with the same grid size.
- Use the same speed for all agents. I personally prefer to focus on RL-related things instead of dealing with deadlocks caused by different speeds.
I think one of OR's shortcomings is that it's not straightforward to optimize for a global reward.
My understanding: RL's advantage is finding a better solution (possibly in combination with OR), not acting in a shorter time.
If we want to see RL perform better than OR, we should give RL enough time for planning/inference on large grid envs. (Both 5 min and 5 s may not be enough for RL to do planning and inference.)
How many test maps are used to generate the submission result?
After each submission, there is a video for that submission. Does the video include all test cases?
As you mentioned, small map sizes may be better suited to operations research.
I am not sure whether there will be test cases with small map sizes.
If yes, then we may need to implement an operations research algorithm alongside the RL algorithm.
My question is: will you set a minimum map size? For example, larger than K x K, ensuring that most operations research algorithms cannot solve the problem within the time limit, so that we can focus on truly large map sizes.
Thanks @MasterScrat for the quick reply.
I feel much clearer after your reply.
Thanks @MasterScrat for the kind reply.
May I know how much difference there will be between Round 1 and Round 2?
Consider an example with two different settings:
- when we just need our algorithm to work with map size 150 * 150
- when we also need our algorithm to work with map size 1500 * 1500
It may be quite different to design an optimal state/algorithm when the problem settings are so different.
I am using WSL2 with Ubuntu (16.04) and Docker.
It works well so far.
For the visualization, I have tried two ways, both of which work for me:
- Install a GUI and an X server for WSL2.
Some links I found helpful:
- After getting frames in PNG format, use a function like the following to generate a video:
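A minimal sketch of such a function, assuming the frames are numbered PNG files and that imageio (with the imageio-ffmpeg backend) is installed; the function name and paths are just placeholders:

```python
import glob

import imageio  # assumes imageio + imageio-ffmpeg are installed


def frames_to_video(frame_dir, out_path="flatland.mp4", fps=10):
    """Stitch numbered PNG frames (e.g. frame_0000.png) into a video."""
    frame_files = sorted(glob.glob(f"{frame_dir}/*.png"))
    with imageio.get_writer(out_path, fps=fps) as writer:
        for frame_file in frame_files:
            writer.append_data(imageio.imread(frame_file))


# Example usage (hypothetical paths):
# frames_to_video("./frames", "run.mp4", fps=10)
```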
Overall, I feel the 2nd method is simpler and I am currently using it for visualization.
For RL to work well, it's better to have similar configs between the training and evaluation environments.
To help us properly set up the training environment, could you provide some basic information about the evaluation environment?
For example, the ranges of the following settings (a sketch of how they map to flatland-rl parameters follows this list):
- width and height of map
- num of trains
- num of cities
- type of city distribution
- speed ratio of trains
- max rails between cities
- max rails in cities
- type of schedule generator
- malfunction: rate, min/max duration.
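For reference, a minimal sketch of how these settings map onto flatland-rl 2.2.1's generators; all values below are placeholders, not the actual evaluation settings (those ranges are exactly what I am asking about):

```python
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator
from flatland.envs.malfunction_generators import MalfunctionParameters, malfunction_from_params

# Placeholder values: the real evaluation ranges are the unknowns asked about above.
rail_generator = sparse_rail_generator(
    max_num_cities=3,            # num of cities
    grid_mode=False,             # type of city distribution (grid vs. random)
    max_rails_between_cities=2,  # max rails between cities
    max_rails_in_city=3,         # max rails in cities
)
# speed ratio of trains: fraction of agents running at each speed
schedule_generator = sparse_schedule_generator(
    {1.0: 0.25, 0.5: 0.25, 1.0 / 3.0: 0.25, 0.25: 0.25}
)
malfunction_params = MalfunctionParameters(
    malfunction_rate=1.0 / 10000,  # malfunction rate
    min_duration=20,               # min duration
    max_duration=50,               # max duration
)
env = RailEnv(
    width=40,                      # width of map
    height=40,                     # height of map
    number_of_agents=10,           # num of trains
    rail_generator=rail_generator,
    schedule_generator=schedule_generator,
    malfunction_generator_and_process_data=malfunction_from_params(malfunction_params),
)
obs, info = env.reset()
```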
On Windows, I can only install the environment via: pip install flatland-rl
However, it fails to run the evaluator, with the same error as in MemoAI's post (… EOFError: Ran out of input).