And here is the winning solution: https://gitlab.aicrowd.com/mugurelionut/flatland-challenge-starter-kit (same repository used for making submissions during the contest)
Thank you for the very nice competition!
I enjoyed participating in it and I have to say that by the end of the competition I still had multiple unexplored ideas for improving the solution quality (lack of time and motivation from the leaderboard prevented me from implementing them).
Furthermore we will prepare a publication containing some of the solutions provided through the challenge
Will this be a formal publication? If yes, will solution authors be recognized as co-authors?
What does it mean
We are currently evaluation the top submissions of the challenge
Are you running them on another set of test cases? If yes, are you running all the submissions from the top participants or just the last one? Or by “evaluation” do you mean some kind of visual inspection to make sure the submissions are not cheating somehow? I am just wondering what it means for the submissions to be “valid” (as you said) - they did run on 250 hidden test cases after all.
So what is the maximum time limit then? (for the “Total Execution Time” displayed for each submission).
Is it 8 hours = 28800 seconds?
Or is it 10 hours = 36000 seconds ?
My latest submission shows a Total Execution Time of 28845 seconds (so only 45 seconds more than 8 hours). Should I try to optimize its runtime? (or just submit again and hope I’m luckier next time?)
Or, in fact, can I use all the way up to 36000 seconds, in which case, I could really use this extra time
Also, it would be nice to have the maximum runtime enforced. Disqualifying some submissions later on is quite bad. Anyway, in my case this is the 1st submission which exceeded an 8 hour “Total Execution Time”, so a decision here would be welcome.
Are there any updates about this? It would really help my approach if I knew the number of allowed time steps exactly. Unless this is not desired (estimating the number of cities can also be part of the challenge, I’d just like to know if that’s the case).
This isn’t the case anymore starting with version 2.1.10, is it? I am now seeing the malfunction duration being updated also for agents who did not enter the environment.
This was quite surprising, because you explicitly mentioned this behavior in at least 2 posts, only to see it changed with the updated version. Anyway, I currently updated my logic to consider the new behavior, so I’m hoping there won’t be any more going back and forth on this topic.
Please note that every time you change some core behavior in the simulator there’s a really non-trivial amount of work required to update existing solutions only to behave correctly, when at this point I feel we should be focusing on improving the quality of the solutions, and not at reverse engineering what’s new with the latest version of the simulator.
Actually, I find the max_time_steps formula to be a bit incorrect. When I generate local tests with different number of agents and different number of cities (starting from the example from the repository), I sometimes see the simulation ending earlier than expected. After running more such tests, it seems obvious that the actual formula is:
max_time_steps = int(4 * 2 * (env.width + env.height + number_of_agents / number_of_cities))
So the last term is only 20 when the ratio of agents to cities is 20. I don’t seem to find how to get the number of cities, and I also can’t find a function which returns the number of time steps (without being passed the actual ratio agents/cities as an argument).
I would really like to know the maximum number of time steps when making decisions - can you please suggest a way to achieve this?
There seems to be another change regarding malfunctions. In the previous Flatland version, the 1st malfunction only started once the agent entered the environment (otherwise the malfunction duration was not updated). This seems to not be the case anymore (meaning the malfunction duration, as well as new malfunctions, are updated also when the agent is still outside the environment). This also makes a big difference in terms of behavior.
Regarding cheating: Can’t malfunction_rate be used for the same purpose? It seems to be set to 0 for agents who never suffer any malfunction (so a plausible strategy, though not necessarily the one maximizing the fraction of done agents, would be to just enter these agents into the environment). Or will this parameter also go away? Or will it have a different meaning so that it’s non-zero also for agents who never suffer a malfunction?
Anyway, can I assume that by updating to the latest Flatland version (I am still using 2.1.8) I will see the latest changes? (i.e. at least I will stop getting the next_malfunction parameter).
Are the submissions made to Round 2 so far being reevaluated? I am guessing it’s possible that some of them relied on the presence of next_malfunction, so they should now stop working.
And maybe one last question about the malfunction duration. Can we assume that malfunctions are disjoint? (meaning that once the malfunction value is non-zero, the next malfunction can start only after the current malfunction ends)
I just noticed that the FAQ says that the attribute “next_malfunction” will be removed, “as it serves no purpose anymore”. It’s sad to make such changes when some solutions may be based on having this attribute present. It actually provides some useful information, allowing the agent to know exactly when its next malfunction will occur.
I also see an upper limit of 250 for the number of agents. In a separate thread (a while ago), this limit was mentioned to be 200. Which upper limit is correct?
Also, what’s the currently recommended way to generate local tests which resemble the ones used for scoring our submissions (in terms of parameter distributions) ? For Round 1 I was able to use the baselines repository, but parts of it haven’t been updated in a long time (and, in particular, I’m not sure if anything from there generates any kind of tests with stochastic malfunction data).
Hi. I downloaded the (small) set of tests mentioned in the started kit and used them to test my solution using the setup from the starter kit (redis server + flatland evaluator + the sample run.py tn which I integrated my solution). But it seems that the agents are not leaving the environment once they reach their destinations (I see their reported status is DONE, instead of DONE_REMOVED). Do I need to set any extra parameters when creating the local/remote environments? Or are these arguments part of the test data, and it’s just that the test data was generated without the option to have agents leave the environment?
What’s the status for the official test cases? Are the agents leaving the environment (as mentioned in this thread) or not?
- And how large can env.width and env.height be ?
- Also another question, more as a clarification, to make sure I understood things correctly. Is it true that once an agent starts moving towards an adjacent cell, it won’t be able to make any other decisions until it reaches that cell? Even if reaching it may take longer than 1/speed turns (e.g. because that cell is occupied by other trains, etc.). In my local tests I’ve seen in some cases the position_fraction can increase beyond 1.0 (even a value of 1.0 can only occur if the agent can’t enter the new cell as soon as its speed allows). So I’m guessing that as long as position_fraction is strictly greater than zero, the agent can’t make any new decisions, is that correct?
I simulated further until the agent’s malfunction ends and it seems that the agent “exits” from the malfunction with the position_fraction that I was expecting it to have before the malfunction started (in this case: 0.666666). To give some concrete data for the same agent as before:
- I read from env.agents the following data: position_fraction=0.333333 malfunction=1 next_malfunction=40
- I call env.step(…)
- I read from env.agents the following data: position_fraction=0.666666 malfunction=0 next_malfunction=40
So it seems that the move from position_fraction 0.333333 to 0.666666 is not “lost”, but rather delayed. I guess it’s all caused by a different expectation of when malfunction is updated. From these examples, I guess malfunction is updated at the beginning of the env.step(…) call, while to me it seems more natural to have it updated at the end of env.step(…), so that:
- malfunction >= 1 means the agent is blocked for that many env.step(…) calls (now it doesn’t mean that)
- next_malfunction >= 1 means that there are that many env.step(…) calls left before the agent is blocked by the next malfunction (now it doesn’t mean that)
Is there any reason for the current behavior compared to the one I’m expecting? Of course, now that I sort of reverse engineered the issue, I can work around it, but it still seems a bit unnatural to me.
OK. Here’s a concrete example I am encountering in a local test:
- An agent with speed=0.333333 started moving at a previous time step. I am reading its data from env.agents and it says: position_fraction=0.333333 malfunction=0 next_malfunction=1
- I call env.step(…). Obviously, this agent has no new action to do because it’s already involved in an ongoing move.
- I read again the data from env.agents for this agent. It shows: position_fraction=0.333333 malfunction=10 next_malfunction=40
My expectation was that at step 3 the position_fraction should be 0.666666. Or I am just interpreting incorrectly the next_malfunction value? My interpretation is that as long as malfunction=0 and next_malfunction=1 then that agent still has one more time step of “useful” moving before being blocked by the malfunction (so the next env.step(…) should still do something useful for that agent, or, in other words, that the malfunction begins at the end of the next env.step(…) call, i.e. after one more useful move). This seems to not be the case.
Everything seems to behave as expected in the other cases (malfunction >= 1, or malfunction=0 and (next_malfunction>=2 or next_malfunction=0)), meaning that the position_fractions are advanced correctly.
I have one more question: Let’s assume there is an agent with speed less than 1 and that the agent is in the middle of performing a move (e.g. the agent has speed 0.25 and its position fraction is currently 0.5). And then a malfunction occurs for this agent at this time. What will happen to the agent once the malfunction ends?
- Will the agent continue the move it started before the malfunction occurred?
- Or will the agent be “reset” (for lack of a better word) and will be able to start a new move as soon as the malfunction ends?
I was expecting case 1, but I encountered a case where I see the reported position_fraction being reset to 0 when a malfunction starts, and I don’t know if it’s just a reporting issue (i.e. the position_fraction is wrongly reported during malfunctions), or if it’s intended.
I finally got a chance to look at the provided example and I have a few questions:
can we use env.agents in our code in order to get the current agents’ positions, directions and targets? (like the example does) this seems much easier than somehow extracting them from observations (where they are encoded in some format)
do we indeed have access to so much malfunction information? (e.g. if an agent will ever malfunction or not, and when the next malfunction will occur?) this information is definitely useful and I’d like to use it for making decisions, but I want to make sure we can indeed use it
if an agent is already malfunctioning, malfunction_data[‘next_malfunction’] seems to indicate how many steps after the end of the current malfunction the next malfunction will occur - this is not obvious from its name (I initially expected it to always be relative to the current time step, but that’s not the case); is this intended?
if an agent is malfunctioning from the start and the agent doesn’t enter the environment (i.e. it remains in the READY_TO_DEPART state), the malfunction duration is not decreased - is this intended? given that the agent will be penalized for every time step when it remains outside the environment (before entering), it seems unexpected to not allow its malfunction duration to also “expire” while the agent is still outside the environment - so I’m asking: is this intended?
And thanks for all the work put into preparing Round 2. It looks indeed much more interesting than Round 1.
Thank you for the pointers. They do help and they show me that the current encoding (for the global observation) seems wrong. For instance, the first channel of the (height, width, 4) map contains the initial direction of the current agent. But zero is both the default value and a valid value for the initial direction (which is a number from 0 to 3). So this encoding is not enough to identify the initial position of each agent.
Besides the logical issue with the encoding (which I don’t think I’m wrong about), another issue I am seeing is that it seems this (height, width, 4) map is not always fully populated for each agent. What I mean is: in the observation of each agent x, I printed all the cells (i,j) which have a non-zero value at any of the 4 channels (in the (height, width, 4) map). There should always be N (N=number of agents) cells printed by this approach, but for some agents this number is less than N (don’t know why).
It seems that the format of the observation data changed from v1 to v2. Unfortunately, I can’t find documented anywhere what the new observation data is supposed to contain. I am interested in the global observation at first.
In v1 the global observation of each agent consisted of 4 arrays: transition map, encoding of the starting position, encoding of the ending position, encoding of the initial orientation.
Now I see there are only 3 arrays per agent. The first one seems to still be the transition map (I think). The 3rd one seems to still be the same encoding of the target position (I think). But it’s unclear what the encoding of the 2nd array is. It seems to also contain the speed of each agent, but I don’t know how to get their starting positions and initial orientation. The official documentation is really lacking: http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/intro_observation_actions.html
Can you please point me to some examples which decode these observations (in Flatland v2) or to some explanations/documentation?
As you can see on the leaderboard already, avoiding conflicts and reaching destinations within the maximum allowed time steps is rather easy in Round 1 (meaning all the 1000 secret cases can be solved perfectly from this perspective). The only interesting part remaining in Round 1, in my opinion, is trying to maximize the mean reward. This is a non-trivial task and I personally have many ideas that i would have liked to try. However, given that Round 1 will not count towards the final standings, and given that I don’t know too many details about the rules and test sizes for Round 2, I am now reluctant to spend any more time to improve the mean reward for Round 1, since it’s possible that any techniques I will use/develop for this will be unusable in Round 2.
My personal preference is to start Round 2 as soon as possible, in order to start solving the interesting problems Is the time line for Round 2 still the one mentioned in the Overview section? (from mid-August to December 1st?)
Never mind. I figured things out with a bit of trial and error.