Challenges Entered
Play in a realistic insurance market, compete for profit!
Sample-efficient reinforcement learning in Minecraft
Multi Agent Reinforcement Learning on Trains.
A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning
Flatland Challenge
Competition has ended
Over 4 years ago
Thank you for the very nice competition!
I enjoyed participating in it and I have to say that by the end of the competition I still had multiple unexplored ideas for improving the solution quality (lack of time and motivation from the leaderboard prevented me from implementing them).
"Furthermore, we will prepare a publication containing some of the solutions provided through the challenge."
Will this be a formal publication? If yes, will solution authors be recognized as co-authors?
Publishing the Solutions
Over 4 years ago
Hi @mlerik,
What does it mean when you say "We are currently evaluating the top submissions of the challenge"?
Are you running them on another set of test cases? If so, are you running all the submissions from the top participants or just the last one? Or by "evaluation" do you mean some kind of visual inspection to make sure the submissions are not somehow cheating? I am just wondering what it means for a submission to be "valid" (as you said); they did run on 250 hidden test cases, after all.
Evaluation time
Over 4 years ago
So what is the maximum time limit, then (for the "Total Execution Time" displayed for each submission)?
Is it 8 hours = 28800 seconds?
Or is it 10 hours = 36000 seconds?
My latest submission shows a Total Execution Time of 28845 seconds (only 45 seconds more than 8 hours). Should I try to optimize its runtime (or just submit again and hope I'm luckier next time)?
Or can I, in fact, use up to 36000 seconds? In that case, I could really use the extra time.
Also, it would be nice to have the maximum runtime enforced; disqualifying some submissions later on is quite bad. In any case, this is my first submission that exceeded an 8-hour "Total Execution Time", so a decision here would be welcome.
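The arithmetic behind the question, as a tiny Python check. The two limits below are only the candidates asked about above; neither is confirmed here as the official one, and the function name is illustrative.

```python
# The two candidate limits asked about (which one applies was the open question).
CANDIDATE_LIMITS_S = {"8 hours": 8 * 3600, "10 hours": 10 * 3600}


def budget_report(total_execution_time_s: int) -> dict:
    """Slack in seconds for each candidate limit (negative means over the limit)."""
    return {name: limit - total_execution_time_s for name, limit in CANDIDATE_LIMITS_S.items()}


if __name__ == "__main__":
    for name, slack in budget_report(28845).items():
        status = "within" if slack >= 0 else "over"
        print(f"{name}: {status} by {abs(slack)} s")
```

For the 28845-second run this reports 45 seconds over the 8-hour candidate and 7155 seconds of slack under the 10-hour one.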
[ANNOUNCEMENT] Submission working for Round 2
Almost 5 years ago
Are there any updates about this? It would really help my approach to know the exact number of allowed time steps. Unless, of course, this is deliberately not disclosed (estimating the number of cities may also be part of the challenge), in which case I'd just like to know that.
Malfunction data generator does not work
Almost 5 years ago
This isn't the case anymore starting with version 2.1.10, is it? I am now seeing the malfunction duration being updated also for agents that have not entered the environment.
This was quite surprising, because you explicitly mentioned this behavior in at least 2 posts, only for it to change in the updated version. Anyway, I have now updated my logic to account for the new behavior, so I'm hoping there won't be any more back-and-forth on this topic.
Please note that every change to core simulator behavior requires a really non-trivial amount of work just to keep existing solutions behaving correctly, while at this point I feel we should be focusing on improving the quality of our solutions, not on reverse engineering what's new in the latest version of the simulator.
[ANNOUNCEMENT] Submission working for Round 2
Almost 5 years ago
Actually, I find the max_time_steps formula to be slightly incorrect. When I generate local tests with different numbers of agents and cities (starting from the example in the repository), I sometimes see the simulation ending earlier than expected. After running more such tests, it seems clear that the actual formula is:
max_time_steps = int(4 * 2 * (env.width + env.height + number_of_agents / number_of_cities))
So the last term is 20 only when the ratio of agents to cities is 20. I can't find a way to get the number of cities, and I also can't find a function that returns the number of time steps without being passed the actual agents/cities ratio as an argument.
I would really like to know the maximum number of time steps when making decisions - can you please suggest a way to achieve this?
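A direct transcription of the empirically inferred formula above into a small Python helper. The function and parameter names are illustrative and not part of the Flatland API, and `n_cities` has to come from whatever generated the test, since the post notes there is no obvious way to read it from the environment.

```python
def max_time_steps(width: int, height: int, n_agents: int, n_cities: int) -> int:
    """Episode length as inferred from local tests (not an official Flatland function)."""
    if n_cities <= 0:
        raise ValueError("n_cities must be positive")
    # Same expression as the formula above, with the agents/cities ratio as a float.
    return int(4 * 2 * (width + height + n_agents / n_cities))


if __name__ == "__main__":
    print(max_time_steps(30, 30, 10, 2))  # 8 * (60 + 5)
    print(max_time_steps(25, 25, 40, 2))  # ratio of 20, so the last term is 20
```

When the agents-to-cities ratio is exactly 20, the last term reduces to the constant 20 that the post contrasts this formula with; any other ratio shifts the episode length by 8 steps per unit of ratio.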
Computation budget
Almost 5 years ago
There seems to be another change regarding malfunctions. In the previous Flatland version, the first malfunction only started once the agent entered the environment (before that, the malfunction duration was not updated). This no longer seems to be the case: the malfunction duration, as well as new malfunctions, are now updated even while the agent is still outside the environment. This also makes a big difference in terms of behavior.
Computation budget
Almost 5 years ago
Regarding cheating: can't malfunction_rate be used for the same purpose? It seems to be set to 0 for agents that never suffer any malfunction (so a plausible strategy, though not necessarily the one maximizing the fraction of done agents, would be to just enter these agents into the environment). Or will this parameter also go away? Or will it have a different meaning, so that it's non-zero also for agents that never suffer a malfunction?
Anyway, can I assume that by updating to the latest Flatland version (I am still using 2.1.8) I will see the latest changes? (i.e. at least I will stop getting the next_malfunction parameter).
Are the submissions made to Round 2 so far being re-evaluated? I'm guessing some of them may have relied on the presence of next_malfunction, so they would now stop working.
And maybe one last question about the malfunction duration. Can we assume that malfunctions are disjoint? (meaning that once the malfunction value is non-zero, the next malfunction can start only after the current malfunction ends)
Computation budget
Almost 5 years ago
I just noticed that the FAQ says the attribute "next_malfunction" will be removed, "as it serves no purpose anymore". It's sad to make such changes when some solutions may rely on this attribute being present. It actually provides useful information, letting the agent know exactly when its next malfunction will occur.
I also see an upper limit of 250 for the number of agents. In a separate thread (a while ago), this limit was mentioned to be 200. Which upper limit is correct?
Also, what's the currently recommended way to generate local tests that resemble the ones used for scoring our submissions (in terms of parameter distributions)? For Round 1 I was able to use the baselines repository, but parts of it haven't been updated in a long time (and, in particular, I'm not sure whether anything there generates tests with stochastic malfunction data).
[ANNOUNCEMENT] Start Round 2
Almost 5 years ago
Hi. I downloaded the (small) set of tests mentioned in the starter kit and used them to test my solution with the setup from the starter kit (redis server + flatland evaluator + the sample run.py into which I integrated my solution). But it seems that the agents are not leaving the environment once they reach their destinations (their reported status is DONE instead of DONE_REMOVED). Do I need to set any extra parameters when creating the local/remote environments? Or are these arguments part of the test data, and the test data was simply generated without the option for agents to leave the environment?
What's the status for the official test cases? Are the agents leaving the environment (as mentioned in this thread) or not?
[ANNOUNCEMENT] Submission working for Round 2
Almost 5 years ago
- And how large can env.width and env.height be?
- Also another question, more as a clarification, to make sure I understood things correctly. Is it true that once an agent starts moving towards an adjacent cell, it won't be able to make any other decisions until it reaches that cell, even if reaching it takes longer than 1/speed turns (e.g. because that cell is occupied by other trains)? In my local tests I've seen cases where position_fraction increases beyond 1.0 (and even a value of exactly 1.0 can only occur if the agent can't enter the new cell as soon as its speed allows). So I'm guessing that as long as position_fraction is strictly greater than zero, the agent can't make any new decisions. Is that correct?
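A toy model of the interpretation asked about above, to make the expected behavior concrete. It is a hypothetical sketch, not Flatland's implementation: `MovingAgent`, `can_take_decision` and `step` are illustrative names, and the rule that a blocked agent's fraction keeps growing past 1.0 is taken from the observation in the post rather than from the Flatland source.

```python
from dataclasses import dataclass


@dataclass
class MovingAgent:
    """Toy model of a train part-way through a cell-to-cell move (not Flatland code)."""
    speed: float                    # fraction of a cell covered per time step, e.g. 1/3
    position_fraction: float = 0.0

    def can_take_decision(self) -> bool:
        # Interpretation from the post: new actions are only possible
        # when the agent is not in the middle of a move.
        return self.position_fraction == 0.0

    def step(self, next_cell_free: bool) -> bool:
        """Advance one time step; return True if the agent entered the next cell."""
        self.position_fraction += self.speed
        # Tolerance guards against float rounding (e.g. 3 * (1/3)).
        if self.position_fraction >= 1.0 - 1e-9 and next_cell_free:
            self.position_fraction = 0.0
            return True
        # If the next cell is occupied, the fraction keeps growing past 1.0,
        # matching the values above 1.0 observed in local tests.
        return False


if __name__ == "__main__":
    agent = MovingAgent(speed=1 / 3)
    for free in [True, True, False, True]:
        print(agent.can_take_decision(), agent.step(free), round(agent.position_fraction, 3))
```

With speed 1/3 and the next cell blocked on the third step, the agent can only decide on the first step, sits at a fraction of 1.0 while blocked, and enters the cell one step later than its speed alone would allow.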
[ANNOUNCEMENT] Start Round 2
Almost 5 years ago
I simulated further until the agent's malfunction ends, and it seems that the agent "exits" the malfunction with the position_fraction I was expecting it to have before the malfunction started (in this case: 0.666666). To give some concrete data for the same agent as before:
- I read from env.agents the following data: position_fraction=0.333333 malfunction=1 next_malfunction=40
- I call env.step(…)
- I read from env.agents the following data: position_fraction=0.666666 malfunction=0 next_malfunction=40
So it seems that the move from position_fraction 0.333333 to 0.666666 is not "lost", but rather delayed. I guess it's all caused by a different expectation of when malfunction is updated. From these examples, I guess malfunction is updated at the beginning of the env.step(…) call, while to me it seems more natural to have it updated at the end of env.step(…), so that:
- malfunction >= 1 means the agent is blocked for that many env.step(…) calls (currently it doesn't mean that)
- next_malfunction >= 1 means that there are that many env.step(…) calls left before the agent is blocked by the next malfunction (currently it doesn't mean that)
Is there any reason for the current behavior compared to the one I'm expecting? Of course, now that I have sort of reverse engineered the issue, I can work around it, but it still seems a bit unnatural to me.
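A minimal sketch of the "update at the end of env.step(…)" semantics proposed above, written as a standalone model rather than as a change to Flatland. The `Agent` class, `step_expected` and especially the `pending_duration` field (the length of the upcoming malfunction) are hypothetical; cell transitions are ignored so that only the timing of the counters is shown.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    speed: float
    position_fraction: float = 0.0
    malfunction: int = 0        # env.step(...) calls the agent is still blocked for
    next_malfunction: int = 0   # useful env.step(...) calls left before the next malfunction (0 = none pending)
    pending_duration: int = 0   # HYPOTHETICAL field: length of that next malfunction


def step_expected(agent: Agent) -> None:
    """One env.step(...) under the proposed 'update at the end of the step' semantics."""
    if agent.malfunction > 0:
        # The whole call is spent blocked; the counter is decremented afterwards.
        agent.malfunction -= 1
        return
    # A useful move happens first (cell transitions are ignored in this sketch)...
    agent.position_fraction += agent.speed
    # ...and only then does the countdown to the next malfunction advance.
    if agent.next_malfunction > 0:
        agent.next_malfunction -= 1
        if agent.next_malfunction == 0:
            agent.malfunction = agent.pending_duration


if __name__ == "__main__":
    # Uses the speed=1/3 numbers quoted in these posts.
    a = Agent(speed=1 / 3, position_fraction=1 / 3, next_malfunction=1, pending_duration=10)
    step_expected(a)
    print(round(a.position_fraction, 6), a.malfunction)
```

Under this model, next_malfunction=1 still yields one useful move (the fraction reaches 0.666667) before the 10-step malfunction starts, and malfunction=k consumes exactly k calls with no movement, which is the invariant the post asks for.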
[ANNOUNCEMENT] Start Round 2
Almost 5 years ago
OK. Here's a concrete example I am encountering in a local test:
- An agent with speed=0.333333 started moving at a previous time step. I read its data from env.agents and it says: position_fraction=0.333333 malfunction=0 next_malfunction=1
- I call env.step(…). Obviously, this agent has no new action to take because it's already involved in an ongoing move.
- I read the data from env.agents again for this agent. It shows: position_fraction=0.333333 malfunction=10 next_malfunction=40
My expectation was that in the third step the position_fraction should be 0.666666. Or am I just interpreting the next_malfunction value incorrectly? My interpretation is that as long as malfunction=0 and next_malfunction=1, the agent still has one more time step of "useful" movement before being blocked by the malfunction (so the next env.step(…) should still do something useful for that agent; in other words, the malfunction begins at the end of the next env.step(…) call, i.e. after one more useful move). This seems not to be the case.
Everything seems to behave as expected in the other cases (malfunction >= 1, or malfunction=0 and (next_malfunction>=2 or next_malfunction=0)), meaning that the position_fractions are advanced correctly.
[ANNOUNCEMENT] Start Round 2
Almost 5 years ago
I have one more question: let's assume there is an agent with speed less than 1 and that the agent is in the middle of performing a move (e.g. the agent has speed 0.25 and its position fraction is currently 0.5). Then a malfunction occurs for this agent at this time. What will happen to the agent once the malfunction ends?
- Will the agent continue the move it started before the malfunction occurred?
- Or will the agent be "reset" (for lack of a better word) and be able to start a new move as soon as the malfunction ends?
I was expecting case 1, but I encountered a case where the reported position_fraction is reset to 0 when a malfunction starts, and I don't know if it's just a reporting issue (i.e. the position_fraction is wrongly reported during malfunctions) or if it's intended.
[ANNOUNCEMENT] Start Round 2
Almost 5 years ago
I finally got a chance to look at the provided example and I have a few questions:
- Can we use env.agents in our code to get the agents' current positions, directions and targets (as the example does)? This seems much easier than extracting them from the observations (where they are encoded in some format).
- Do we really have access to this much malfunction information (e.g. whether an agent will ever malfunction, and when its next malfunction will occur)? This information is definitely useful and I'd like to use it for making decisions, but I want to make sure we are allowed to use it.
- If an agent is already malfunctioning, malfunction_data['next_malfunction'] seems to indicate how many steps after the end of the current malfunction the next malfunction will occur. This is not obvious from its name (I initially expected it to always be relative to the current time step, but that's not the case). Is this intended?
- If an agent is malfunctioning from the start and doesn't enter the environment (i.e. it remains in the READY_TO_DEPART state), its malfunction duration is not decreased. Given that the agent is penalized for every time step it remains outside the environment (before entering), it seems unexpected not to let its malfunction duration also "expire" while it is still outside, so I'm asking: is this intended?
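A small sketch of the two things asked about in the list above: reading positions, directions and targets straight from env.agents, and turning next_malfunction into an absolute time step under the behavior observed in the third question. The helper names are hypothetical, the attribute and key names are the ones mentioned in the post, and `SimpleNamespace` stands in for Flatland's agent objects so the sketch runs without Flatland. Treating next_malfunction=0 as "nothing scheduled" is an assumption, and the exact off-by-one depends on when the counters are updated inside env.step(…).

```python
from types import SimpleNamespace
from typing import Optional


def agent_snapshot(agents):
    """(position, direction, target) for every agent, read directly from env.agents."""
    return [(a.position, a.direction, a.target) for a in agents]


def next_malfunction_step(agent, current_step: int) -> Optional[int]:
    """Absolute time step at which the agent's next malfunction should start.

    ASSUMPTION (observed behaviour, not documented): while the agent is
    malfunctioning, next_malfunction counts from the END of the current
    malfunction; otherwise malfunction is 0 and it counts from the current step.
    A next_malfunction of 0 is treated here as "no malfunction scheduled".
    """
    data = agent.malfunction_data
    if data["next_malfunction"] <= 0:
        return None
    return current_step + data["malfunction"] + data["next_malfunction"]


if __name__ == "__main__":
    # SimpleNamespace stands in for a Flatland agent object (illustrative values).
    agent = SimpleNamespace(position=(3, 4), direction=1, target=(10, 2),
                            malfunction_data={"malfunction": 5, "next_malfunction": 40})
    print(agent_snapshot([agent]))
    print(next_malfunction_step(agent, current_step=100))
```

Adding the remaining malfunction duration unconditionally covers both cases, because it is 0 for an agent that is not currently malfunctioning.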
And thanks for all the work put into preparing Round 2. It looks indeed much more interesting than Round 1.
Format of observation data in Flatland 2.0
Almost 5 years ago
Thank you for the pointers. They do help, and they show me that the current encoding (for the global observation) seems wrong. For instance, the first channel of the (height, width, 4) map contains the initial direction of the current agent. But zero is both the default value and a valid value for the initial direction (which is a number from 0 to 3), so this encoding is not enough to identify the initial position of each agent.
Besides the logical issue with the encoding (which I don't think I'm wrong about), another issue I am seeing is that this (height, width, 4) map does not always seem to be fully populated for each agent. What I mean is: in the observation of each agent x, I printed all the cells (i, j) that have a non-zero value in any of the 4 channels of the (height, width, 4) map. This approach should always print N cells (N = number of agents), but for some agents the number is less than N (I don't know why).
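A toy reproduction of the non-zero cell count described above, in plain Python. The channel layout is a hypothetical simplification (observing agent's direction in channel 0, other agents' directions in channel 1, channels 2-3 empty); the real Flatland 2.0 layout may differ and may fill other channels at those cells. It only illustrates how a stored direction of 0 makes an agent invisible to a non-zero test, not the confirmed cause of the missing cells. The `offset` parameter is a hypothetical alternative encoding (store direction + 1) that removes the ambiguity.

```python
def make_direction_map(height, width, own, others, offset=0):
    """Toy (height, width, 4) map with a HYPOTHETICAL layout.

    own / others are ((row, col), direction) pairs with direction in 0..3.
    Channel 0 holds the observing agent's direction, channel 1 the other
    agents' directions; channels 2-3 stay empty. `offset` is added to every
    stored direction (offset=1 keeps direction 0 distinguishable from "empty").
    """
    grid = [[[0] * 4 for _ in range(width)] for _ in range(height)]
    (r, c), d = own
    grid[r][c][0] = d + offset
    for (r, c), d in others:
        grid[r][c][1] = d + offset
    return grid


def cells_with_data(grid):
    """All cells (i, j) where any of the 4 channels is non-zero (the check from the post)."""
    return [
        (i, j)
        for i, row in enumerate(grid)
        for j, channels in enumerate(row)
        if any(v != 0 for v in channels)
    ]


if __name__ == "__main__":
    own = ((0, 0), 0)                    # observing agent facing direction 0
    others = [((1, 2), 2), ((2, 1), 0)]  # one other agent also facing direction 0
    print(cells_with_data(make_direction_map(3, 3, own, others)))            # only 1 of 3 agents found
    print(cells_with_data(make_direction_map(3, 3, own, others, offset=1)))  # all 3 agents found
```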
Format of observation data in Flatland 2.0
Almost 5 years ago
It seems that the format of the observation data changed from v1 to v2. Unfortunately, I can't find any documentation of what the new observation data is supposed to contain. I am interested in the global observation at first.
In v1 the global observation of each agent consisted of 4 arrays: transition map, encoding of the starting position, encoding of the ending position, encoding of the initial orientation.
Now I see there are only 3 arrays per agent. The first one still seems to be the transition map (I think). The 3rd one still seems to be the same encoding of the target position (I think). But it's unclear what the 2nd array encodes. It also seems to contain the speed of each agent, but I don't know how to get their starting positions and initial orientations. The official documentation is really lacking: http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/intro_observation_actions.html
Can you please point me to some examples which decode these observations (in Flatland v2) or to some explanations/documentation?
[ANNOUNCEMENT] Updated Rules and Clarification
About 5 years ago
As you can see on the leaderboard already, avoiding conflicts and reaching destinations within the maximum allowed time steps is rather easy in Round 1 (all the 1000 secret cases can be solved perfectly from this perspective). The only interesting part remaining in Round 1, in my opinion, is trying to maximize the mean reward. This is a non-trivial task, and I personally have many ideas that I would have liked to try. However, given that Round 1 will not count towards the final standings, and given that I don't know many details about the rules and test sizes for Round 2, I am now reluctant to spend any more time improving the mean reward for Round 1, since any techniques I use or develop for it may turn out to be unusable in Round 2.
My personal preference is to start Round 2 as soon as possible, in order to start solving the interesting problems. Is the timeline for Round 2 still the one mentioned in the Overview section (from mid-August to December 1st)?
[ANNOUNCEMENT] Submissions Open
About 5 years ago
Never mind. I figured things out with a bit of trial and error.
Publishing the Solutions
Over 4 years ago
And here is the winning solution: https://gitlab.aicrowd.com/mugurelionut/flatland-challenge-starter-kit (the same repository used for making submissions during the contest)