Both submissions failed after 9 hours with the message:
Task exceeded maximum timeout value (32400 seconds)
I tried submitting again, but it has not gone into image_build mode: https://gitlab.aicrowd.com/joe_booth/obstacle-tower-challenge/issues/147
It looks like submissions are totally down.
In the rules/competition overview, it states that the Round 2 end date is 11:59 pm, Monday, July 15th.
However, the competition ticker states Jul 15th, 7:59 UTC, which would be 11:59 pm, Sunday, July 14th. It should be Jul 16th, 7:59 am UTC.
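The date arithmetic above can be checked with Python's standard `datetime` module. The post does not name the deadline's timezone, so this sketch assumes a fixed UTC-8 offset, which is the offset implied by the numbers quoted; swap in the real zone if it differs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the rules' deadline uses a fixed UTC-8 offset (the post does not name the zone).
DEADLINE_TZ = timezone(timedelta(hours=-8))

# Round 2 end date as given in the rules: 11:59 pm, Monday, July 15th (2019).
deadline_local = datetime(2019, 7, 15, 23, 59, tzinfo=DEADLINE_TZ)
deadline_utc = deadline_local.astimezone(timezone.utc)

# Deadline as shown on the competition ticker: Jul 15th, 7:59 UTC.
ticker_utc = datetime(2019, 7, 15, 7, 59, tzinfo=timezone.utc)
ticker_local = ticker_utc.astimezone(DEADLINE_TZ)

print("Rules deadline in UTC:       ", deadline_utc.strftime("%A %b %d %H:%M"))
print("Ticker deadline in local time:", ticker_local.strftime("%A %b %d %H:%M"))
print("Ticker is early by:          ", deadline_utc - ticker_utc)
```

Under this assumption the ticker time lands exactly one day before the deadline stated in the rules.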
I submitted earlier today, and my submissions have been stuck in image build mode for 3 hours:
@Miffyli - which platform are you running on? I cannot get it working on macOS, and from what I've read, Docker does not support --network=host on Mac.
It's when I try to run it locally following these steps: https://github.com/Unity-Technologies/obstacle-tower-challenge#run-docker-image
It used to work, but not since the update of `aicrowd-repo2docker`. @arthurj, does it work for you guys on Mac?
Thanks @mohanty! One problem I'm still having is running the test seeds locally (100-105): when I invoke the environment Docker container, it exits. I'm on macOS.
I've spent countless hours over the past 3 days trying to figure out why I could not upload/evaluate a new model or reproduce the problem locally.
Basically, there is a breaking change whereby existing code will no longer run server-side. It would have saved me hours had there been a better error message and/or somewhere to look for notifications (I don't think it helps to have 2 Unity repos with issue trackers as well as the AIcrowd message board).
- June 13th was the date of my last successful model upload.
- On July 5th, I tried to upload a new model; the only change to my code base was the addition of the model and a reference to that model.
`aicrowd-bot` posted the log, and it just said this:
2019-07-06T07:13:55.1056876Z root
2019-07-06T07:13:55.129380334Z Traceback (most recent call last):
2019-07-06T07:13:55.12943787Z File "run_evaluation.py", line 7, in <module>
2019-07-06T07:13:55.129471922Z import gym
2019-07-06T07:13:55.129478407Z ModuleNotFoundError: No module named 'gym'
- I learned that we can now do debug submissions (Announcement: Debug your submissions); however, all it gave me was the same log.
- I tried to reproduce the problem locally; however, the `build.sh` script was giving me the error:
AttributeError: /srv/conda/bin/python: undefined symbol: archive_errno
- I thought a conda or pip package update might have broken something, so I manually pinned each one to the version that worked on June 13th.
- I thought there might be some local issue with my Docker setup, so I cleaned, deleted, and reset it.
- I ran `pip install --upgrade aicrowd-repo2docker` and saw that it updated. This solved my local issue, and I was able to reproduce the server-side error.
- Given that `aicrowd-repo2docker` had been updated, I looked at the commit logs and found that this commit, https://github.com/Unity-Technologies/obstacle-tower-challenge/commit/99c68faf2ed0f01ee8bc3e411bbdd4e85484a733, removed `source activate base` from `run.sh`.
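A plausible reading of the `No module named 'gym'` log is that, once `run.sh` stopped activating the conda base environment, the evaluation ran under a Python interpreter that does not have gym installed. A small diagnostic like the one below, run with the same interpreter as `run_evaluation.py`, would show which interpreter is in use and whether it can see gym. This is a hypothetical helper, not part of the starter kit; the name `check_module` and the module list are illustrative.

```python
import importlib.util
import sys


def check_module(name: str) -> dict:
    """Hypothetical helper: report whether a top-level module is importable here."""
    spec = importlib.util.find_spec(name)  # returns None (no exception) if not found
    return {
        "module": name,
        "importable": spec is not None,
        # Which interpreter is running, e.g. conda base vs. a system Python.
        "python": sys.executable,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    # Modules the evaluation needs; gym is the one missing in the log above.
    for mod in ("gym",):
        info = check_module(mod)
        print(info)
        if not info["importable"]:
            print(f"WARNING: '{mod}' is not importable from {info['python']}", file=sys.stderr)
```

Printing the interpreter path alongside the import check turns a bare `ModuleNotFoundError` into something that points at the environment-activation change.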
Note: I still cannot test locally: the agent code runs, but the environment Docker container immediately drops out. I also have to manually delete the Docker image to force it to rebuild (this was not the case prior to the
Oh, strange: it recorded an average floor of 6, then removed it when it restarted training.
It looks like the video worked and it updated the leaderboard, but then it started scoring again.
I tried again; this time I get 'Unable to orchestrate submission, please contact Administrators.'
@mohanty - Have you been able to update v2.2? I resubmitted my agent but got errors:
@mohanty - would you post the logs for my submission here: https://gitlab.aicrowd.com/joe_booth/obstacle-tower-challenge/issues/121? I'm not sure if it is the same problem. Thanks!
Could it be related to this issue? https://github.com/Unity-Technologies/obstacle-tower-env/issues/91#issuecomment-494218846
They pushed a fix for this.
I tried re-submitting my agent from round #1 to see how it did on round #2, but I get the following error.
Did something change? (All I did was push a text change and re-tag my repo.)
The following containers terminated prematurely. : agent Please contact administrators, or refer to the execution logs.
Hi @mohanty - thanks for pointing this out. Sorry, I didn't see this until this evening, so I had continued to push, but I will back off per your points.
I think it’s great that you extended the timeline - it gives us the time to take more risks with experimental approaches.
OK, thanks! As a rule, should I try to limit the number of concurrent submissions?
I know that, statistically, my current agent can score 10 on each seed, so if I submit it enough times, at some point it should score that.
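The "submit it enough times" reasoning can be made concrete with a simple model. This sketch assumes each seed independently reaches floor 10 with the same probability `p_seed`, and that a submission "scores 10" only if every seed does; both the independence assumption and the example numbers are illustrative, not measured, and the number of seeds per evaluation is left as a parameter because it is not stated here.

```python
def p_all_seeds(p_seed: float, num_seeds: int) -> float:
    """Probability that one evaluation reaches floor 10 on every seed (independence assumed)."""
    return p_seed ** num_seeds


def p_at_least_once(p_seed: float, num_seeds: int, submissions: int) -> float:
    """Probability that at least one of `submissions` evaluations hits floor 10 on all seeds."""
    q = p_all_seeds(p_seed, num_seeds)
    return 1.0 - (1.0 - q) ** submissions


def submissions_needed(p_seed: float, num_seeds: int, target: float) -> int:
    """Smallest number of submissions whose chance of at least one perfect run reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if p_all_seeds(p_seed, num_seeds) <= 0.0:
        raise ValueError("a perfect run is impossible under these assumptions")
    n = 1
    while p_at_least_once(p_seed, num_seeds, n) < target:
        n += 1
    return n


# Illustrative numbers only: 90% per seed, 5 seeds, 95% confidence.
print(submissions_needed(0.9, 5, 0.95))
```

With those illustrative numbers the loop stops at 4 submissions. Because `p_seed ** num_seeds` shrinks quickly as the per-seed probability drops, the required count can grow fast, and each extra submission also consumes shared evaluation capacity, which is the trade-off behind the question about concurrent submissions.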
@mohanty - I think I may have overloaded the system last night: I was trying to trigger multiple submissions so that they would all queue up. Two show as Failed, and eight show as ImageBuildStarted after 15 hours.
I'm not sure if there is anything I can do on my end.
I also tried `aicrowd-repo2docker .` and received the same error.
On Windows, when running `aicrowd-repo2docker .\build.sh`, I get the following error:
I believe the problem is that `pwd` is a Unix-only Python module.
File "c:\users\ml-2 windows\appdata\local\conda\conda\envs\obs\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\ml-2 windows\appdata\local\conda\conda\envs\obs\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\ml-2 windows\AppData\Local\conda\conda\envs\obs\Scripts\aicrowd-repo2docker.exe\__main__.py", line 5, in <module>
File "c:\users\ml-2 windows\appdata\local\conda\conda\envs\obs\lib\site-packages\repo2docker\__main__.py", line 1, in <module>
from .app import Repo2Docker
File "c:\users\ml-2 windows\appdata\local\conda\conda\envs\obs\lib\site-packages\repo2docker\app.py", line 15, in <module>
import pwd
ModuleNotFoundError: No module named 'pwd'
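The traceback fits that explanation: `repo2docker/app.py` imports `pwd` at module level, and the Python standard library only provides `pwd` on Unix, so the import fails before any of the tool runs on Windows. For illustration only, this is the kind of guarded import that lets code degrade gracefully on Windows; it is a generic pattern, not the actual `aicrowd-repo2docker` code, and `current_user_name` is a hypothetical helper.

```python
import getpass
import os

try:
    import pwd  # Unix-only stdlib module; this import fails on Windows
except ImportError:
    pwd = None


def current_user_name() -> str:
    """Hypothetical helper: the current user's name, on Unix or Windows."""
    if pwd is not None:
        try:
            return pwd.getpwuid(os.getuid()).pw_name
        except KeyError:
            # The UID has no passwd entry (common in containers run with an arbitrary UID).
            return str(os.getuid())
    # Windows: getpass.getuser() reads LOGNAME, USER, LNAME or USERNAME from the environment.
    return getpass.getuser()


print(current_user_name())
```

A guard like this only moves the failure to wherever Unix-specific behaviour is actually needed; running the build under a Linux environment (for example WSL or a Linux VM) sidesteps the problem entirely.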