The MineRL package is open-source and we certainly welcome anyone to experiment with it!
Thanks for your question! This year we decided that you CAN use reward when learning the action distribution from human demonstrations. For example, it is permitted to learn the joint distribution between reward and human actions and to condition on reward when sampling from it.
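As a rough illustration of what "conditioning on reward when sampling" could look like, here is a minimal sketch that counts action frequencies per reward bucket and samples actions conditioned on the current reward. The function names, bucket size, and toy action labels are all illustrative, not part of the minerl API:

```python
import random
from collections import Counter, defaultdict

def fit_action_given_reward(demos, bucket=10):
    """Count action frequencies per reward bucket (a crude joint model).

    demos: iterable of (reward, action) pairs from human demonstrations.
    """
    table = defaultdict(Counter)
    for reward, action in demos:
        table[reward // bucket][action] += 1
    return table

def sample_action(table, reward, bucket=10):
    """Sample an action conditioned on the current reward bucket."""
    counts = table[reward // bucket]
    actions, weights = zip(*counts.items())
    return random.choices(actions, weights=weights)[0]

# Toy demonstration data: (cumulative reward, action taken)
demos = [(0, "chop"), (0, "chop"), (5, "craft"), (12, "mine"), (15, "mine")]
model = fit_action_given_reward(demos)
action = sample_action(model, reward=13)  # draws from the 10-19 bucket
```

A learned density model would replace the count table in practice; the key point is that the conditioning variable (reward) comes from the environment, while the conditional distribution is learned purely from the human data.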
What you are describing, however, sounds like a hard-coded meta-controller, as the policy is dictated by hand-coded reward thresholds.
One option to mitigate this would be to learn a meta-controller that observes only reward and decides among a fixed number of policies. You could then weight demonstrations by their reward to obtain a uniform sampling distribution.
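The reward-weighting idea above can be sketched in a few lines: weight each demonstration inversely to how common its reward value is, so that sampling becomes approximately uniform over reward values. The helper name and toy data are illustrative:

```python
import random
from collections import Counter

def uniform_over_reward(demos):
    """Weight demonstrations inversely to the frequency of their reward,
    so sampling is (approximately) uniform over distinct reward values.

    demos: list of (reward, demonstration) pairs.
    """
    freq = Counter(reward for reward, _ in demos)
    return [1.0 / freq[reward] for reward, _ in demos]

# Three low-reward demos and one high-reward demo
demos = [(0, "demo_a"), (0, "demo_b"), (0, "demo_c"), (64, "demo_d")]
weights = uniform_over_reward(demos)
# rewards 0 and 64 now each carry total weight 1.0
sampled = random.choices(demos, weights=weights, k=1)[0]
```

With these weights, the rare high-reward trajectory is no longer drowned out by the many low-reward ones when sampling training batches.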
That’s an error with the documentation - I thought we had fixed that but we must have missed a section, sorry!
Should be working now!
Miffyli is correct here - even pre-training using a small number of learned weights is not allowed.
We will also be inspecting code to validate submissions; the large-file restriction simply provides an easy way to enforce the pre-training rule in general.
Teams should be notified! Congratulations to the top teams!
Fixed! Install minerl 0.2.8
Great catch - until we can update the PyPI repo, using the MineRLObtainDiamondDense-v0 environment should be a close replacement, especially if you limit the number of steps!
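Limiting the number of steps can be done with a small gym-style wrapper. The class below is an illustrative sketch, not part of the minerl API; it assumes the usual `reset()`/`step(action)` signature and truncates the episode once the step budget is used up:

```python
class StepLimit:
    """Minimal gym-style wrapper that ends episodes after max_steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._steps += 1
        if self._steps >= self.max_steps:
            done = True  # truncate the episode at the budget
        return obs, reward, done, info

# Tiny stand-in environment to show the wrapper in action
class _DummyEnv:
    def reset(self):
        return 0
    def step(self, action):
        return 0, 0.0, False, {}

env = StepLimit(_DummyEnv(), max_steps=3)
env.reset()
done, steps = False, 0
while not done:
    _, _, done, _ = env.step(None)
    steps += 1
# the episode ends after exactly 3 steps
```

If you are using gym, `gym.wrappers.TimeLimit` provides the same behavior out of the box.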
Sorry I will take a look now - I thought this was covered by our unit tests!
To clarify - we have now moved to Round 1.5! The scores of Round 1 will be kept for archival purposes only.
Announcement - Round 1 Scores
We have reviewed multiple submissions that obtain rewards that should not be achievable in the environment. As this is due to an easily exploitable reward loop present in outdated minerl versions (prior to minerl 0.2.5), we have decided to add 5 additional submissions to each team. The new maximum number of submissions is now 25.
Please verify submissions locally to confirm your current scoreboard results. Top submissions made with outdated minerl versions (prior to minerl 0.2.5) will be re-run to verify their performance.
Additionally, participants should retrain their models to account for the reward loop removal.
On AIcrowd, the minerl version can be checked by looking for the minerl==<version> line in the submission's requirements file. Locally, the python package can be updated with the python -m pip install --upgrade minerl command.
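If helpful, the version pin can also be checked programmatically. This is a small sketch, not part of the AIcrowd tooling; it scans requirements-style lines for the minerl pin and compares it against the first version with the reward loop removed:

```python
import re

MIN_VERSION = (0, 2, 5)  # first release with the reward loop removed

def pinned_minerl_version(requirement_lines):
    """Return the (major, minor, patch) tuple pinned by a
    minerl==<version> line, or None if no pin is found."""
    for line in requirement_lines:
        m = re.match(r"\s*minerl==(\d+)\.(\d+)\.(\d+)", line)
        if m:
            return tuple(int(g) for g in m.groups())
    return None

lines = ["numpy==1.16.4", "minerl==0.2.4"]
version = pinned_minerl_version(lines)
outdated = version is not None and version < MIN_VERSION  # True for 0.2.4
```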
As long as the internal reward is learned from the data, this is allowed. This is not allowed if it is directly a function of the state and external data.
Unfortunately we are unable to release additional data at this time.
We will make an announcement if more data will be available for round 2.
Just to follow up here - this was indeed an issue and the fix is being bundled in minerl 0.2.5!
This was an issue with the obtainDiamond.xml - we have resolved it in the most recent release being deployed today or tomorrow!
Unfortunately, ImageNet pre-training is not allowed this year!
Re-training models is a key part of Round 2, and if pre-trained weights were used there would be no way to tell how those weights were generated. Additionally, if pre-training during evaluation were permitted, competitors could upload large amounts of data which could be used to load other pre-trained weights.
In future iterations of the competition, if pre-training on ImageNet is a common request, we could consider including certain datasets in the provided docker container. Note, however, that the texture pack of Minecraft will change in Round 2, so techniques that transfer well from natural images to Minecraft may not work well in Round 2!
They should have the same item-ID so this should not be an issue but I will verify this when checking it out!
The behavior is defined as it occurs in vanilla Minecraft (where possible). For movement, nothing happens when conflicting actions are requested. For place and attack, both actions will be processed, as the place handler goes through Malmo while the attack action is handled by default Minecraft.
Thanks for this, I will take a look. It could be a weird interaction between the Minecraft give commands and the Malmo agent; I will explore building a world with the needed resources and see if this is still the case.