AIcrowd | NeurIPS 2021 - The NetHack Challenge

NETHACK COMPETITION OFFICIAL RULES

UPDATE: An additional Competition track has been added as set forth in Section 7 below.

PLEASE READ THESE OFFICIAL RULES CAREFULLY. ENTRY INTO THIS COMPETITION CONSTITUTES YOUR ACCEPTANCE OF THESE OFFICIAL RULES. IF YOU DO NOT AGREE TO ANY PART OF THESE OFFICIAL RULES, PLEASE DO NOT ENTER THIS CHALLENGE.

NO PURCHASE IS NECESSARY TO ENTER OR WIN. A PURCHASE OF ANY KIND WILL NOT INCREASE YOUR CHANCES OF WINNING. VOID WHERE PROHIBITED.

1. Competition Description

The NetHack Challenge is a competition involving the development of sequential decision-making agents to play the game NetHack, namely by

completing the game or, failing that,
accumulating as much in-game score per round as possible.

Concretely, each team of contestants will design and submit for evaluation one agent (which may go through several iterations over the course of the competition), which they will train or otherwise design using their own resources.

2. Sponsors

The Competition is organized by AIcrowd SA, EPFL Innovation Park, Bâtiment C, c/o Fondation EPFL Innovation Park, 1015 Lausanne, Switzerland, referred to collectively as “Organizers” from here on. Third parties may provide sponsorship to cover running costs, prizes and compute grants. These third parties will be known as “Sponsors”.

3. Organizers Admins

“Organizers Admins” are any companies or organizations authorized by Organizers to aid them with the administration, sponsorship or execution of this Challenge, including but not limited to AIcrowd SA and Facebook, Inc.

4. Entry

The Entry in this competition refers to a git repository on gitlab.aicrowd.com which includes:

Source code for running the submitted agent on the NetHack Learning Environment
Any data locally required for running this source code, e.g., model weights
Specifications of the software runtime
System description as specified in Section 12-b of the rules
Information provided during competition registration

In order to submit an entry to the competition, a representative of a participating team must create an account on AIcrowd and register for the competition on the AIcrowd NetHack Challenge page. Registration for the challenge will require the representative to declare, on behalf of their team, whether they are a team primarily consisting of non-industry researchers.

5. Competition Phases

The Challenge will be organized across two phases:

5-a. Development Phase

During the development phase:

Participants will be able to submit agents to the evaluation service with a limit of 3 successful submissions every 24 hours. The 24-hour submission limit windows will reset every day at 00:00 UTC.
Submitted agents will be evaluated on a number of differently seeded episodes determined by the organizers during the evaluation period, with the race, gender and class of the player character being randomly chosen by the system at the beginning of the episode.
Submissions made during the Development Phase will appear on the Competition Leaderboard and will count towards Test Phase qualification, but not count towards final competition ranking.
Some or all trajectories obtained from the agent acting upon the environment will be publicly visible as videos linked to the related leaderboard entry on AIcrowd's platform.
Participants are welcome to run the evaluation protocol locally, and our submission system will allow them to do this, but the result of this evaluation will not appear on the leaderboard.
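
The daily submission limit above can be sketched as follows. This is only an illustration of the stated rule (3 successful submissions per window, reset at 00:00 UTC); the function name and its inputs are hypothetical and are not part of the AIcrowd submission system.

```python
from datetime import datetime, timezone

DAILY_LIMIT = 3  # successful submissions per 24-hour window (Section 5-a)


def submissions_remaining(successful_times, now):
    """Return how many successful submissions are left in the current UTC day.

    Both arguments are timezone-aware datetimes. Because the window resets at
    00:00 UTC, only submissions that fall on the same UTC date as `now` count.
    """
    today = now.astimezone(timezone.utc).date()
    used = sum(
        1 for t in successful_times if t.astimezone(timezone.utc).date() == today
    )
    return max(0, DAILY_LIMIT - used)


if __name__ == "__main__":
    now = datetime(2021, 7, 1, 0, 30, tzinfo=timezone.utc)
    earlier = [
        datetime(2021, 6, 30, 23, 59, tzinfo=timezone.utc),  # previous window
        datetime(2021, 7, 1, 0, 10, tzinfo=timezone.utc),    # current window
    ]
    print(submissions_remaining(earlier, now))  # 2
```

Note that a submission made at 23:59 UTC does not count against the window that opens one minute later.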

5-b. Test Phase

During the Test Phase:

The top 15 participants for each track in the development phase will be eligible to participate in the Test Phase. The organizers reserve the right to extend the number of participants, in exceptional circumstances.
Participants may submit up to three times during this entire phase, and the best results will be used for the final ranking. This is intended to give contestants a chance to deal with bugs or submission errors gracefully.
During the test phase, scores will be returned to the contestants upon completion of a submission, but rankings and the leaderboard will be kept secret until the end of the competition.
The test protocol is the same as the development protocol, save that the agents will be run on 4096 episodes to decrease variance and increase confidence in the final ranking.
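
As a rough illustration of why more episodes increase confidence, consider the estimated win rate. Assuming, purely for this sketch, that episodes are independent and share a single win probability p, and taking the 512-episode development rollouts from Section 8-a as the baseline, the standard error of the estimate shrinks with the square root of the episode count:

```latex
\mathrm{SE}(\hat{p}_n) = \sqrt{\frac{p(1-p)}{n}},
\qquad
\frac{\mathrm{SE}(\hat{p}_{512})}{\mathrm{SE}(\hat{p}_{4096})}
  = \sqrt{\frac{4096}{512}} = \sqrt{8} \approx 2.83
```

Under these assumptions, the test-phase estimate is about 2.8 times tighter than a development-phase estimate.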

6. Competition Start and End Dates

Development Phase: June 9th 2021, 12:00:00 pm GMT – October 15th 2021, 12:00:00 pm GMT
Test Phase: October 15th, 12:00:00 pm GMT – October 31st, 12:00:00 pm GMT

7. Competition Tracks

Rankings will be produced according to “passive” tracks, meaning that submissions are not made to a specific track (but rather to the competition as a whole) and will be ranked in each and every track that they qualify for, based on the system description (see Section 12-b). Note that all eligible submissions qualify for track 1, and either track 2 or 3. If there is any ambiguity as to whether a particular submission qualifies for a track, including whether a submission qualifies for track 2 or track 3, the decision will be made at the discretion of the organizers. The tracks for this competition will be:

Track 1: Best overall agent, awarded to the best performing agent in the competition. All submitted agents qualify for this track.
Track 2: Best agent using a neural network, awarded to the best performing agent substantially using a neural network, deep-learning or significantly similar modelling technique, including, but not limited to, deep reinforcement learning, and hybrid (neuro-symbolic) methods.
Track 3: Best agent not using a neural network, awarded to the best performing agent not using a neural network or significantly similar modelling technique. This includes, most prominently, agents which are not underpinned by parametric models.
Track 4: Best agent from an academic/independent team, awarded to the best performing agent produced by a team predominantly led by non-industry affiliated researchers. Collaborating with industry researchers does not disqualify a team from this track, but the assumption (made in good faith) is that the bulk of the work will have been done by the academic/independent team members, without using hardware provided by industrial partners.
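
The passive track assignment can be sketched as a small function. This is an illustration only: it assumes the tracks are numbered 1 to 4 in the order listed above (the rules themselves only name tracks 1, 2 and 3 explicitly), the two boolean inputs are hypothetical simplifications of the system description and registration declaration, and ambiguous cases remain at the discretion of the organizers.

```python
from typing import List


def qualifying_tracks(uses_neural_network: bool, academic_or_independent: bool) -> List[int]:
    """Return the track numbers a submission would be ranked in.

    Track numbering follows the order of the list in Section 7 (an assumption).
    """
    tracks = [1]                                    # every eligible submission
    tracks.append(2 if uses_neural_network else 3)  # exactly one of tracks 2 and 3
    if academic_or_independent:
        tracks.append(4)                            # academic/independent track
    return tracks


if __name__ == "__main__":
    print(qualifying_tracks(True, False))   # [1, 2]
    print(qualifying_tracks(False, True))   # [1, 3, 4]
```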

8. Agent Design

There is no restriction on how an agent is implemented, trained, or run except during evaluation, where:

The agent must receive observations using one or more of the observation modalities supported by NLE.
The agent must act on the environment by proposing, at each turn, an action according to the action space supported by NLE.
The agent must not attempt to directly interact with the NetHack binary, except through the OpenAI Gym API provided by NLE.
During evaluation, the agent must not attempt to connect to resources or other parties outside of the test environment (e.g. by connecting to a third-party controller). If the agent requires access to “external resources” (e.g. the NetHack wiki), a local copy should be incorporated in the agent and relied upon during evaluation.
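
The interaction pattern these constraints describe (observe, propose one action per turn, and touch the game only through the Gym API) can be sketched as below. This is a minimal sketch, not the official evaluation code. It assumes the classic Gym interface in which `reset()` returns an observation and `step()` returns a 4-tuple. `StubEnv` and `make_random_policy` are hypothetical stand-ins so the example runs without NLE installed.

```python
import random


def run_episode(env, policy, max_steps=50_000):
    """Play one episode through a Gym-style API and return the total reward."""
    obs = env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        action = policy(obs)                         # one action proposed per turn
        obs, reward, done, _info = env.step(action)  # only contact with the game
        total_reward += reward
        steps += 1
    return total_reward


class StubEnv:
    """Hypothetical stand-in environment: fixed-length episodes, reward 1 per step."""

    def __init__(self, n_actions=4, episode_length=10):
        self.n_actions = n_actions
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return {"turn": self._t}

    def step(self, action):
        if not 0 <= action < self.n_actions:
            raise ValueError("action outside the action space")
        self._t += 1
        done = self._t >= self.episode_length
        return {"turn": self._t}, 1.0, done, {}


def make_random_policy(n_actions, seed=0):
    """Return a policy that ignores the observation and picks a random action."""
    rng = random.Random(seed)

    def policy(_obs):
        return rng.randrange(n_actions)

    return policy


if __name__ == "__main__":
    # With NLE installed, the stub would be swapped for something like:
    #   import gym, nle
    #   env = gym.make("NetHackChallenge-v0")
    env = StubEnv()
    print(run_episode(env, make_random_policy(env.n_actions)))  # 10.0
```

Keeping the policy as a function of the observation alone makes it easy to confirm that nothing outside the environment is consulted during evaluation.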

8-a. Time and Turn limits

During the development phase, the agent must complete 512-episode rollouts within 2 hours, with no single action lasting more than 300 seconds, and no single episode lasting more than 30 minutes. Episodes that do not reach a score of 1000 within 50,000 steps will be terminated. These restrictions are intended to be generous bounds to prevent abuse of the evaluation system resources, rather than to enforce particular efficiency constraints on agents.
During the test phase, the agent must complete 4096-episode rollouts within 24 hours, with no single action lasting more than 300 seconds, and no single episode lasting more than 30 minutes. Episodes that do not reach a score of 1000 within 50,000 steps will be terminated.
The in-game score at the point of failure will be used as part of computing the submission score, as if the agent had died in that episode.
Contestants can run the evaluation protocol for their agent locally with or without these constraints, to benchmark their agent's efficiency privately.
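
For private benchmarking, the limits above can be collected into a small checker. This is a sketch under stated assumptions: the class, constants and function names are hypothetical, the low-score cutoff is read as "terminate at 50,000 steps if the score is still below 1000", and the per-episode average assumes episodes run strictly one after another, which the rules do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    episodes: int
    rollout_seconds: float
    action_seconds: float = 300.0
    episode_seconds: float = 30 * 60
    score_floor: int = 1000
    step_cutoff: int = 50_000


DEVELOPMENT = Limits(episodes=512, rollout_seconds=2 * 3600)
TEST = Limits(episodes=4096, rollout_seconds=24 * 3600)


def average_episode_budget(limits):
    """Seconds available per episode if episodes ran sequentially (an assumption)."""
    return limits.rollout_seconds / limits.episodes


def episode_violation(limits, steps, score, elapsed_seconds, slowest_action_seconds):
    """Return the name of the first limit an episode has broken, or None."""
    if slowest_action_seconds > limits.action_seconds:
        return "action_time"
    if elapsed_seconds > limits.episode_seconds:
        return "episode_time"
    if steps >= limits.step_cutoff and score < limits.score_floor:
        return "low_score_cutoff"
    return None


if __name__ == "__main__":
    print(average_episode_budget(DEVELOPMENT))  # 14.0625
    print(average_episode_budget(TEST))         # 21.09375
    print(episode_violation(DEVELOPMENT, 50_000, 999, 60.0, 1.0))  # low_score_cutoff
```

Under the sequential assumption, the whole-rollout limit (about 14 seconds per episode in development) is far tighter than the 30-minute per-episode limit.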

9. Local Evaluations

Code will be shared to allow applicants to evaluate their agents against the testing protocol either locally or remotely, as well as perform integration tests to determine whether their code will run on the evaluation server.

10. Competition Environment

The environment for all evaluations will be based on the ‘NetHackChallenge-v0’ gym environment, provided in the NLE repository, instantiated with all default parameters and settings. Should the need arise, the organizers reserve the right to make bug fixes and maintenance changes to the environment to ensure the smooth running of the competition. In such events, updates will be publicized, and the results currently available on the dev leaderboard will stand.

11. Am I Eligible to Enter the Challenge?

You are eligible to enter this Challenge if you (and each member of your Team) meet all of the following requirements as of the time and date of entry:

You are an individual;
You are 18 years of age or older but in no event less than the age of majority in your place of residence;
You have Internet Access, an Email Account, and access to a personal computer;

The Organizers Admins will not be able to transfer prize money to accounts in any of the following countries or regions. (Please note that residents of these countries or regions are still allowed to participate in the challenge and be ranked in the official rankings.)

The Crimea region of Ukraine
Cuba
Iran
North Korea
Sudan
Syria
Quebec, Canada
Brazil
Italy

Furthermore, teams involving one or more participants from AIcrowd SA and Facebook Inc. may submit entries for the purpose of benchmarking and comparison, but such entries are not considered part of the competition for the purpose of the official rankings, and are not eligible for Prizes.

Teams from institutions sponsoring the competition, excluding AIcrowd SA and Facebook Inc., are eligible to participate (subject to the individual conditions listed above) and appear in official rankings, but are not eligible for Prizes.

Please Note: it is entirely your responsibility to review and understand your employer’s and country’s policies about your eligibility to participate in this Challenge. If you participate in violation of your employer’s or country’s policies, you and your Entry may be disqualified from the Challenge. Organizers disclaim any and all liability or responsibility with respect to disputes arising between an employer and such employer’s employee or between a country and its resident in relation to this matter.

12. Is the Entry an Eligible Entry?

To be eligible to be considered for a prize, as solely determined by the Organizers:
The Entry MUST:

be compatible with the official submission format;
be self-contained and function without a dependency on any external services and network access;
be in English;
be the Team’s own original work;
not have been submitted previously in any promotion of any kind;
not contain material or content that: is inappropriate, indecent, obscene, offensive, sexually explicit, pornographic, hateful, tortious, defamatory, or slanderous or libelous; or promotes bigotry, racism, hatred or harm against any group or individual or promotes discrimination based on race, gender, ethnicity, religion, nationality, disability, sexual orientation, or age; or promotes alcohol, illegal drugs, or tobacco; or violates or infringes another’s rights, including but not limited to rights of privacy, publicity, or their intellectual property rights; or is inconsistent with the message, brand, or image of Organizers; or is unlawful; or is in violation of or contrary to the laws or regulations of any jurisdiction in which the Entry is created.

The Team members MUST:

designate one person as the team leader who will be solely responsible for receiving communications from and communicating with Sponsor;
ensure the Team has obtained any and all consents, approvals, or licenses required for submission of the Entry;
obtain any consents necessary from all members of the Team with respect to the sharing of such member’s personal information as outlined herein;
obtain the agreement of all members of the Team to these Rules;
not generate the Entry by any means which violate these Rules, the Organizers Terms of Service or the Organizers Privacy Policy;
not engage in false, fraudulent, or deceptive acts at any phase during participation in the Challenge; and
not tamper or abuse any aspect of this Challenge.

12-a. Source Code Release

Participants are not required to release their source code to be ranked on the final leaderboard(s), but to be eligible for the Prizes, Participants are required to release the source code (including but not limited to training and inference code) of their solutions under an Open Source Foundation (OSF) approved license.

If a participating team does not receive the Prizes for the above-mentioned reason, the prizes will be offered to the next eligible team on the final leaderboard.

It is a requirement of entering into the competition that the source code for submissions during the test phase be privately shared with the competition organizers, to be used solely for the purpose of adjudication and checking for improper interactions between the agent and the environment.

Organizers reserve the right to disqualify a team if any improper interactions between the agent and the environment are found during the code inspection.

12-b. Systems description requirement

Participants submitting agents during the test phase will be required to submit a description of the system (training process, design, structure, etc.) using a free-form text entry field, but with guiding questions provided by the organizers. The amount of detail offered is up to the contestants, but the Organizers strongly encourage participants to be as precise and thorough as possible here. The system descriptions will be released at the end of the competition and will be used to categorize the system into tracks for the purpose of track-specific rankings. Some system descriptions may be used to support the writing of a competition report, which track winners and select runners-up may be invited to co-author, at the discretion of the organizing committee. The organizers will seek the consent of the Participants before making any such system description publicly available for use by the research community.

13. Disqualification

If you, any Team member, or the Entry is found to be ineligible for any reason, including but not limited to conflicts within Teams and noncompliance with these Rules, Organizers and Organizers Affiliates reserve the right to disqualify the Entry and/or you and/or your Team members from this Challenge and any other contest or promotional activity sponsored or administered in any way by the Organizers.
A participant is not allowed to create more than one account to participate in the challenge. Violating this will result in disqualification from the challenge.
Participants should not attempt to get around the limited number of submissions during the test phase by entering several teams into the competition. Participants should only be associated with one Entry. If two teams have overlapping participants, or if the organizers deem two entries to be effectively similar modulo small changes, they reserve the right to disqualify both teams.
If a participating team does not receive the Prizes for any of the above-mentioned reasons, the prizes will be offered to the next eligible team on the final leaderboard.

14. How may the Entry potentially be used?

The Entry may be used in a few different ways. Organizers do not claim to own your Team’s Entry, however, by submitting the Entry you and each member of your Team:

hereby grants to Organizers a non-exclusive, irrevocable, royalty-free, world-wide right and license to review and analyze the Entry in relation to this Challenge;
hereby grants to Organizers a non-exclusive, irrevocable, royalty-free, world-wide right and license to trajectory data generated from evaluation of the Entry, to be used in a report submitted to NeurIPS after the competition has ended, and possibly released as an open-source dataset;
agrees that each member will execute any necessary paperwork for Organizers and Organizers Admins to use the rights and licenses granted hereunder;
acknowledges and agrees that the Team will not be compensated and may not be credited (at Organizer’s sole discretion) for the use of the Entry as described in these Rules;
acknowledges that the Organizers may have developed or commissioned materials similar to the Entry and waives any claims resulting from any similarities to the Entry;
understands and acknowledges that, subject to provision of Prizes, Organizers are not obligated to use the Entry in any way, even if the Entry is selected as a winning Entry.

Personal data you submit in relation to this Challenge will be used by Organizers in accordance with Section 20 of these Rules.

15. How will Winners be Selected and Notified?

During evaluation, and for the purpose of ranking submissions, the following ranking mechanism will be used by the Organizers:

For a given set of evaluation episodes, the average number of episodes where the agent completes the game will be computed, along with the median in-game end-of-episode score, and the mean in-game score.
This score is defined as the cumulative reward of the NetHackChallenge-v0 environment, calculated up to the point of the termination of the episode. It should be noted that in some cases this may deviate slightly from NetHack’s ‘Final Game Score’, as the game sometimes applies minor modifications to score after death based on the end game state. These are detailed here.
Entries will be ranked by average number of wins, hereafter referred to as the "primary evaluation metric score".
If two or more Entries have the same primary evaluation metric score, they will be ranked based on the median in-game end-of-episode score, hereafter referred to as the "secondary evaluation metric score".
If the primary evaluation metric score and the secondary evaluation metric score are identical for two or more Teams, they will be ranked based on the mean in-game end-of-episode score, hereafter referred to as the "tertiary evaluation metric score".
If two or more Entries are tied on all evaluation metrics, then the prizes will be shared evenly among the said Teams.
Participants are encouraged, but in no way required, to incorporate this scoring mechanism into the training process of their agent (where relevant).
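
The ranking mechanism above amounts to a lexicographic sort on three metrics. The following is a minimal sketch, not the Organizers' evaluation code: the function names and the `(completed_game, final_score)` episode representation are hypothetical, and "average number of wins" is interpreted as the fraction of episodes in which the agent completes the game.

```python
from statistics import mean, median


def metrics(episodes):
    """Compute (primary, secondary, tertiary) metrics for one Entry.

    `episodes` is a list of (completed_game: bool, final_score: number) pairs.
    """
    win_rate = mean(1.0 if won else 0.0 for won, _ in episodes)
    scores = [score for _, score in episodes]
    return (win_rate, median(scores), mean(scores))


def rank(entries):
    """Return Entry names best first; tuples compare metric by metric."""
    return sorted(entries, key=lambda name: metrics(entries[name]), reverse=True)


if __name__ == "__main__":
    entries = {
        "a": [(False, 500), (False, 1500), (False, 700)],  # median 700, mean 900
        "b": [(False, 600), (False, 800), (False, 700)],   # median 700, mean 700
        "c": [(True, 100), (False, 50), (False, 60)],      # one win in three
    }
    print(rank(entries))  # ['c', 'a', 'b']
```

Here "c" leads on wins despite its low scores, while "a" and "b" tie on wins and median and are separated only by the mean. Entries tied on all three metrics are not separated by this sketch; under the rules they would share the prize.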

Potential winners will be contacted within two weeks of the advertised end of the test phase (Section 6) via the email associated with the AIcrowd.com account through which the Entry was submitted and must submit their systems description at that time in the form and within the timeframe specified by Organizers. If a potential winner (including each member of the potentially winning team) cannot be contacted, does not respond as directed, refuses the prize, or is found to be ineligible for any reason, such prize may be forfeited and awarded to an alternate winner. Only one alternate winner will be selected for each prize package, after which prizes will remain unawarded.

To the extent that there is any dispute as to the identity of the potential winner, the registered account holder of the email address associated with the AIcrowd account through which the Entry was first submitted will be deemed the official potential winner by Organizers. A registered account holder is defined as the natural person who is assigned to an email address by an Internet access provider, online service provider, or other organization (e.g., business, educational institution, etc.) that is responsible for assigning email addresses for the domain associated with the submitted email address.

16. Your Odds of Winning

ODDS OF WINNING A PRIZE ARE SUBJECT TO THE TOTAL NUMBER OF ELIGIBLE ENTRIES RECEIVED AND HOW YOUR ENTRY SCORES IN ACCORDANCE WITH THE JUDGING CRITERIA.

17. Prizes

Prizes will be announced separately on the AIcrowd NetHack Challenge competition page and advertised via social media. Prizes will be fulfilled in a manner determined by the Organizers and may require winners to have a bank account to receive prize funds.

18. When will prizes be awarded?

The prizes will be awarded within a commercially reasonable time frame to the designated Team Leader unless otherwise agreed to by Team Leader, remaining Team members and Organizer. All members of a Team may be required to complete and sign additional documentation, such as non-disclosures, representations and warranties, liability and publicity releases (unless prohibited by applicable law), and tax documents, or other similar documentation in the manner and within the timeframe specified by Organizer in order for the potentially winning team to claim the prize. Organizers will in no way be involved in any dispute with respect to receipt of a prize by any other members of a Team, including, without limitation, division of the prize value among Team members. Winners are responsible for any tax liability that may result from receipt of any prize.

Only prizes claimed in accordance with these Rules will be awarded.

19. Winner List

A list of all winners of this Challenge will be posted on AIcrowd Site and may be announced at Organizers’ discretion via Organizers’ Twitter, Facebook, Blog, or Website, or at an Organizer or Organizer Admins sponsored or hosted event.

20. Your Personal Data and Privacy

Organizers may use cookies and/or collect IP addresses for the purpose of implementing or exercising its rights or obligations under the Rules, for information purposes, identifying your location, including without limitation for the purpose of redirecting you to the appropriate geographic website, if applicable, or for any other lawful purpose in accordance with the AIcrowd Privacy Policy.

Organizers may use the personal data you provide via your participation in this Challenge:

to contact you in relation to the Challenge;
to confirm the details of your Entry;
to administer and execute this Challenge, including sharing it with Organizer Admins;
at Organizers’ discretion, to credit you and/or your Team for the Entry, identify you and/or your Team as a Winner, or other similar notice; and
as otherwise noted in these Rules or as necessary for Organizers to meet their obligations under these Rules or applicable law.

Organizers only require name and email address to be submitted for you to participate in this Challenge for the uses outlined in this Section 20.
Please read the AIcrowd Terms and Conditions, Participation Terms carefully to understand how your data may be used by AIcrowd SA.

21. Additional Terms and Conditions

If Organizers determine, in their sole discretion, that any portion of this Challenge is compromised by virus, bugs, unauthorized human intervention, or any other causes beyond their control that, in the sole opinion of Organizers, corrupt or impair the administration, security, fairness or proper participation in/of the Challenge, Organizers reserve the right to (a) cancel the Challenge; (b) pause the Challenge until such time as the aforementioned issues may be resolved; or (c) consider for the prizes only those Entries submitted prior to when the Challenge was so compromised.

To the fullest extent permitted by applicable law, you agree that Organizers, Organizer Affiliates, and Organizer Admins, and each of their directors, officers, employees, agents and assigns, will not be liable for personal injuries, death, damages, expenses or costs or losses of any kind resulting from participation or inability to participate in this Challenge or acceptance of or use or inability to use a prize or parts thereof including, without limitation, claims, suits, injuries, losses and damages related to personal injuries, death, damage to or destruction of property, rights of publicity or privacy, defamation or portrayal in a false light (whether intentional or unintentional), whether under a theory of contract, tort (including negligence), warranty or other theory.

Your use of any other products and services, whether required by these Rules or not, is subject to the terms and conditions associated with such products or services, including the AIcrowd site and services.

In the event any clause or provision of these Rules proves unenforceable, void or incomplete, the validity of the other conditions will remain unaffected.
