Orak🎮 Competition 2025 - Official Rules

NO PURCHASE OR PAYMENT NECESSARY TO ENTER OR WIN. VOID WHERE PROHIBITED OR WHERE REGISTRATION, BONDING OR LOCALIZATION ARE REQUIRED.

IMPORTANT — Participants of the Competition (hereinafter referred to as “you”, “Participants”, “teams” or “Teams”) are permitted to hold only a single registered account. The use of multiple accounts or identities is forbidden and will result in instant disqualification.

Entry & Acceptance

By entering this Competition, you acknowledge and accept these Official Rules (the “Rules”). The Orak Competition 2025 is a skills-based challenge focused on advancing research in LLM agents and gaming AI. Submission of any entry (together with all materials, content, and information provided as part of the submission hereinafter referred to as “Submission”) constitutes your agreement to comply fully with these Rules. If you do not agree, you must not submit.


1. Competition‑Specific Terms

  • Competition Organizer: Krafton, Inc.

  • Competition Sponsors: NVIDIA, AWS, OpenAI

  • Competition Website: https://krafton-ai.github.io/orak-leaderboard/

  • Total Prize Pool: USD $20,000 (Cash Prize Pool, $10K per track)

    • 1st Place: $6,000

    • 2nd Place: $3,000

    • 3rd Place: $1,000

  • Sponsored Credit Prize Pool:

    • Track 1: NVIDIA Brev Credit: $15,000

    • Track 2: AWS Bedrock Credit: $20,000

    • OpenAI API Credit: $10,000

  • Winner License Type: Non-Exclusive (see § 3.3 Winner License)

  • Data Access & Use: Competition Use + Non-Commercial and Academic Research


2. Competition‑Specific Rules

2.1 Competition Overview

The Competition evaluates Participant-created LLM agents on five free video games using the Orak benchmark:

  • Street Fighter III (SF3) — Fighting game requiring precise timing and strategy

  • Super Mario (SM) — Action game requiring spatial reasoning and timing

  • Pokémon (Pkmon) — Turn-based RPG with strategic decision making

  • StarCraft II (SC2) — Real-time strategy game with resource management

  • 2048 — Puzzle game requiring mathematical reasoning

There are two tracks in the competition:

  • Track 1 (Lightweight): Maximum 8 billion total parameters (including all active and frozen parameters, embeddings, adapters, and LoRA modules)

  • Track 2 (Open): No parameter limit

  • Each track follows identical evaluation and cash prize structures, with separate leaderboards and sponsored credit prize pools.

2.2 Evaluation Metrics

The Competition uses the same evaluation metrics as the Orak benchmark for each game.

Final ranking is determined by the weighted average of the five game scores using the weights listed in item 1 below; teams are ranked by final score.

  1. Each game contributes to the final score based on difficulty: SF3 (10%), Super Mario (15%), Pokémon (30%), StarCraft II (30%), 2048 (15%)

  2. Teams are ranked within each game based on Orak's standard metrics

  3. The weighted average rank across all games determines the final ranking

  4. Tie-breaking criteria
    If two or more teams are tied on the primary evaluation metric, rank them by the following criteria in order (earlier items take precedence):

    • Lower model complexity — measured as Aggregate Total Parameters (ATP). ATP is the sum of total parameters across all distinct models used during the final official evaluation. “Total parameters” include all active and frozen weights, embeddings, adapters, and LoRA modules. For Mixture-of-Experts, count all experts (total parameters), not only the activated experts.

    • Lower mean LLM inference calls per evaluation episode (fewer is better). Measured as: total inference calls made during the official evaluation ÷ number of evaluation episodes.

    • Shorter mean prompt length (tokens) per evaluation episode (fewer is better). Measured as: total tokens sent to models during the official evaluation ÷ number of evaluation episodes. Tokens are counted by the Competition Organizer-designated tokenizer.

    • Earlier final-submission timestamp — if still tied.

Note: These tie-breakers are applied only to the final official evaluation and final ranking. Live leaderboard positions are provisional. Competition Organizer may display provisional tie-break information on the live leaderboard if teams provide the required logs, but final winners are determined using verified logs submitted with the final official Submission.
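As an illustration only, the ranking and tie-breaking order described above could be sketched as follows. This is a minimal sketch, not the Competition Organizer's scoring code. It assumes the per-game input is a rank (1 = best) as in items 2 and 3, that means are compared after rounding to two decimal places, and that timestamps are ISO 8601 UTC strings. The names `TeamResult`, `weighted_rank`, and `final_order` are hypothetical.

```python
from dataclasses import dataclass

# Per-game weights from item 1 of this section.
GAME_WEIGHTS = {"SF3": 0.10, "SM": 0.15, "Pkmon": 0.30, "SC2": 0.30, "2048": 0.15}


@dataclass
class TeamResult:
    name: str
    game_ranks: dict     # assumed input: rank per game, 1 = best
    atp: int             # Aggregate Total Parameters
    mean_calls: float    # mean LLM inference calls per episode
    mean_tokens: float   # mean prompt tokens per episode
    submitted_at: str    # assumed ISO 8601 UTC string, so string order = time order


def weighted_rank(game_ranks):
    """Weighted average of per-game ranks (lower is better)."""
    return sum(GAME_WEIGHTS[g] * game_ranks[g] for g in GAME_WEIGHTS)


def final_order(teams):
    """Sort by weighted rank, then apply the tie-breakers in the stated order."""
    return sorted(
        teams,
        key=lambda t: (
            round(weighted_rank(t.game_ranks), 6),  # rounding absorbs float noise
            t.atp,                                  # 1) lower model complexity
            round(t.mean_calls, 2),                 # 2) fewer inference calls
            round(t.mean_tokens, 2),                # 3) shorter prompts
            t.submitted_at,                         # 4) earlier final submission
        ),
    )
```

Because Python compares tuples element by element, a later criterion is consulted only when every earlier one is equal, which matches the "earlier items take precedence" rule.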

2.3 Team Limits

  1. Maximum team size: 5 members.

2.4 Submission Limits

  1. Each team may make up to 5 Submissions per 24‑hour period.

  2. Teams may designate up to 1 Final Submission for judging before the Final Submission Deadline.

2.5 Competition Timeline (tentative)

| Milestone | Date |
|---|---|
| Competition launch | 7 Nov 2025 |
| Team Registration Deadline & Submission Open | 21 Nov 2025 |
| Final Submission Deadline | 25 Jan 2026 |
| Offline evaluation complete | 7 Feb 2026 |
| Winner announcement | 7 Feb 2026 |

*Dates are approximate and subject to change.

2.6 Submission Requirements

a) Model Requirements
  1. Parameter Limit (Track 1 only): Maximum 8 billion total parameters — including all active and frozen weights, embeddings, adapters, and LoRA modules. For Mixture-of-Experts models, “total parameters” means all experts (total), not only the subset activated per token or call.

  2. Release Date: Models must have been released before 1 Nov 2025.

  3. Recommended Models:

    1. Qwen3-<8B

    2. LLaMA-3.1-<7B

    3. Minitron-<8B

    4. or any model meeting the parameter and release date requirements.

  4. Fine-tuning. Fine-tuning is permitted only with publicly available datasets or team-created datasets (see §3.4 External Data & Tools for license and usage requirements).

    1. If a model is fine-tuned, the associated dataset must be submitted or made available for verification.
    2. If sharing is not legally permissible (IP/confidentiality/privacy), provide detailed documentation of the dataset and its provenance and contact the Competition Organizer in advance to arrange an alternative verification procedure.
b) Evaluation Phases

The Competition consists of two evaluation phases:

Phase 1 — Live Evaluation (Leaderboard Phase)

  • Occurs throughout the competition period.

  • Submissions are tested using lightweight evaluation scripts that make live LLM calls and communicate with host game servers via MCP (Model Context Protocol).

  • Real-time leaderboards will be updated based on Phase 1 results.

  • Evaluation details and latency/throughput constraints will be announced before Submissions open.

Phase 2 — Final Evaluation (Reproduction Phase)

  • Conducted after the Final Submission Deadline.

  • Teams must submit all necessary components to reproduce their previous leaderboard results, including:

    • Model weights

    • Custom agent logic and prompt configurations

    • Inference and integration code

    • Documentation (2-page PDF) detailing architecture, training (if applicable), and reproducibility steps.

  • Winners will be determined based on Phase 2 reproducible results.

  • Competition Organizer reserves the right to disqualify any suspicious or malicious Submissions, including those that cannot be faithfully reproduced.

c) Submission Package

All teams must submit a complete Submission package. Submissions are capped at 5 per team per day, with one final selection required before the final deadline. The final Submission package below is used for official verification and prize award decisions.

Required package contents

  • Model artifacts / LLM weights & documentation: model weights (eligible ≤8B weights where applicable), finetuning data, provenance, high-level training data, license/usage notes, and run instructions. Note proprietary/closed weights and provide provider/version/specs.

  • Agent code: runnable Python agent(s) compatible with the five games and the Competition Organizer’s Orak framework; include run instructions for local and Competition Organizer environments.

  • Design & training doc: 2-page PDF summarizing architecture, training, data (high level), and key implementation details.

  • Reproducibility artifacts: scripts, deps, seeds, Dockerfile/YAML or equivalent.

  • Submission meta: team name, contacts, members, and a short README.

Final-Submission artifacts (required for tie-breaking & verification)

  • Model declaration: Provide a Model declaration for each distinct model used during the final official evaluation: name, version, provider, and total parameter count. Also indicate whether models share identical base weights (same checksum). When multiple variants share identical base weights, count the base weights once toward ATP and add adapter/LoRA parameters separately. If exact counts are unavailable (closed provider), declare the Competition Organizer-defined tier; ATP will use that tier’s parameter value.

  • Evaluation summary (JSON/CSV): at minimum total_inference_calls, total_tokens or raw requests, evaluation_episodes, mean_calls_per_episode, mean_tokens_per_episode.

  • Raw requests / re-tokenizable text: either per-call raw request texts (JSONL/ZIP) for Competition Organizer re-tokenization or per-call token counts using the Competition Organizer’s tokenizer. Include README if redacted/encoded.

  • Optional (recommended): per-episode breakdown (episode_id, game_name, seed, inference_calls, tokens, final_score).
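The ATP counting rule in the Model declaration item (shared base weights counted once, adapter/LoRA parameters added per variant) could be sketched as below. This is an illustrative sketch under stated assumptions, not an official tool: the declaration fields `base_checksum`, `base_params`, and `adapter_params` are hypothetical names, not a format defined by these Rules.

```python
def aggregate_total_parameters(declarations):
    """Sum total parameters across declared models.

    Each declaration is a dict with hypothetical keys:
      base_checksum  - checksum identifying the base weights
      base_params    - total parameters of the base weights (all experts for MoE)
      adapter_params - adapter/LoRA parameters specific to this variant (optional)
    """
    seen_bases = set()
    total = 0
    for decl in declarations:
        checksum = decl["base_checksum"]
        if checksum not in seen_bases:
            # Identical base weights count once toward ATP.
            seen_bases.add(checksum)
            total += decl["base_params"]
        # Adapter/LoRA parameters are always added separately.
        total += decl.get("adapter_params", 0)
    return total
```

For example, two LoRA variants of one 8B base plus a distinct 1B model would count the 8B base once, both adapters, and the 1B model.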

Formats & verification

  • Preferred formats: JSON / JSONL / CSV. Tokenizer: Competition Organizer-designated (default cl100k_base) for re-tokenization. Means reported to two decimal places; comparisons use rounded values. Competition Organizer may recompute parameters, calls, and tokens from official logs; Competition Organizer values are authoritative. Failure to provide final artifacts may forfeit tie-breaking consideration or cause disqualification.
  • PII & sensitive content: Remove or obfuscate any sensitive data or PII in raw request texts. Document any redactions in the README. If redaction prevents verification, contact the Competition Organizer in advance to agree on an alternate verification process.
  • Verification: The Competition Organizer may re-tokenize raw requests (using the Competition Organizer-designated tokenizer noted above) and may recompute inference-call counts and parameter counts from official logs. Failure to provide required artifacts or to enable verification may forfeit tie-breaking consideration or result in disqualification.

evaluation_episodes

  • The number of completed evaluation runs (reset→terminal or Competition Organizer max_steps) across all games/seeds/scenarios used to compute the official score. Abort/crash runs not used for scoring must be reported separately.
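A possible way to assemble the evaluation summary from per-episode records is sketched below. It is a sketch only: the summary keys follow the Evaluation summary item above, while the per-episode `completed` flag and the `aborted_runs` key are hypothetical. It also assumes that only completed (scored) episodes contribute to the totals and means, which the Rules do not state explicitly.

```python
import json


def build_summary(episodes):
    """Build an evaluation summary from per-episode records.

    Each record is a dict with inference_calls, tokens, and a hypothetical
    boolean 'completed' (reset -> terminal or max_steps reached).
    """
    completed = [e for e in episodes if e["completed"]]
    n = len(completed)
    calls = sum(e["inference_calls"] for e in completed)
    tokens = sum(e["tokens"] for e in completed)
    return {
        "total_inference_calls": calls,
        "total_tokens": tokens,
        "evaluation_episodes": n,
        # Means are reported to two decimal places.
        "mean_calls_per_episode": round(calls / n, 2) if n else None,
        "mean_tokens_per_episode": round(tokens / n, 2) if n else None,
        # Abort/crash runs are reported separately (hypothetical key name).
        "aborted_runs": len(episodes) - n,
    }


def summary_json(episodes):
    """Serialize the summary as JSON, one of the preferred formats."""
    return json.dumps(build_summary(episodes), indent=2)
```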

Tie-breaking & live leaderboard

  • Tie-break criteria: 1) lower model complexity (parameters), 2) lower mean LLM inference calls/episode, 3) shorter mean prompt length/tokens/episode. See this section for required artifacts. Tie-breakers are applied only to the final official evaluation; live leaderboard positions are provisional (Competition Organizer may display provisional tie-break info if logs are provided).

Misc

  • For large files/containers include checksums and upload instructions. Annotate items with licensing/confidentiality issues and contact Competition Organizer for guidance.

2.7 Evaluation Time and Step Limits

  1. Per-run wall-clock limit. For each evaluation run, the Competition Organizer will allocate a maximum of 12 hours (wall-clock) per vCPU. If a run does not complete within 12 hours, the run will be terminated and its score will be computed from the best performance achieved up to the timeout. Exception: for StarCraft II (SC2), if an episode does not complete within 12 hours, the run will be scored as 0.
  2. Number of runs (seeds). Each game is evaluated using fixed seeds. The Official Rules specify 5 independent runs per game. For operational reasons Phase-1 may be executed with 3 runs; any operational change will be announced on the challenge page prior to evaluation.
  3. Game-specific max_steps. Each evaluation run is subject to a per-game step cap. When max_steps is reached the run terminates and the run’s metric is computed using the final environment state. The max_steps values for Phase-1 are as follows:
| Game | max_steps (per run) | Run wall-clock limit | Notes |
|---|---|---|---|
| Street Fighter III (SF3) | 1000 | 12 hours | |
| Super Mario (SM) | 100 | 12 hours | |
| Pokémon Red (Pkmon) | 200 | 12 hours | Metric reduced to 7 storyline flags (from 12) for competition scoring |
| StarCraft II (SC2) | 1000 | 12 hours | If not completed within 12 hours the run is scored 0 |
| 2048 | 1000 | 12 hours | |
  4. Token / step budgets and fairness. Token/step budgets follow the Orak benchmark defaults. To ensure fairness, the Competition Organizer will use fixed seeds, record per-step logs, and apply anti-cheating controls (e.g., detection of scripted trajectories, hidden side-channels, and replay/log verification). The Competition Organizer may pause inference for fairness checks and will re-run top entries if required for verification.
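The termination rules in this subsection could be expressed roughly as follows. This is a hedged sketch, not the official evaluation harness: `resolve_run_score` and its arguments are hypothetical, and it assumes the harness tracks both the best score seen so far and the score at the final environment state.

```python
# Phase-1 step caps and wall-clock limit as stated in this subsection.
MAX_STEPS = {"SF3": 1000, "SM": 100, "Pkmon": 200, "SC2": 1000, "2048": 1000}
WALL_CLOCK_LIMIT_S = 12 * 3600


def steps_remaining(game, steps_taken):
    """Steps left before the per-game max_steps cap terminates the run."""
    return max(MAX_STEPS[game] - steps_taken, 0)


def resolve_run_score(game, elapsed_s, best_score, final_score):
    """Pick the score for one run under the timeout rules.

    - Timed out: best performance up to the timeout, except SC2, which scores 0.
    - Otherwise (terminal state or max_steps reached): the final-state metric.
    """
    if elapsed_s > WALL_CLOCK_LIMIT_S:
        return 0.0 if game == "SC2" else best_score
    return final_score
```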

Operational note. The values stated in these Official Rules (e.g., 5 runs) are authoritative; operational deviations (e.g., Phase-1 runs = 3) are to be documented in a separate Operations Playbook that is referenced from the Rules.


3. Winner Prize & License

3.1 Eligibility & Prize Requirements

General Participation: Employees, interns, or contractors of the Competition Organizer may participate but are ineligible for prizes. All Participants are responsible for compliance with their employer's internal policies.

By participating in the Competition, each Participant allows Competition Organizer to use their name, photograph, likeness, voice, opinions, information, biographical information, hometown and jurisdiction of residence for publicity and promotional purposes without further compensation where permitted by law.

Prize Eligibility: For a team to be eligible to win, each member must satisfy all of the following conditions:

  • Be at least 18 years old and at least the age of majority in their place of residence.

  • Not be an employee, intern, or contractor of the Competition Organizer, nor an immediate family member of an employee, intern, or contractor of the Competition Organizer.

  • Not reside in any jurisdiction that is subject to export-control, sanctions, or other trade-restriction measures under Applicable Laws. Such measures include, without limitation, those administered by authorities in the Republic of Korea, the United States (including the Office of Foreign Assets Control (OFAC)), the European Union, the United Kingdom, and the United Nations. See §4 (General Provisions) for further details.

Prize Receipt: To receive any prizes, Competition winners must comply with these Rules, including but not limited to providing detailed documentation of their solution and may be required to present their approach at a designated venue.

3.2 Winner Obligations

Prize winners must, within 14 days of notification:

  1. Execute and effectuate the license grant set forth in § 3.3.

  2. Complete and return all required tax and eligibility forms and any further information and documents needed, in Competition Organizer’s sole discretion, to pay the prize to the prize winner. Taxes and any other costs associated with prize acceptance and use, if any and as required by law, are the sole responsibility of the winner.

a) Prize claim & timeline

Only prizes claimed in accordance with these Rules will be awarded. Prize distribution may take up to six months from completion of the Competition.

Team structure and conduct. Each Team MUST designate one person as the team leader who will be solely responsible for receiving communications from and communicating with the Competition Organizer. Teams MUST NOT generate a Submission in violation of these Rules or the Competition Organizer's Privacy Policy, MUST NOT engage in false, fraudulent, or deceptive acts at any phase, and MUST NOT tamper with or abuse any aspect of the Competition. Each Team MUST obtain all consents, approvals, or licenses required for submission of its Submission, including any intellectual property and data rights. Each Team MUST obtain necessary consents from all of such Team’s Participants regarding the collection and sharing of their personal information as outlined in these Rules.

Team membership and prize handling. A Team may consist of one or more Participants. Each Participant may only be a member of one Team. Any Participant found to be part of more than one Team will be disqualified, and all associated Teams and Participants in those Teams will also be disqualified. Any prize awarded to a Team will be disbursed to the Team Leader on behalf of the Team. AICrowd and Competition Organizer will not be involved in, nor responsible for, allocation or distribution of the prize among Team members. Allocation among Team members is the sole responsibility of the Team and its Team Leader.

Additional Sections

Submission requirements. A Submission must comply with all requirements set out in these Rules, including eligibility, format, originality, and content standards.

Authoritative problem statement. “Competition Problem Statement” means the specific task or set of tasks described at the official Competition page, which a Team must address in its Submission.

3.3 Winner License

  1. You grant a worldwide, perpetual, non‑exclusive, transferable, sublicensable, assignable, irrevocable, royalty‑free license to the Competition Organizer to use, reproduce, distribute, create derivative works of, publicly display and perform, and otherwise exploit your prize-winning Submission and its source code for any purpose.

  2. Open‑sourcing is encouraged. You may release your code under any OSI‑approved license, provided it does not restrict the Competition Organizer’s rights described above.

3.4 External Data & Tools

External data, pretrained models, and automated ML tools are permitted provided they are publicly available to all Participants at minimal or no cost. Participants are responsible for resolving all license and usage issues. When external or reformulated data is used, teams may be requested to make such resources publicly available after the Competition; if sharing is not legally permissible, provide detailed documentation and coordinate an alternate verification process in advance.

3.5 Competition Organizer Authority & Rule Changes

Rule Modifications: The Competition Organizer may change these Rules, the timeline, evaluation metrics, prize structure, or any other aspect of the Competition at any time, without prior notice, before the Competition ends. This includes the right to modify game rules during the Competition to ensure fair play, balance, or maintain event integrity.

Disqualification Powers: Competition Organizer reserves the right to disqualify Participants or Teams at any point—before, during, or after the Competition—if cheating, plagiarism, collusion, or other misconduct is discovered or suspected. Like retractions from academic journals, disqualification may occur even after results are announced if violations are later discovered.

Final Authority: All Competition Organizer decisions are final and binding. We want you to succeed—please reach out to us if you are uncertain about whether your current approach violates these Rules.

3.6 Governing Law

Unless superseded by local law, these Rules are governed by the laws of the Republic of Korea, and any actions will be brought exclusively in the courts of Seoul, Korea.


4. General Provisions

Your participation in the Competition, your Submission (including all components thereof), and your eligibility for prize payments are subject to compliance with all rules and rulings and directions by Competition Organizer (including, but not limited to, these Rules), as well as all applicable laws, including without limitation, sanctions regimes, export-control, applicable anti-money-laundering/know-your-customer (AML/KYC) requirements, and banking or payment-network restrictions, including without limitation those administered by authorities in Korea, the United States (including OFAC), the European Union, the United Kingdom, and the United Nations, that apply to you, your Submission (including all components thereof), the Competition, the Competition Organizer or the Competition Sponsors (collectively, “Applicable Laws”). The Competition Organizer may withhold, delay, cancel, or require an alternative method of prize delivery if (a) making payment would violate or be restricted by such laws or rules, (b) the recipient does not pass required AML/KYC checks or fails to provide requested information or certifications in a timely manner, or (c) payment is not reasonably feasible due to banking or remittance restrictions. The Competition Organizer may, at its discretion, offer a lawful alternative form of consideration (e.g., equivalent value via a permitted channel). If a reasonable, lawful/payment-network-permitted method is not reasonably available within a reasonable period, in the sole discretion of the Competition Organizer, the prize may be deemed forfeited.

You represent, warrant and covenant:

  1. you are eligible to participate in the Competition,

  2. your participation in the Competition, your Submission (including all components thereof), does and shall at all times comply with Applicable Laws,

  3. your Submission (including all components thereof) does not and shall not violate any rights of any third parties, including without limitation not infringing any property rights.

  4. you own all rights to the Submission and your Submission constitutes your original works of authorship, and your Submission does not contain information considered by you or any other third party to be confidential.  If your Submission contains any material or elements that are not owned by you and/or which are subject to the rights of third parties, you have obtained, prior to submission, any and all releases and consents necessary to permit use and exploitation of the Submission by the Competition Sponsors and Competition Organizer in the manner set forth in the Rules without additional compensation.

You agree to indemnify, pay the defense costs of, and hold the Released Parties harmless from any and all claims, demands, costs, liabilities, losses, expenses and damages (including attorneys’ fees, costs (including litigation costs and costs incurred in the settlement or avoidance of any such claim), and expert witnesses’ fees) arising out of or in connection with (i) any and all aspects of the Competition and your participation therein, including your Submission (and all components thereof), (ii) any breach by you of any Applicable Laws or provisions of the Rules, including a breach by you of representations, warranties, covenants, responsibilities, or obligations set forth herein, including without limitation any failure to obtain any third party consents and releases, (iii) the Submission(s) violating or infringing the intellectual property rights, privacy rights, rights of publicity, or other rights of any third party, (iv) any federal, state, or foreign civil or criminal actions relating to the Submission, and (v) any damage to property, personal injury, illness or death, occurring in connection with the Competition.

Disclaimer, Release and Limit of Liability. COMPETITION ORGANIZER MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, REGARDING ANY PRIZE OR YOUR PARTICIPATION IN THE COMPETITION. BY ENTERING THE COMPETITION OR RECEIPT OF ANY PRIZE, EACH PARTICIPANT AGREES TO RELEASE AND HOLD HARMLESS COMPETITION ORGANIZER AND THE COMPETITION SPONSORS AND THEIR RESPECTIVE SUBSIDIARIES, AFFILIATES, SUPPLIERS, DISTRIBUTORS, ADVERTISING/EVENT AGENCIES, EVENT PLATFORM PROVIDERS, PRIZE PROVIDERS, AND EACH OF THEIR RESPECTIVE PARENT COMPANIES AND EACH SUCH COMPANY’S OFFICERS, DIRECTORS, EMPLOYEES AND AGENTS (COLLECTIVELY, THE “RELEASED PARTIES”) FROM AND AGAINST ANY CLAIM OR CAUSE OF ACTION, INCLUDING, BUT NOT LIMITED TO, PERSONAL INJURY, DEATH, OR DAMAGE TO OR LOSS OF PROPERTY, ARISING OUT OF PARTICIPATION IN THE COMPETITION OR RECEIPT OR USE OR MISUSE OF ANY PRIZE. THE RELEASED PARTIES ARE NOT RESPONSIBLE FOR: (1) ANY INCORRECT OR INACCURATE INFORMATION, WHETHER CAUSED BY PARTICIPANTS, PRINTING ERRORS OR BY ANY OF THE EQUIPMENT OR PROGRAMMING ASSOCIATED WITH OR UTILIZED IN THE COMPETITION; (2) TECHNICAL FAILURES OF ANY KIND, INCLUDING, BUT NOT LIMITED TO MALFUNCTIONS, INTERRUPTIONS, OR DISCONNECTIONS IN PHONE LINES OR NETWORK HARDWARE OR SOFTWARE; (3) UNAUTHORIZED HUMAN INTERVENTION IN ANY PART OF THE ENTRY PROCESS OR THE COMPETITION; (4) TECHNICAL OR HUMAN ERROR WHICH MAY OCCUR IN THE ADMINISTRATION OF THE COMPETITION OR THE PROCESSING OF ENTRIES; OR (5) ANY INJURY OR DAMAGE TO PERSONS, DATA, OR PROPERTY WHICH MAY BE CAUSED, DIRECTLY OR INDIRECTLY, IN WHOLE OR IN PART, FROM PARTICIPANT’S PARTICIPATION IN THE COMPETITION OR RECEIPT OR USE OR MISUSE OF ANY PRIZE.
IN NO EVENT SHALL ANY RELEASED PARTY BE LIABLE FOR LOSS OF PROFITS, OR ANY SPECIAL, PUNITIVE, EXEMPLARY, INCIDENTAL, INDIRECT OR CONSEQUENTIAL DAMAGES ARISING OUT OF, RELATING TO OR IN CONNECTION WITH THE COMPETITION, THE USE OF (OR INABILITY TO USE OR DELAY IN USE OF) ANY MATERIALS PROVIDED BY A RELEASED PARTY, THE FUNCTIONALITY (OR LACK OF FUNCTIONALITY) OF ANY SUCH MATERIALS, OR ERRORS OR BUGS WITHIN SUCH MATERIALS, WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE), INDEMNITY, PRODUCT LIABILITY, OR OTHERWISE. IN NO EVENT SHALL THE RELEASED PARTIES’ COLLECTIVE LIABILITY ARISING UNDER, RELATING TO OR IN CONNECTION WITH THIS AGREEMENT, EXCEED $100 INCLUDING ANY LIABILITY FOR DIRECT OR INDIRECT DAMAGES, LOSSES, OR INJURIES. THE LIMITATIONS OF LIABILITY SET FORTH IN THIS SECTION SHALL APPLY TO THE FULLEST EXTENT PERMISSIBLE AT LAW. NO RELEASED PARTY SHALL BEAR ANY RISK, OR HAVE ANY RESPONSIBILITY OR LIABILITY, OF ANY KIND, TO LICENSEE OR TO ANY THIRD PARTIES WITH RESPECT TO THE QUALITY (OR LACK THEREOF), OPERATION (OR LACK THEREOF), OR PERFORMANCE (OR LACK THEREOF) OF ALL OR ANY PORTION OF MATERIALS PROVIDED BY RELEASED PARTIES.

Intellectual Property. Notwithstanding anything to the contrary in these Rules, except as expressly provided herein, no license, permission, or right is granted to use the intellectual property of the Competition Organizer or Competition Sponsors for any purpose.

Personal Information. Participants may be required to provide personal information during the course of the Competition, such as their name, email and bank details. This information may be used to promote and administer the Competition, contact winners, deliver prizes, and as otherwise specified in Competition Organizer’s privacy policy available at https://accounts.krafton.com/privacy-policy. Without limiting the rights of Competition Organizer under its privacy policy with regard to its use of personal information, by participating in the Competition Participants are consenting to the collection, use and processing of their personal information as set forth in these Rules and the privacy policy mentioned above, as well as any further processing of personal information willingly provided to Competition Organizer during a Participant’s participation in the Competition.

Miscellaneous. Competition Organizer reserves the right to change these Rules at any time, in its sole discretion. These Rules constitute the entire agreement between you and the Competition Organizer regarding the Competition and supersede any prior understandings. If any provision is held invalid, the remainder will continue in full force and effect.