πŸ“Ή Watch the recording of the Live Workshop here.

🎁 Global Chess Challenge Starter Kit

β™ŸοΈ Global Chess Challenge 2025

The Global Chess Challenge uses chess as a clean, rigorous testbed for studying reasoning in language models. Classical chess engines like Stockfish reach superhuman strength through heuristics, deep search and precise calculation. Large language models, by contrast, operate very differently and often struggle with basic issues such as move legality, tactical consistency, or planning several moves ahead. Rather than viewing this gap as a weakness, this challenge treats it as an opportunity: to understand how structured reasoning can be learned, constrained, and evaluated inside language models.

This challenge frames chess as a text-only problem. Models receive a symbolic description of the position and must decide what to play without access to boards, search procedures, or external tools. Every position is fully observable, every move can be checked for legality, and move quality can be evaluated objectively using Stockfish. This makes chess unusually well suited for controlled experimentation: the rules are fixed, the state space is precise, and progress can be measured reliably.

For the chess community, the challenge points toward a new kind of learning experience. Instead of outputting only the best move, models are required to explain their choice in simple, human-readable language. By articulating ideas, plans, and trade-offs, these models resemble a commentator or coach rather than a silent engine. This opens the door to more intuitive and conversational analysis tools built directly on top of players’ own games.

For the AI research community, it offers a transparent and reproducible environment for studying text-based reasoning under strict constraints. Participants can combine large public chess datasets with engine-based verification to explore a wide range of approaches, from supervised finetuning to reinforcement learning with verifiable rewards, all within a domain that is both rich and precisely defined. The result is a shared benchmark that connects human learning, language-based reasoning, and the enduring complexity of chess.

The Global Chess Challenge 2025 is a global hybrid competition organized by AGI House and sponsored by Amazon Web Services (AWS), with platform and leaderboard infrastructure provided by AIcrowd. The Challenge asks whether small language models can make strong, legal chess decisions from text-only inputs, under strict execution constraints, while also producing a short, human-readable explanation of their intent.

πŸ’» What is the Global Chess Challenge?

You will build a text-only chess agent that does two things for every position:

  1. Outputs exactly one legal move (in UCI format)
  2. Outputs a one-sentence rationale explaining the idea behind the move

All submissions are executed independently by the Organizers on controlled infrastructure. At inference time, your model must behave as a standalone language model: it must decide moves solely via token-level prediction conditioned on the provided text input.

βœ… Allowed: training with open datasets; offline preprocessing; finetuning; RL using verifiable rewards; using Stockfish during training to label or score data.

❌ Not allowed at inference: external tools, function calling, heuristic search procedures, retrieval systems, embedded chess engines, or any auxiliary decision system beyond the submitted model’s own forward pass.

The goal is to build reliable structured reasoning and strong play within a constrained, reproducible evaluation setting.

πŸ§ͺ Suggested Approaches

The following are representative approaches we encourage participants to explore as part of this Challenge. They are not rigid tracks, and teams are explicitly encouraged to try any other methods they believe fit within the Challenge constraints.

The only requirement is that the submitted artifact respects the inference-time constraints: at evaluation time, the model must operate as a standalone language model with no tools, search, or external systems.

1️⃣ Data-centric finetuning (SFT)

Train a model to map text positions to high-quality moves (and short explanations).

Possible ingredients:

  • Open chess corpora such as the Lichess Open Database / puzzle sets (as permitted by their licenses)
  • Tuples like {FEN, side_to_move, legal_moves_uci, move_played, optional Stockfish labels}
  • Offline Stockfish annotations for training labels (best move / PV / eval) used during training only
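As a concrete illustration, the sketch below assembles such tuples from a PGN file using python-chess and a local Stockfish binary. The file paths, field names, analysis depth, and output file are illustrative assumptions, not part of the official starter kit.

```python
# Sketch: build SFT tuples {FEN, side_to_move, legal_moves_uci, move_played, stockfish labels}
# from a PGN file. Paths and field names are illustrative assumptions.
import json
import chess
import chess.pgn
import chess.engine

PGN_PATH = "games.pgn"                   # assumption: any PGN export, e.g. from the Lichess Open Database
STOCKFISH_PATH = "/usr/bin/stockfish"    # assumption: local Stockfish binary

def build_examples(pgn_path, engine, depth=12, limit_games=100):
    examples = []
    with open(pgn_path) as f:
        for _ in range(limit_games):
            game = chess.pgn.read_game(f)
            if game is None:
                break
            board = game.board()
            for move in game.mainline_moves():
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                examples.append({
                    "fen": board.fen(),
                    "side_to_move": "White" if board.turn == chess.WHITE else "Black",
                    "legal_moves_uci": [m.uci() for m in board.legal_moves],
                    "move_played": move.uci(),
                    # Optional engine labels, used during training only.
                    "stockfish_best": info["pv"][0].uci() if "pv" in info else None,
                    "stockfish_cp": info["score"].pov(board.turn).score(mate_score=10000),
                })
                board.push(move)
    return examples

if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        data = build_examples(PGN_PATH, engine)
    with open("sft_examples.jsonl", "w") as out:
        for ex in data:
            out.write(json.dumps(ex) + "\n")
```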

2️⃣ RLVR (Reinforcement Learning with Verifiable Rewards)

Use Stockfish as a verifier during training to generate rewards (e.g., legality + evaluation improvement + top-K alignment), and optimize with RL methods such as PPO / GRPO.

Key point: the submitted model must still run without tools/search at inference time.
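For instance, a verifiable reward could combine a legality gate, a centipawn-loss term, and a bonus for matching one of the engine's top-K moves. The sketch below is one possible formulation; the weights, depth, scaling, and top-K bonus are assumptions, and Stockfish is used only at training time, consistent with the inference constraints.

```python
# Sketch of a verifiable reward for RL training (training-time only; no engine at inference).
# Weights, depth, and the top-K bonus are illustrative assumptions.
import chess
import chess.engine

def move_reward(board, move_uci, engine, depth=12, top_k=3):
    """Reward = legality gate + centipawn-improvement term + bonus for matching an engine top-K move."""
    try:
        move = chess.Move.from_uci(move_uci)
    except ValueError:
        return -1.0                       # unparseable output
    if move not in board.legal_moves:
        return -1.0                       # illegal move

    # Engine's top-K candidate moves from the current position.
    infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=top_k)
    top_moves = [info["pv"][0] for info in infos if "pv" in info]
    best_cp = infos[0]["score"].pov(board.turn).score(mate_score=10000)

    # Evaluation (from the mover's point of view) after playing the proposed move.
    board.push(move)
    after = engine.analyse(board, chess.engine.Limit(depth=depth))
    played_cp = after["score"].pov(not board.turn).score(mate_score=10000)
    board.pop()

    cp_loss = max(0, best_cp - played_cp)
    reward = 1.0 - min(cp_loss, 300) / 300.0   # scaled evaluation term in [0, 1]
    if move in top_moves:
        reward += 0.5                           # top-K alignment bonus
    return reward
```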

πŸ“₯ Submission Format

Participants submit a language model via a gated Hugging Face repository. Once a submission is accepted, the Organizers pull and run the model on their own controlled infrastructureβ€”participants never run code inside the evaluation environment.

Submissions interact with the provided environment purely through text, using standardized prompt templates. Teams may submit up to 20 entries per day.

At inference time:

  • The model is loaded from the submitted Hugging Face repo
  • Inputs are provided via a prompt template with predefined variables
  • The model must respond with a move and a short rationale, following a strict output format

Details of the prompt templates and available variables are documented in the starter kit: https://github.com/AIcrowd/global-chess-challenge-2025-starter-kit/tree/master/player_agents

🧩 Model input

For every turn, the agent receives a prompt built from multiple variables, as described in the docs, including:

  • Position as a FEN string (e.g. r1bk3r/p2pBpNp/n4n2/1p1NP2P/6P1/3P4/P1P1K3/q5b1)
  • Side to move (White / Black)
  • List of legal moves in UCI format (e.g., e2e4, g1f3, e7e8q)

You do not need to implement chess rules or generate legal movesβ€”this is handled by the environment.
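To make this concrete, here is a hypothetical rendering of those variables into a text prompt using python-chess. The official templates and variable names are defined in the starter kit's player_agents documentation and may differ.

```python
# Hypothetical example of how the input variables could be rendered into a prompt.
# The official template and variable names live in the starter kit.
import chess

board = chess.Board()  # starting position as an example
prompt = (
    f"You are playing {'White' if board.turn == chess.WHITE else 'Black'}.\n"
    f"FEN: {board.fen()}\n"
    f"Legal moves (UCI): {', '.join(m.uci() for m in board.legal_moves)}\n"
    "Reply with your move inside <uci_move>...</uci_move> and a one-sentence "
    "rationale inside <rationale>...</rationale>."
)
print(prompt)
```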

🧩 Model output

For each input position, the agent must return:

  • Exactly one UCI move, chosen from the provided legal move list, wrapped in:
    • <uci_move>...</uci_move>
  • A one-sentence rationale (required but not scored), typically wrapped in:
    • <rationale>...</rationale>

Important: Evaluation is based exclusively on the UCI move inside <uci_move> tags. Any text outside <uci_move> is ignored for scoring.

If your submission does not provide a valid UCI move in the correct tags (missing tags, malformed output, or illegal move), the evaluator will retry up to three (3) times for the same position. If it still fails, the model is treated as having resigned, and the game is recorded as a loss.
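Before submitting, it can help to check locally that your model's responses parse into a legal move. The following sketch mirrors the format described above; the evaluator's actual parser may differ in its details.

```python
# Local check that a response matches the required format:
# exactly one legal UCI move inside <uci_move> tags.
import re
import chess

def extract_move(response: str, board: chess.Board):
    match = re.search(r"<uci_move>\s*([a-h][1-8][a-h][1-8][qrbn]?)\s*</uci_move>", response)
    if match is None:
        return None                       # missing or malformed tags
    move = chess.Move.from_uci(match.group(1))
    return move if move in board.legal_moves else None

# Example:
board = chess.Board()
reply = "<uci_move>e2e4</uci_move><rationale>Grab the center and open lines for the bishop and queen.</rationale>"
assert extract_move(reply, board) == chess.Move.from_uci("e2e4")
```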

πŸ“Š Data & Environment

The Challenge provides a shared environment for standardized evaluation.

  • Uses [python-chess](https://github.com/AIcrowd/chess-env) for board representation, FEN/PGN, and legality checks
  • Uses local Stockfish for baseline opponents and for post-game analysis
  • Produces game logs (PGNs) and evaluation summaries

Participants can use the starter kit to run local games and verify output format, legality, and end-to-end execution.
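A minimal local game loop in the same spirit might look like the sketch below; the Stockfish path, engine options, and the random placeholder agent are assumptions for illustration, not the starter kit's actual runner.

```python
# Minimal local game loop against a weak Stockfish, useful for smoke-testing an agent
# before submitting. The agent function and Stockfish path are placeholders.
import chess
import chess.engine
import chess.pgn

STOCKFISH_PATH = "/usr/bin/stockfish"   # assumption: local binary

def play_one_game(agent_move_fn, engine, agent_is_white=True):
    board = chess.Board()
    engine.configure({"Skill Level": 0})             # weak baseline opponent
    while not board.is_game_over():
        if board.turn == (chess.WHITE if agent_is_white else chess.BLACK):
            move = agent_move_fn(board)               # your model wrapped as: Board -> Move
        else:
            move = engine.play(board, chess.engine.Limit(depth=1)).move
        board.push(move)
    game = chess.pgn.Game.from_board(board)           # PGN log for later analysis
    return game, board.result()

if __name__ == "__main__":
    import random
    random_agent = lambda b: random.choice(list(b.legal_moves))
    with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        game, result = play_one_game(random_agent, engine)
    print(result)
    print(game)
```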

πŸ“ˆ Evaluation & Metrics (What the leaderboard actually uses)

Evaluation proceeds in multiple stages.

βœ… Round 1 & Round 2: Baseline Evaluation (Leaderboard)

In both rounds, every submission is evaluated against fixed Stockfish opponents to create a stable, comparable baseline.

Each submission plays:

  • 50 games vs Stockfish Skill 0 (Depth 1)
  • 50 games vs Stockfish Skill 0 (Depth 5)

Primary leaderboard score: Average Centipawn Loss (ACPL)

  • ACPL is computed by analyzing played games using Stockfish Level 20 (Depth 20) as the reference evaluator.
  • Lower ACPL is better.

Secondary score: Win Rate

  • Win rate is computed across all baseline games and is used as a secondary metric (e.g., for tie-breaking and analysis).
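For intuition, a rough local approximation of ACPL for one side of a game could look like the following; the Organizers' exact procedure (capping, mate handling, move filtering) may differ.

```python
# Rough sketch of average centipawn loss (ACPL) for one side of one game,
# using Stockfish at depth 20 as the reference evaluator. The official
# computation may differ in details.
import chess
import chess.engine
import chess.pgn

def acpl_for_game(game, engine, color, depth=20, cap=1000):
    board = game.board()
    losses = []
    for move in game.mainline_moves():
        if board.turn == color:
            best = engine.analyse(board, chess.engine.Limit(depth=depth))
            best_cp = best["score"].pov(color).score(mate_score=10000)
            board.push(move)
            after = engine.analyse(board, chess.engine.Limit(depth=depth))
            played_cp = after["score"].pov(color).score(mate_score=10000)
            losses.append(min(max(0, best_cp - played_cp), cap))
        else:
            board.push(move)
    return sum(losses) / len(losses) if losses else 0.0
```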

βœ… Eligibility for the Final Tournament

At the end of Round 2, only submissions with ACPL lower than the official baseline model defined by the Organizers are considered eligible submissions and advance to the Final Tournament.

🏁 Final Tournament: Swiss-Style Competition (Determines winners)

Eligible submissions compete in a Swiss-system tournament.

  • Final ranking is based only on game outcomes:
    • Win = 1 point
    • Draw = 0.5 points
    • Loss = 0 points
  • ACPL is not used during the Swiss tournament for scoring or ranking.

The final winners and the prize allocations are decided based on the results of this tournament.

πŸ”’ Execution Constraints

All models are run in a controlled environment with standardized resources (a trn1.2xlarge instance) and no external network access.

At inference time, submissions:

  • Must not call external tools or APIs
  • Must not use function calling, retrieval, or heuristic search procedures
  • Must not embed or invoke chess engines or auxiliary decision systems
  • Must produce decisions solely through language model inference over the provided text prompt

πŸ“¦ Eligible Models, Backends, and Size Limit

Because evaluation runs on AWS Trainium with a specific runtime stack, only a subset of model families and execution backends are supported.

Participants must use only the architectures and backends documented here: https://github.com/AIcrowd/global-chess-challenge-2025-starter-kit/blob/master/docs/neuron-and-vllm-tuning.md#supported-model-types-backends

 

Model size restriction

Only models with a total parameter count of strictly fewer than 8,000,000,000 (8B) parameters are eligible for leaderboard ranking, Final Tournament qualification, and prizes.

Parameter count is determined from the model weights at inference time (excluding optimizer state).
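A quick way to sanity-check a candidate checkpoint against this limit is to load it and count its weights, as in the sketch below; the repository id is a placeholder.

```python
# Sanity-check that a candidate checkpoint stays under the 8B-parameter limit.
# The repo id is a placeholder; counting model.parameters() excludes optimizer state by construction.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-org/your-chess-model")  # placeholder repo id
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} parameters")
assert total_params < 8_000_000_000, "Model exceeds the 8B parameter limit"
```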

βš™οΈ AWS Trainium

This Challenge runs on AWS Trainium, using the AWS Neuron software stack and supported model execution backends.

Resources to get started

πŸ† Prizes

Cash prize pool: USD 17,000 
Compute credits: USD 8,000

  • πŸ₯‡ First Place: USD 10,000 + USD 5,000 credits
  • πŸ₯ˆ Second Place: USD 5,000 + USD 2,000 credits
  • πŸ₯‰ Third Place: USD 2,000 + USD 1,000 credits

(Prize eligibility is subject to the Official Rules.)

πŸ“… Timeline

  • Launch & Registration Opens: December 4, 2025
  • Round 1 Submissions Close: December 31, 2025 (23:55 UTC)
  • Team Freeze Deadline: January 15, 2026 (23:55 UTC)
  • Round 2 Submissions Close: January 31, 2026 (23:55 UTC)
  • Final Tournament: Feb 1, 2026 – Feb 7, 2026
  • Winners Announced: Feb 15, 2026

πŸ”‘ Starter Kit

Make your first submission using the starter kit: https://github.com/AIcrowd/global-chess-challenge-2025-starter-kit

It includes:

  • A ready-to-run environment
  • Example agents
  • A template submission
  • Documentation for supported model/backends on Trainium
