Orak Game Agent Challenge
Model submission
2 days ago
While running remote mode, it seems like StarCraft failed after entering the second episode. Is it possible that there's a problem with the code in the MCP server or game envs of AIcrowd?
The issue was caused by requests being queued and dequeued arbitrarily at the MCP server. We moved to a gRPC-based implementation to get around it.
Please pull the recent changes made to the starter kit and let us know if you still run into any issues.
How long will the evaluation process usually take?
It depends on your agent. However, if you use the random agents (not really random, they simply repeat the same static action), this is what you should expect.
Do scores only appear after all games are completed?
Yes. Your submission must complete all games to be marked as graded. Your submission won't appear on the leaderboard otherwise.
Will it directly show the score on the submission page once all games are completed, or will it take some additional time for the score to appear on the submission page?
Your submission will be marked as failed if it doesn't complete all the games. If your evaluation times out, i.e. doesn't finish all the games within 12 hours, the submission will eventually be marked as failed.
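To make the grading rule above concrete, here is a minimal sketch of how the final status follows from it. The names `GameResult` and `submission_status` are made up for illustration and are not part of the starter kit or the evaluator; only the "all games must complete" rule and the 12-hour limit come from the replies above.

```python
from dataclasses import dataclass

# Limit stated in the reply above; everything else here is hypothetical.
EVALUATION_TIMEOUT_HOURS = 12


@dataclass
class GameResult:
    name: str        # illustrative game identifier
    completed: bool  # True if the game ran to completion


def submission_status(results, elapsed_hours):
    """Return 'graded' only if every game completed within the time limit."""
    if elapsed_hours > EVALUATION_TIMEOUT_HOURS:
        return "failed"  # evaluation timed out
    if all(r.completed for r in results):
        return "graded"  # only graded submissions reach the leaderboard
    return "failed"      # at least one game did not complete
```

For example, a run where every game completes in 3 hours would come out as "graded", while the same run taking 12.5 hours would come out as "failed".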
Clarification on ORAK Scoring Standards and Remote Mode Episode Settings
2 days ago
When running the games in remote mode, how many episodes are executed for each game?
Three episodes for each game.
And for games like 2048, is the final score taken as the average of three rounds?
The final score for each game is the average of its scores across episodes.
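As a rough illustration of the per-game scoring described above, the sketch below averages three episode scores. The function name `game_score` is hypothetical; how per-game scores are combined into a leaderboard score is not covered here, so the sketch stops at the per-game average.

```python
from statistics import mean

# Three episodes per game, as stated in the reply above.
EPISODES_PER_GAME = 3


def game_score(episode_scores):
    """Average the episode scores of one game (hypothetical helper)."""
    if len(episode_scores) != EPISODES_PER_GAME:
        raise ValueError(
            f"expected {EPISODES_PER_GAME} episode scores, got {len(episode_scores)}"
        )
    return mean(episode_scores)
```

With episode scores of 100, 200 and 300, the game score would be 200.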
Super_mario
2 days ago
Hey, we released a few patches to the starter kit that remove the fastmcp dependencies. Can you please pull the recent changes, give it a try, and reach out to us if you are still running into problems?
📢 Starter Kit Update
2 days ago
Hello everyone! A quick heads up about an important stability update to the starter kit.
We've migrated the transport/communication backend (previously based on MCP/FastMCP) to gRPC to make the interaction between your agent and the game environment more robust and predictable.
What changed?
- The underlying transport layer is now gRPC-based.
- The agent-facing interface and APIs are unchanged. Your existing agent code should continue to work as is.
- The main goal of this change is better resilience under long games and reconnections.
Why does this matter?
Some of you were seeing:
- Random stalls / timeouts during longer runs
- Reconnection issues
- Episodes hanging with no clear error
These issues were caused by how requests were queued and retried in the previous MCP-based setup. With gRPC, we now strictly enforce "one client, one action in flight", and we get clearer error handling, which should eliminate these stalls and make reconnect behavior much more reliable.
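To illustrate what "one action in flight" means in practice, here is a minimal sketch using a plain lock. This is not the starter kit's gRPC code: `SingleFlightSession`, `ActionInFlightError`, and `step_fn` are hypothetical names, and the sketch only shows the general idea of rejecting a second action while the previous one is still being processed.

```python
import threading


class ActionInFlightError(RuntimeError):
    """Raised when an action is sent before the previous one has finished."""


class SingleFlightSession:
    """Hypothetical wrapper that allows at most one action in flight."""

    def __init__(self, step_fn):
        self._step_fn = step_fn        # assumed callable: action -> observation
        self._lock = threading.Lock()  # held while an action is being processed

    def act(self, action):
        # Refuse immediately instead of queueing the request behind another one.
        if not self._lock.acquire(blocking=False):
            raise ActionInFlightError("previous action has not completed yet")
        try:
            return self._step_fn(action)
        finally:
            self._lock.release()
```

Rejecting the overlapping request up front, rather than queueing it, is what gives a clear error instead of a silent stall.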
What you need to do
- Update your local starter kit
  - Pull the latest changes from the repo (e.g., git pull --rebase).
- Reinstall/refresh dependencies if needed
  uv sync
- Run your existing agents as usual
  - No changes should be required to your agent logic or environment interaction code.
If you still see issues
If you run into:
- Timeouts
- Stalls
- Reconnection problems
please share:
- Logs (client + server, if possible)
- Approximate episode length and map
- Steps to reproduce
This will help us quickly track down any remaining edge cases.
Thanks for your patience while we tracked this down.
StarCraft Stuck After Episode 1 ("Client is not connected" Issue)
5 days ago
We use the Linux headless binary for evaluations, and the steps below should help you get everything set up:
- Download the last available version (4.10) from the Blizzard/s2client-proto repository on GitHub (StarCraft II Client: protocol definitions used to communicate with StarCraft II).
- Check the README for instructions on unzipping the archive. It's password protected, and the password is provided in the README itself.
- Extract the game into your $HOME directory. After extraction, the expected path should be $HOME/StarCraftII.
- Depending on your burnysc2 version, the maps directory may need to be either $HOME/StarCraftII/Maps or $HOME/StarCraftII/maps. To avoid issues with missing map directories, create a symlink: ln -s $HOME/StarCraftII/Maps $HOME/StarCraftII/maps
- For additional details on configuring the maps, see: Question about SC2 map setting in starter kit
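If you prefer to script the symlink step, the sketch below does the same thing as the ln -s command above using only the standard library. The helper name `ensure_maps_symlink` is made up for this example, and it assumes the game was extracted so that a Maps directory exists under the given root.

```python
from pathlib import Path


def ensure_maps_symlink(sc2_root):
    """Make <sc2_root>/maps point at <sc2_root>/Maps if it is missing."""
    root = Path(sc2_root)
    upper = root / "Maps"
    lower = root / "maps"
    if not upper.is_dir():
        raise FileNotFoundError(f"expected maps directory at {upper}")
    # Nothing to do if the lowercase path already exists, e.g. an earlier
    # symlink or a case-insensitive filesystem.
    if lower.exists() or lower.is_symlink():
        return lower
    lower.symlink_to(upper, target_is_directory=True)
    return lower


# Example call for the layout described above:
# ensure_maps_symlink(Path.home() / "StarCraftII")
```

The helper is safe to run more than once, since it leaves an existing maps path untouched.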
Model submission
5 days ago
- The final evaluations will be run manually by the Krafton AI team, and the prizes will be decided based on the outcome of those manual runs. The Krafton AI team will verify the size of the model you submit during the final evaluations.
- No, we do not require any sort of access to your LLM server. However, for the final evaluations, you will need to include precise instructions and the code needed to start your LLM server, and ensure that the Krafton AI team is able to run everything end-to-end.
Model submission
5 days ago
No, it doesn't need to be hosted externally. We simply provide the endpoints that let you get game observations, and your agent only needs to return the actions to execute in the game. How you produce those actions, whether through local models, external services, or any other setup, is entirely up to you.
This means your machine must be able to access whichever models or services you rely on during inference.
In Local Mode, everything (your runner, game launcher, agents, and the MCP game servers) runs directly on your machine. The runner starts the game servers, initializes your agents, and your agents communicate with the servers over localhost. There is no connection to the AIcrowd backend.
In Remote Mode, your runner and agents (including any LLM calls you make) still run locally, but the MCP game servers run on AIcrowd's remote infrastructure. Your runner creates a session via the Session API, receives the MCP server URLs, and your agents interact with the remote servers over HTTPS.
- Local Mode hosts the entire stack on your machine.
- Remote Mode keeps your agents local while offloading the game servers and environments to AIcrowd.
Hope this makes it clear.
ModuleNotFoundError: No module named 'omegaconf'
6 days ago
uv looks for pyproject.toml and automatically manages the virtual environment at the repo level.
Can you verify that your environment is actually being used?
# check which Python uv is running
uv run python
# try importing a starter-kit-specific library, for example:
import sc2
If this import works, then uv is configured correctly.
You can install additional packages with:
uv add <package>
Although uv pip install -r pyproject.toml works, it's generally better to use:
uv sync
This installs the exact dependency versions listed in uv.lock, matching the environment used during starter-kit testing.
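If you want a quick scripted version of the check above, the sketch below reports which interpreter is running and which of a few modules cannot be found. The helper name `missing_modules` and the file name check_env.py are just examples; sc2 and omegaconf are the modules mentioned in this thread. It only handles top-level module names.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=("sc2", "omegaconf")):
    """Print the interpreter path and any modules that are not importable."""
    print("Python executable:", sys.executable)
    missing = missing_modules(names)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
    return missing


# Save as check_env.py, add a call to report() at the bottom,
# and run it with: uv run python check_env.py
```

If the printed interpreter path is not inside the repo's virtual environment, uv is not the one running your code.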
Model submission
6 days ago
You don't need to submit your model directly for this challenge. When you make a submission, we automatically launch an instance of each game and provide your agent with a unique MCP address for that run. Each game reports its score back to us, and we update the leaderboard accordingly.
Question about SC2 map setting in starter kit
7 days ago
You can search for the map names directly and you should be able to find them.
Maps ending with AIE and LIE are patched versions. They allow the latest maps to work with the last StarCraft II binary released for Linux. You can find the AIE maps here: Maps - AI Arena Wiki
The map Ancient Cistern LE can be found here: Large-Language-Models-play-StarCraftII/Maps at main · histmeisah/Large-Language-Models-play-StarCraftII · GitHub
If you are using the latest StarCraft II version (and not the Linux headless binary), you do not need to patch the maps. You should be able to run the games as-is.
Just make sure you've upgraded both burnysc2 and s2clientprotocol to their latest versions.
StarCraft Stuck After Episode 1 ("Client is not connected" Issue)
7 days ago
We are investigating this issue and will get back to you as soon as we have an update.
Question about SC2 map setting in starter kit
14 days ago
Ancient Cistern LIE is a ported version of Ancient Cistern LE. Since our evaluation environment uses the Linux headless SC2 binary, and the latest available release is still v4.10, it cannot load the newer official maps. To address this, we ported Ancient Cistern LE ourselves using a community-provided patch: GitHub - aiarena/sc2patch.
Flextrack Challenge 2025
💬 Feedback & Suggestions
4 months ago
@imeintanis The AIcrowd | Flextrack Challenge 2025 | Submissions page (https://www.aicrowd.com/challenges/flextrack-challenge-2025/submissions?my_submissions=true) should display the list of all submissions you made to the challenge.
You can get to this page by using the submission filters option.
💬 Feedback & Suggestions
4 months ago
Hello @jack_vandyke
Thanks for bringing this up. A minor misconfiguration on our end caused this. All the pending evaluations are graded now. You should get your evaluation results almost instantly, without any wait time.
Commonsense Persona-Grounded Dialogue Chall-0431ae
Issues regarding CPDC 2025 rerun
5 months ago
Hello @yiyang_zheng
We were able to run your submission fine. It was indeed an issue with git LFS files.
Meta CRAG - MM Challenge 2025
Request for investigation into the cause of the error
6 months ago
Hello @NineGates
We are re-evaluating 289818 (it seems a network error caused it to fail with the error "failed to load artifacts") and 289826 (evaluation didn't start).
We can't re-evaluate the rest, as they failed due to timeout errors. Please note that we don't look at the average prediction time; there is a hard constraint on the per-step timeout, and a single step that violates it can fail the submission.
- 289762 violated the overall timeout.
- 289784 failed due to a per step timeout
2025-06-18 08:18:43.290 INFO main:run_with_timeout:154 - Running batch_generate_response with timeout 80
2025-06-18 08:20:28.554 INFO main:run_with_timeout:164 - Executed batch_generate_response in 105.26403450965881 seconds
- 289820 failed due to a per step timeout
2025-06-18 07:06:50.363 INFO main:run_with_timeout:154 - Running batch_generate_response with timeout 80
2025-06-18 07:08:14.969 INFO main:run_with_timeout:164 - Executed batch_generate_response in 84.60583829879761 seconds
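If you want to scan your own logs for this kind of violation, here is a small sketch that pairs each "Running ... with timeout" line with the matching "Executed ... in ... seconds" line. It assumes the log format shown above and that the timeout value is in seconds; `find_step_timeouts` is a hypothetical helper, not part of the evaluator.

```python
import re

# Patterns based on the two log line shapes shown above.
TIMEOUT_RE = re.compile(r"Running (\w+) with timeout (\d+(?:\.\d+)?)")
ELAPSED_RE = re.compile(r"Executed (\w+) in (\d+(?:\.\d+)?) seconds")


def find_step_timeouts(log_lines):
    """Return (step, limit, elapsed) for every step that exceeded its limit."""
    limits = {}      # most recent timeout announced for each step name
    violations = []
    for line in log_lines:
        started = TIMEOUT_RE.search(line)
        if started:
            limits[started.group(1)] = float(started.group(2))
            continue
        finished = ELAPSED_RE.search(line)
        if finished and finished.group(1) in limits:
            step = finished.group(1)
            elapsed = float(finished.group(2))
            if elapsed > limits[step]:
                violations.append((step, limits[step], elapsed))
    return violations
```

Run on the two log lines for 289784, this would flag batch_generate_response with a limit of 80 and an elapsed time of about 105.26 seconds.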
Submission Status Change From "Generating" to "Prepare Generate" and Stuck
6 months ago
In case your submission emits logs that can further help, let us know. We will clean up any dataset-related log lines and share them with you.
Notebooks
- Solution for submission 128368: A detailed solution for submission 128368 submitted for challenge IIT-M RL-ASSIGNMENT-2-TAXI (jyotish · Over 4 years ago)
- [Baseline] Detectron2 starter kit for food recognition: A beginner-friendly notebook to kick-start your instance segmentation skills with detectron2 (jyotish · Almost 5 years ago)





Pokemon map is broken?
Yesterday
@ilya_gusev Is there a particular submission you are referring to that I can check?
We didn't change anything specific to Pokemon and wouldn't expect this change. Maybe the game hasn't started yet? The game ROM we received from the Krafton team starts at the menu screen, and "Map on Screen" wouldn't be defined for that screen.
Is there something I can cross-check in the server logs, or can you give the exact steps to replicate this issue so that we can pass it on to the Krafton team?