12 Follower
4 Following
jyotish
Jyotish

Organization

AIcrowd

Location

IN

Badges

8
6
5

Connect


Challenges Entered

Latest submissions

See All
graded 303841
graded 303840
graded 303785

Build an LLM agent for five real-world games

Latest submissions

See All
graded 304593
failed 304578
submitted 304577

Detecting Energy Flexibility in Buildings

Latest submissions

See All
graded 292399

Create Context-Aware, Dynamic, and Immersive In-Game Dialogue

Latest submissions

See All
failed 285403
failed 285312
failed 283221

Improve RAG with Real-World Benchmarks | KDD Cup 2025

Latest submissions

See All
failed 292367
failed 292325
graded 286048

A benchmark for image-based food recognition

Latest submissions

See All
failed 172430
graded 172229
failed 172228

Using AI For Building's Energy Management

Latest submissions

See All
failed 193327
failed 193315
failed 193310

What data should you label to get the most value for your money?

Latest submissions

See All
failed 178246
failed 177490
failed 177425

Latest submissions

See All
graded 192426
failed 192410
submitted 192407

Behavioral Representation Learning from Animal Poses.

Latest submissions

No submissions made in this challenge.

Airborne Object Tracking Challenge

Latest submissions

No submissions made in this challenge.

ASCII-rendered single-player dungeon crawl game

Latest submissions

See All
graded 155140
graded 147319

Machine Learning for detection of early onset of Alzheimers

Latest submissions

No submissions made in this challenge.

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Sample Efficient Reinforcement Learning in Minecraft

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 176785
graded 176487
graded 176466

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments

Latest submissions

See All
submitted 90059
graded 83575
failed 81249

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Self-driving RL on DeepRacer cars - From simulation to real world

Latest submissions

No submissions made in this challenge.

Robustness and teamwork in a massively multiagent environment

Latest submissions

No submissions made in this challenge.

3D Seismic Image Interpretation by Machine Learning

Latest submissions

See All
failed 99353

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Play in a realistic insurance market, compete for profit!

Latest submissions

See All
graded 125874
graded 121934
failed 116909

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Multi-Agent Reinforcement Learning on Trains

Latest submissions

No submissions made in this challenge.

A dataset and open-ended challenge for music recommendation research

Latest submissions

See All
failed 303444

A benchmark for image-based food recognition

Latest submissions

See All
graded 114994
graded 114972
failed 114971

Latest submissions

No submissions made in this challenge.

Sample-efficient reinforcement learning in Minecraft

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
failed 124981
failed 124727
failed 124726

5 Puzzles, 3 Weeks. Can you solve them all? 😉

Latest submissions

No submissions made in this challenge.

Multi-agent RL in game environment. Train your Derklings, creatures with a neural network brain, to fight for you!

Latest submissions

No submissions made in this challenge.

Predicting smell of molecular compounds

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

5 Problems 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

5 Puzzles 21 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

5 Puzzles, 3 Weeks | Can you solve them all?

Latest submissions

No submissions made in this challenge.

5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?

Latest submissions

No submissions made in this challenge.

Grouping/Sorting players into their respective teams

Latest submissions

No submissions made in this challenge.

Dog breed classification

Latest submissions

No submissions made in this challenge.

5 Problems 15 Days. Can you solve it all?

Latest submissions

See All
failed 71051
failed 71041

Sample-efficient reinforcement learning in Minecraft

Latest submissions

No submissions made in this challenge.

Multi Agent Reinforcement Learning on Trains.

Latest submissions

No submissions made in this challenge.

Recognise Handwritten Digits

Latest submissions

See All
graded 191633
submitted 191628
submitted 191622

Crowdsourced Map Land Cover Prediction

Latest submissions

See All
graded 60315
graded 60314

Latest submissions

No submissions made in this challenge.

5 Problems 15 Days. Can you solve it all?

Latest submissions

No submissions made in this challenge.

Project 2: Road extraction from satellite images

Latest submissions

No submissions made in this challenge.

Project 2: build our own text classifier system, and test its performance.

Latest submissions

No submissions made in this challenge.

Predict if users will skip or listen to the music they're streamed

Latest submissions

No submissions made in this challenge.

Identifying relevant concepts in a large corpus of medical images

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

5 PROBLEMS 3 WEEKS. CAN YOU SOLVE THEM ALL?

Latest submissions

See All
failed 77264

Real Time Mask Detection

Latest submissions

See All
graded 67702
graded 67701
graded 67600

Latest submissions

No submissions made in this challenge.

Predict if users will skip or listen to the music they're streamed

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Solve the jigsaw and finish the picture!

Latest submissions

No submissions made in this challenge.

Predicting wine quality

Latest submissions

No submissions made in this challenge.

Predict whether an individual will be back to prison

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Analyse Sentiment From Sound Clips

Latest submissions

No submissions made in this challenge.

Predict viewer reactions from a large-scale video dataset!

Latest submissions

See All
graded 124097

Reinforcement Learning, IIT-M, assignment 1

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
failed 156316

5 puzzles and 1 week to solve them!

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 128368

Latest submissions

No submissions made in this challenge.

Multi-Agent Reinforcement Learning on Trains

Latest submissions

No submissions made in this challenge.

Latest submissions

No submissions made in this challenge.

Train your RL agents

Latest submissions

See All
graded 165868
failed 162152

Localization, SLAM, Place Recognition, Visual Navigation, Loop Closure Detection

Latest submissions

No submissions made in this challenge.

Identify Words from silent video inputs.

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
failed 195996
submitted 195995
failed 183788

A Challenge on Continual Learning using Real-World Imagery

Latest submissions

No submissions made in this challenge.

Use an RL agent to build a structure with natural language inputs

Latest submissions

No submissions made in this challenge.

Latest submissions

See All
graded 281041
failed 281038
graded 281037

Generating answers using image-linked data

Latest submissions

See All
failed 292367
failed 292325
graded 286048

Synthesising answers from image and web sources

Latest submissions

See All
graded 283744
graded 282957
graded 282706

Contextual answering in multi-turn dialogue

Latest submissions

See All
graded 282958
graded 282199
failed 282198
Participant Rating
BhaviD 0
will_kwan 0
lars12llt 0
jansi_rani_s_v 0
branden_murray 0
saketha_ramanujam 0
vrv 0
jerome_patel 0
shivam 136
cadabullos 0
krishna_kaushik 0
unnikrishnan.r 261
Participant Rating
vrv 0
aicrowd-bot
shivam 136
unnikrishnan.r 261

Orak Game Agent Challenge

Pokemon map is broken?

Yesterday

@ilya_gusev is there a particular submission you are referring to that I can check?

We didn't change anything specific to Pokemon and wouldn't expect this behavior to change. Maybe the game hasn't started yet? The game ROM we received from the Krafton team starts at the menu screen, and the "Map on Screen" wouldn't be defined for that screen.

Is there something I can cross check on the server logs or can you give the exact steps to replicate this issue so that we can pass this to Krafton team?

Starcraft submission

2 days ago

@mikhail1 @cheong_wei_xun

can you please take a look at this

Model submission

2 days ago

While running remote mode, it seems like the Star Craft failed after entering the second episode, is it possible that there's a problem with the code in the MCP server or game envs of AIcrowd?

The issue was caused by requests being queued and dequeued arbitrarily at the MCP server. We moved to a gRPC-based implementation to get around this issue.

Please pull the recent changes made to the starter kit and let us know if you still run into any issues.

How long will the evaluation process usually take?

It's subjective. However, if you use the random agents (not really random, they simply repeat the same static action), this is what you should expect.

Do scores only appear after all games are completed?

Yes. Your submission must complete all games to be marked as graded. Otherwise, it won't appear on the leaderboard.

will it directly show the score on the submission page once all games are completed or it will take some additional time to make the score shown on the submission page

Your submission will be marked as failed if it doesn't complete all the games. If your evaluation times out, i.e. doesn't finish all the games within 12 hours, the submission will eventually be marked as failed.

Clarification on ORAK Scoring Standards and Remote Mode Episode Settings

2 days ago

When running the games in remote mode, how many episodes are executed for each game?

Three episodes each for all games.

And for games like 2048, is the final score taken as the average of three rounds

The final score for each game is the average of its scores across episodes.
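To illustrate the averaging described above, here is a minimal sketch. The function name and the example scores are hypothetical, and the actual evaluator code is not shown in this thread; the sketch only assumes a plain arithmetic mean over the episodes of one game.

```python
from statistics import mean


def game_score(episode_scores):
    """Return the arithmetic mean of the per-episode scores for one game."""
    if not episode_scores:
        raise ValueError("at least one episode score is required")
    return mean(episode_scores)


# Hypothetical scores from the three episodes of a single game (e.g. 2048).
scores_2048 = [1200, 1800, 1500]
print(game_score(scores_2048))
```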

Super_mario

2 days ago

Hey, we released a few patches to the starter kit that would remove fastmcp dependencies. Can you please pull the recent changes, give it a try and reach out to us if you are still running into problems?

📢 Starter Kit Update

2 days ago

Hello everyone! A quick heads up about an important stability update to the starter kit.

We've migrated the transport/communication backend (previously based on MCP/FastMCP) to gRPC to make the interaction between your agent and the game environment more robust and predictable.

What changed?

  • The underlying transport layer is now gRPC-based.
  • The agent-facing interface and APIs are unchanged. Your existing agent code should continue to work as is.
  • The main goal of this change is better resilience under long games and reconnections.

Why this matters?

Some of you were seeing:

  • Random stalls / timeouts during longer runs
  • Reconnection issues
  • Episodes hanging with no clear error

These issues were caused by how requests were queued and retried in the previous MCP-based setup. With gRPC, we now strictly enforce "one client, one action in flight", and we get clearer error handling, which should eliminate these stalls and make reconnect behavior much more reliable.
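As a rough illustration of what "one action in flight" can mean on the client side, here is a sketch. It is not the starter kit's implementation: the class name, the send_fn callback, and the error message are all hypothetical. It only shows the general idea of rejecting a second action while the first is still pending, instead of queueing it.

```python
import threading


class SingleFlightClient:
    """Hypothetical wrapper that allows only one pending action at a time."""

    def __init__(self, send_fn):
        # send_fn is an assumed callable that delivers an action and returns the reply.
        self._send = send_fn
        self._lock = threading.Lock()

    def act(self, action):
        # Fail fast rather than queue: a second caller gets an explicit error.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("an action is already in flight")
        try:
            return self._send(action)
        finally:
            self._lock.release()


client = SingleFlightClient(lambda action: f"ok:{action}")
print(client.act("noop"))
```

Failing fast keeps the ordering of actions unambiguous, which is the property the update above is aiming for.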

What you need to do

  1. Update your local starter kit
    • Pull the latest changes from the repo (e.g., git pull --rebase).
  2. Reinstall/refresh dependencies if needed
    • uv sync
  3. Run your existing agents as usual
    • No changes should be required to your agent logic or environment interaction code.

If you still see issues

If you run into:

  • Timeouts
  • Stalls
  • Reconnection problems

please share:

  • Logs (client + server, if possible)
  • Approximate episode length and map
  • Steps to reproduce

This will help us quickly track down any remaining edge cases.

Thanks for your patience while we tracked this down.

StarCraft Stuck After Episode 1 ('Client is not connected' Issue)

5 days ago

We use the Linux headless binary for evaluations, and the steps below should help you get everything set up clearly:

Model submission

5 days ago

  1. The final evaluations will be run manually by the Krafton AI team, and the prizes will be decided based on the outcome of those manual runs. The Krafton AI team will verify the size of the model you submit during the final evaluations.
  2. No, we do not require any access to your LLM server. However, for the final evaluations, you will need to include precise instructions and the code needed to start your LLM server, and ensure that the Krafton AI team can run everything end-to-end.

Model submission

5 days ago

No, it doesn't need to be hosted externally. We simply provide the endpoints that let you get game observations, and your agent only needs to return the actions to execute in the game. How you produce those actions, whether through local models, external services, or any other setup, is entirely up to you.

This means your machine must be able to access whichever models or services you rely on during inference.


In Local Mode, everything (your runner, game launcher, agents, and the MCP game servers) runs directly on your machine. The runner starts the game servers, initializes your agents, and your agents communicate with the servers over localhost. There is no connection to the AIcrowd backend.


In Remote Mode, your runner and agents (including any LLM calls you make) still run locally, but the MCP game servers run on AIcrowd's remote infrastructure. Your runner creates a session via the Session API, receives the MCP server URLs, and your agents interact with the remote servers over HTTPS.


  • Local Mode hosts the entire stack on your machine.
  • Remote Mode keeps your agents local while offloading the game servers and environments to AIcrowd.

Hope this makes it clear

ModuleNotFoundError: No module named 'omegaconf'

6 days ago

uv looks for pyproject.toml and automatically manages the virtual environment at the repo level.

Can you verify that your environment is actually being used?

# start the Python interpreter that uv manages for this repo
uv run python
# then, at the Python prompt, try importing a starter-kit-specific library, for example:
import sc2

If this import works, then uv is configured correctly.

You can install additional packages with:

uv add <package>

Although uv pip install -r pyproject.toml works, it's generally better to use:

uv sync

This installs the exact dependency versions listed in uv.lock, matching the environment used during starter-kit testing.
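If the import check is inconclusive, a small script can report which interpreter is running and whether a given module is visible to it. This is only a sketch: the file name check_env.py and the helper names are hypothetical, and it uses standard-library calls only. It could be run with uv run python check_env.py.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def is_importable(module_name):
    """True when the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
# Module names taken from the thread above; adjust to whatever you need to check.
for name in ("omegaconf", "sc2"):
    print(f"{name} importable:", is_importable(name))
```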

Model submission

6 days ago

You don't need to submit your model directly for this challenge. When you make a submission, we automatically launch an instance of each game and provide your agent with a unique MCP address for that run. Each game reports its score back to us, and we update the leaderboard accordingly.

Question about SC2 map setting in starter kit

7 days ago

You can search for the map names directly and you should be able to find them.

Maps ending with *AIE and *LIE are patched versions. They allow the latest maps to work with the last StarCraft II binary released for Linux. You can find these *AIE maps here: Maps - AI Arena Wiki

The map Ancient Cistern LE can be found here: Large-Language-Models-play-StarCraftII/Maps at main · histmeisah/Large-Language-Models-play-StarCraftII · GitHub

If you are using the latest StarCraft II version (and not the Linux headless binary), you do not need to patch the maps. You should be able to run the games as-is.

Just make sure you've upgraded both burnysc2 and s2clientprotocol to their latest versions.

StarCraft Stuck After Episode 1 ('Client is not connected' Issue)

7 days ago

We are investigating this issue and will get back as soon as we have an update

Question about SC2 map setting in starter kit

14 days ago

Ancient Cistern LIE is a ported version of Ancient Cistern LE. Since our evaluation environment uses the Linux headless SC2 binary, and the latest available release is still v4.10, it cannot load the newer official maps. To address this, we ported Ancient Cistern LE ourselves using a community-provided patch: GitHub - aiarena/sc2patch.

Flextrack Challenge 2025

💬 Feedback & Suggestions

4 months ago

@imeintanis The page AIcrowd | Flextrack Challenge 2025 | Submissions (https://www.aicrowd.com/challenges/flextrack-challenge-2025/submissions?my_submissions=true) should display the list of all submissions you made to the challenge.

You can get to this page by using the submission filters option.

💬 Feedback & Suggestions

4 months ago

Hello @jack_vandyke

Thanks for bringing this up. A minor misconfiguration on our end caused this. All the pending evaluations have now been graded. You should get your evaluation results almost instantly, without any wait time.

Commonsense Persona-Grounded Dialogue Chall-0431ae

Issues regarding CPDC 2025 rerun

5 months ago

Hello @yiyang_zheng

We were able to run your submission fine. It was indeed an issue with git LFS files.

Meta CRAG - MM Challenge 2025

Request for investigation into the cause of the error

6 months ago

Hello @NineGates

We are re-evaluating 289818 (seems like some network error caused it to fail with the error "failed to load artifacts") and 289826 (evaluation didn't start).

We can't re-evaluate the rest as they failed due to timeout errors. Please note that we do not look at the average prediction time; there is a hard per-step timeout. A single step that violates it can fail the submission.

  • 289762 violated the overall timeout.
  • 289784 failed due to a per step timeout

2025-06-18 08:18:43.290 INFO main:run_with_timeout:154 - Running batch_generate_response with timeout 80
2025-06-18 08:20:28.554 INFO main:run_with_timeout:164 - Executed batch_generate_response in 105.26403450965881 seconds

  • 289820 failed due to a per step timeout

2025-06-18 07:06:50.363 INFO main:run_with_timeout:154 - Running batch_generate_response with timeout 80
2025-06-18 07:08:14.969 INFO main:run_with_timeout:164 - Executed batch_generate_response in 84.60583829879761 seconds
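The logs above show a step running to completion (about 105 s and 85 s) against a limit of 80 s. A per-step check of that kind could look like the sketch below. This is an assumption based only on those log lines, not the evaluator's actual code: the function and exception names are hypothetical, and the real implementation may interrupt the call instead of measuring it afterwards.

```python
import time


class StepTimeoutError(Exception):
    """Raised when a single step exceeds its time limit."""


def run_with_timeout_check(fn, timeout_s, *args, **kwargs):
    """Run fn to completion, then fail the step if it took longer than timeout_s."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    if elapsed > timeout_s:
        raise StepTimeoutError(f"step took {elapsed:.2f}s, limit is {timeout_s}s")
    return result


# A fast step passes; one slow step is enough to fail, regardless of the average.
print(run_with_timeout_check(lambda: "done", 80))
```

Because every step is checked on its own, a fast average does not compensate for a single slow batch.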

Submission Status Change From "Generating" to "Prepare Generate" and Stuck

6 months ago

In case your submission emits logs that can further help, let us know. We will clean up any dataset related log lines and share it with you.

jyotish has not provided any information yet.
