
Challenges Entered
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
graded | 288862
graded | 288861
graded | 288860
Revolutionising Interior Design with AI
Evaluate Natural Conversations
Understand semantic segmentation and monocular depth estimation from downward-facing drone images
Audio Source Separation using AI
A benchmark for image-based food recognition
Latest submissions
graded | 177111
failed | 177108
graded | 176842
What data should you label to get the most value for your money?
Latest submissions
graded | 179189
graded | 179151
graded | 179149
Perform semantic segmentation on aerial images from monocular downward-facing drone
Generating answers using image-linked data
Latest submissions
graded | 288862
graded | 288859
graded | 288858
Meta CRAG - MM Challenge 2025

Clarification about the evaluation process
About 1 month ago
I see, thank you for the clarification.
I appreciate you reaching out to the organizers. I will proceed on the assumption that we may not receive a response.
Thanks!

Clarification about the evaluation process
About 1 month ago
To be honest, I don’t really understand why you replied, made changes, then deleted your response and are now staying silent about this post.
If a certain question can’t be answered, that’s totally fine. Please just let me know.

Clarification about the evaluation process
About 1 month ago
@yilun_jin8
Any updates? If there are some questions you can’t answer, please let me know.
Thanks.

Clarification about the evaluation process
About 2 months ago
@yilun_jin8 @mohanty
Can you check these questions?

Evaluation Method of Leaderboard
About 2 months ago
Thanks for the quick action!
In my humble opinion, modifying the evaluation prompt is not a solution.
You just need to declare that submissions using prompt injection will be eliminated before the top 10 teams are selected.

Evaluation Method of Leaderboard
About 2 months ago
@yilun_jin8
Another question: what if all of the top 10 teams in Phase 2 use submissions with prompt injection, like those on the current leaderboard? I think the current auto-evaluation method is too weak against prompt injection and can be easily exploited.
I believe manual evaluators would consider such answers invalid (wrong), but since only 10 teams are selected for manual evaluation, there’s a risk that none of the top submissions are meaningful.

Clarification about the evaluation process
About 2 months ago
- Regarding participation eligibility, is my understanding correct?
  - Phase 1: All teams can participate
  - Phase 2: Only teams that successfully submit in Phase 1 can participate
  - Final Round: Only the top 10 teams in Phase 2 based on automatic evaluation can participate
- How is the final submission selected? Can we change from the best leaderboard submission?
- Is there no length limit for the final evaluation? (The limit is 75 tokens for automatic evaluation)
  Full responses are manually checked for hallucinations.
- How is the generation of the first token detected?
  A 10-second timeout starts after the first token is generated.
- How is time per sample measured in the batch generation pipeline?
  Only answer texts generated within 30 seconds are considered.
- If we exceed the time limit, will we be immediately disqualified? Or will just that sample be considered wrong (or missing)?
- Is a missing answer required to be an exact match to “I don’t know,” or are similar responses acceptable in manual evaluation? Which of the following statements is correct?
  Missing (e.g., “I don’t know”, “I’m sorry I can’t find …”) → Score: 0.0
  All missing answers should return a standard response: “I don’t know.”
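
To make the last question concrete, here is a minimal Python sketch of the two readings of the rules quoted above. It is only an illustration and not the organizers' evaluation code: the function names, the normalization step, and the lenient patterns are all assumptions made up for this example.

```python
import re

# Standard response quoted in the rules above.
STANDARD_MISSING = "I don't know."


def normalize(text: str) -> str:
    """Unify curly apostrophes, trim whitespace, lowercase, drop trailing '.' or '!'."""
    text = text.replace("\u2019", "'").strip().lower()
    return text.rstrip(".!")


def is_missing_strict(answer: str) -> bool:
    """Reading 1: only the standard response counts as missing."""
    return normalize(answer) == normalize(STANDARD_MISSING)


# Hypothetical patterns for reading 2, based on the two examples in the rules.
LENIENT_PATTERNS = [
    re.compile(r"\bi don't know\b"),
    re.compile(r"\bi'm sorry\b.*\bcan't find\b"),
]


def is_missing_lenient(answer: str) -> bool:
    """Reading 2: similar refusals also count as missing."""
    norm = normalize(answer)
    return any(p.search(norm) for p in LENIENT_PATTERNS)
```

Under the strict reading, an answer such as “I’m sorry I can’t find the answer” would not be treated as missing, while under the lenient reading it would. That difference is what the question asks the organizers to settle.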


Evaluation Method of Leaderboard
About 2 months ago
Thanks, have you completed the process? It seems that some submissions have not been re-evaluated yet.

Evaluation Method of Leaderboard
About 2 months ago
Yes, it seems that a bug has recently appeared.
It looks like no one has been able to get a “correct” other than by exact match.
Even when submitting the exact same commit as a submission that achieved “correct” on April 29, I can’t get the same score.
@yilun_jin8
Did you make any changes to the evaluation metric since then?

Evaluation Method of Leaderboard
About 2 months ago
Thanks!
May I ask why some submissions seem to have been submitted correctly but were not graded?
For example:
“AIcrowd | Single-source Augmentation | Submissions #283034”
“AIcrowd | Single-source Augmentation | Submissions #283028” (not mine)

Evaluation Method of Leaderboard
About 2 months ago
Is the evaluation method for the leaderboard scores the same as in local_evaluation.py?
Are the prompts and LLM used for evaluation also the same?
I’m wondering because the scores I get locally don’t match the ones on the leaderboard.
Generative Interior Design Challenge 2024


Can't open baseline starter kit
Over 1 year ago
I got a 404 when I tried to open the baseline starter kit.
@snehananavati Could you please check it out?
Data Purchasing Challenge 2022

[Announcement] Leaderboard Winners
About 3 years ago
Thanks! It should be the same as on other platforms like Kaggle: you can just create a discussion thread to share your approach! Of course, it would be most helpful if you kindly shared the code as well, but this competition was very structured, so just sharing the approach may be enough to understand what led you to win :)

[Announcement] Leaderboard Winners
About 3 years ago
Big congrats to the winners, especially @xiaozhou_wang, it seems you won the competition by a large margin! I'm really curious about your solution; it would be great if you could share it with the community :)

:rotating_light: Select submissions for final evaluation
About 3 years ago
Hi @shivam @dipam, do you have any timeline for the leaderboard update?

:rotating_light: Select submissions for final evaluation
About 3 years ago
Hi @shivam, is there any progress?

:rotating_light: Select submissions for final evaluation
About 3 years ago
Hi @dipam, thanks for hosting this interesting competition!
It seems the competition has finished. When will the leaderboard be finalized?
Clarification about the evaluation process
About 1 month ago
Thanks for the clarification!