
Challenges Entered
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 288862 |
| graded | 288861 |
| graded | 288860 |
Revolutionising Interior Design with AI
Evaluate Natural Conversations
Understand semantic segmentation and monocular depth estimation from downward-facing drone images
Audio Source Separation using AI
A benchmark for image-based food recognition
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 177111 |
| failed | 177108 |
| graded | 176842 |
What data should you label to get the most value for your money?
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 179189 |
| graded | 179151 |
| graded | 179149 |
Perform semantic segmentation on aerial images from monocular downward-facing drone
Generating answers using image-linked data
Latest submissions
| Status | Submission ID |
|---|---|
| graded | 288862 |
| graded | 288859 |
| graded | 288858 |
Meta CRAG - MM Challenge 2025
Clarification about the evaluation process
7 months ago
I see, thank you for the clarification.
I appreciate you reaching out to the organizers. I will proceed on the assumption that we may not receive a response.
Thanks!
Clarification about the evaluation process
7 months ago
To be honest, I don't really understand why you replied, made changes, then deleted your response and are now staying silent about this post.
If a certain question can't be answered, that's totally fine. Please just let me know.
Clarification about the evaluation process
7 months ago
@yilun_jin8
Any updates? If there are some questions you can't answer, please let me know.
Thanks.
Clarification about the evaluation process
7 months ago
@yilun_jin8 @mohanty
Can you check these questions?
Evaluation Method of Leaderboard
7 months ago
Thanks for the quick action!
In my humble opinion, modifying the evaluation prompt is not a solution.
You just need to declare that prompt-injection solutions will be eliminated before the top 10 teams are selected.
Evaluation Method of Leaderboard
7 months ago
@yilun_jin8
Another question: what if all of the top 10 teams in Phase 2 use submissions with prompt injection, as on the current leaderboard? I think the current auto-evaluation method is too weak against prompt injection and can be easily exploited.
I believe manual evaluators would consider such answers invalid (wrong), but since only 10 teams are selected for manual evaluation, there's a risk that none of the top submissions are meaningful.
Clarification about the evaluation process
7 months ago
- Regarding participation eligibility, is my understanding correct?
  - Phase 1: All teams can participate
  - Phase 2: Only teams that successfully submit in Phase 1 can participate
  - Final Round: Only the top 10 teams in Phase 2 based on automatic evaluation can participate
- How is the final submission selected? Can we change from the best leaderboard submission?
- Is there no length limit for the final evaluation? (The limit is 75 tokens for automatic evaluation.)
  Full responses are manually checked for hallucinations.
- How is the generation of the first token detected?
  A 10-second timeout starts after the first token is generated.
- How is time per sample measured in the batch generation pipeline?
  Only answer texts generated within 30 seconds are considered.
- If we exceed the time limit, will we be immediately disqualified? Or will just that sample be considered wrong (or missing)?
- Is a missing answer required to be an exact match to "I don't know," or are similar responses acceptable in manual evaluation? Which of the following statements is correct?
  Missing (e.g., "I don't know", "I'm sorry I can't find …") → Score: 0.0
  All missing answers should return a standard response: "I don't know."
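To make the last question concrete, here is a minimal Python sketch of the two readings of the rule: a strict check that accepts only the standard response, and a lenient check that also accepts similar phrasings. It also shows a 75-token truncation. Everything here is an assumption for illustration: the function names, the lenient patterns, and the use of whitespace-separated words as "tokens" are hypothetical and are not taken from the organizers' evaluator.

```python
import re

STANDARD_MISSING = "I don't know"  # standard response quoted in the rules
MAX_TOKENS = 75                    # automatic-evaluation limit quoted above

# Hypothetical lenient patterns; the real evaluator's behaviour is unknown.
LENIENT_PATTERNS = (
    r"\bi don'?t know\b",
    r"\bi'?m sorry\b.*\bcan'?t find\b",
)


def _normalize(text: str) -> str:
    """Unify curly apostrophes, lowercase, and drop trailing '.' or '!'."""
    return text.replace("\u2019", "'").strip().lower().rstrip(".!")


def is_missing_strict(answer: str) -> bool:
    """Reading 1: only the standard response counts as missing."""
    return _normalize(answer) == _normalize(STANDARD_MISSING)


def is_missing_lenient(answer: str) -> bool:
    """Reading 2: similar refusals also count as missing."""
    norm = _normalize(answer)
    return any(re.search(pattern, norm) for pattern in LENIENT_PATTERNS)


def truncate(answer: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep the first max_tokens whitespace-separated words.

    Assumption: words stand in for whatever tokenizer the evaluator uses.
    """
    return " ".join(answer.split()[:max_tokens])
```

Under the strict reading, "I'm sorry I can't find the answer" would not be treated as missing, so it might instead be graded as a wrong answer; under the lenient reading it would be treated as missing. That difference is why the question matters.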
Evaluation Method of Leaderboard
7 months ago
Thanks, have you completed the process? It seems that some submissions have not been re-evaluated yet.
Evaluation Method of Leaderboard
7 months ago
Yes, it seems that a bug has recently appeared.
It looks like no one has been able to get a "correct" other than by exact match.
Even when submitting the exact same commit as a submission that achieved "correct" on April 29, I can't get the same score.
@yilun_jin8
Have you made any changes to the evaluation metric since then?
Evaluation Method of Leaderboard
7 months ago
Thanks!
May I ask why some submissions seem to have been submitted correctly but not graded?
For example:
"AIcrowd | Single-source Augmentation | Submissions #283034"
"AIcrowd | Single-source Augmentation | Submissions #283028" (not mine)
Evaluation Method of Leaderboard
7 months ago
Is the evaluation method for the leaderboard scores the same as in local_evaluation.py?
Are the prompts and LLM used for evaluation also the same?
I'm wondering because the scores I get locally don't match the ones on the leaderboard.
Generative Interior Design Challenge 2024
Can't open baseline starter kit
Almost 2 years ago
I got a 404 when I tried to open the baseline starter kit.
@snehananavati Could you please check it out?
Data Purchasing Challenge 2022
[Announcement] Leaderboard Winners
Over 3 years ago
Thanks! It should be the same as on other platforms like Kaggle: you can just create a discussion thread to share your approach! Of course, it would be most helpful if you kindly shared the code as well, but this competition was very structured, so just sharing the approach may be enough to understand what led you to win :)
[Announcement] Leaderboard Winners
Over 3 years ago
Big congrats to the winners, especially @xiaozhou_wang; it seems you won the competition by a large margin! I'm really curious about your solution, and it would be great if you could share it with the community :)
:rotating_light: Select submissions for final evaluation
Over 3 years ago
Hi @shivam @dipam, do you have any timeline for the leaderboard update?
:rotating_light: Select submissions for final evaluation
Over 3 years ago
Hi @shivam, is there any progress?
:rotating_light: Select submissions for final evaluation
Over 3 years ago
Hi @dipam, thanks for hosting the interesting competition!
It seems the competition has finished. When will the leaderboard be finalized?
Clarification about the evaluation process
7 months ago
Thanks for the clarification!