Activity
Challenge Categories
Challenges Entered
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
graded | 284893
graded | 284834
graded | 282712
Revolutionise E-Commerce with LLM!
Latest submissions
graded | 270741
graded | 270740
graded | 270655
- NVIDIA-Merlin | Amazon KDD Cup '23: Multilingual Recommendation Challenge
- Team_NVIDIA | Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs
Meta CRAG - MM Challenge 2025

When is Deadline to Team up?
9 days ago
I have two questions about teaming up:
- When is the deadline to team up? (In the “timeline” section of the website it says May 28th, and in the “participation and submission” section it says May 21st.)
- Can participants team up if their combined phase 2 submissions exceed 6? (The submission limit for phase 2 is “each team can make 6 total submissions across all three tracks”. If participant A has 6 phase 2 subs and participant B has 6 phase 2 subs, can they still team up? Because after teaming up, their team will have 12 phase 2 subs.)

During Submission How Do We Download Web Search URL?
14 days ago
One suggestion is that participants’ code cannot communicate directly with the internet (using wget, etc.). Instead, it would call an API provided by AIcrowd which fetches webpages. This ensures that participants’ code can only receive information from the internet and cannot submit information (i.e. the hidden test questions) to external websites.
If this is how it currently works, how do we call the API to fetch webpages?

During Submission How Do We Download Web Search URL?
15 days ago
The web search API docs here say that we need to download the result URLs ourselves:
Note: The Search APIs only return urls for images and webpages, instead of full contents. To get the full webpage contents and images, you will have to download it yourself. During the challenge, participants can assume that the connection to these urls are available.
During submission, do we need to do this? And how do we do it, given that internet access is turned off? Can we use wget on these websites?
Doesn’t this pose a risk that a participant who owns these websites could transfer all the test questions to their URL (using an HTTP GET or POST) during submission and receive all the hidden test questions (then hardcode the answers into future submissions)?
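For the local-validation setting at least, the “download it yourself” step the docs describe is just a plain HTTP fetch. A minimal sketch (the function name is mine; whether this is allowed inside the sandboxed submission environment is exactly the open question above):

```python
from urllib.request import urlopen

def download_page(url: str, timeout: float = 30.0) -> str:
    """Fetch the full contents of a URL returned by the search API."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```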

Questions about the leaderboard
19 days ago
Thanks for improving the leaderboard. There are still 5 things wrong with it. Can you please fix the following 5 things? Thanks.
1. Multi-source Augmentation (task 2) ranking is determined by “Ego Samples” when it should be determined by “All Samples”. Furthermore, when we click to see the multi-source augmentation LB, we should see “All Samples” first by default.
2. Multi-source Augmentation (task 2) ranking is sorted by “Accuracy” when it should be sorted by “Truthfulness”.
3. Multi-turn QA (task 3) ranking is determined by “Ego Samples” when it should be determined by “All Samples”. Furthermore, when we click to see the multi-turn QA LB, we should see “All Samples” first by default.
4. Multi-turn QA (task 3) “Ego Samples” is displaying all scores as NaN.
5. The top score on Single-Source Augmentation (task 1) incorrectly computes truthfulness as 0.889 when their hallucination rate is 0.219 and accuracy is 0.108 (i.e. their truthfulness should be -0.111; other teams’ truthfulness scores were updated yesterday but this score was not).
Thanks for fixing these 5 issues!
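If truthfulness here means accuracy minus hallucination rate (my reading of the scores quoted above, not an official formula), the arithmetic for issue 5 is:

```python
def truthfulness(accuracy: float, hallucination: float) -> float:
    # Assumed scoring rule: truthfulness = accuracy - hallucination rate.
    return accuracy - hallucination

# The top team's quoted scores: accuracy 0.108, hallucination rate 0.219.
print(round(truthfulness(0.108, 0.219), 3))  # -0.111, not the 0.889 shown on the LB
```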

Does "search_pipeline" source change during LB submission
22 days ago
I have noticed that you just (14 hours ago) updated the HuggingFace web search vector database from 113k entries to 647k entries. Is the new database similar to the LB database?
For us to tune our models during local validation, we need a local validation database similar to what our models will see during LB submission. Is the current (newly updated) web search database similar to the LB database? And is the image search validation database similar to the LB image search database?
=========
Let me clarify my question. (1) For validation, we have 647k entries in the web search database to help us answer 1548 validation queries, i.e. about 418 database entries per validation question. Is this the same ratio that our models will see during LB submission web search?
(2) Furthermore, a certain percentage of validation queries have their answer contained inside the web search vector database (with the rest of the vector database being noise). During LB submission, does the same percentage of answers and noise exist in the LB vector database?
And lastly, can you answer these 2 questions for image search? Thank you!
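The entries-per-query figure above is just arithmetic on the numbers quoted in the post:

```python
entries, queries = 647_000, 1_548
print(round(entries / queries))  # about 418 database entries per validation query
```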

Does "search_pipeline" source change during LB submission
29 days ago
In the old evaluation script, the agent defined the search pipeline as "crag-mm-2025/image-search-index-validation", which means that the same vector database is used for both local validation and LB submission.
I see the new starter kit changed this. My question is: does our submission use a different search pipeline, or does submission also use "crag-mm-2025/image-search-index-validation"?
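For anyone comparing the two starter kits, the question boils down to which index name the pipeline is constructed with. A stand-in sketch (the class is a placeholder, not the starter kit's real API; only the index string comes from the old evaluation script):

```python
VALIDATION_INDEX = "crag-mm-2025/image-search-index-validation"

class SearchPipeline:
    """Placeholder for whatever pipeline class the starter kit actually provides."""
    def __init__(self, index_name: str):
        self.index_name = index_name

# Old behavior: local validation and LB submission used the same index.
pipeline = SearchPipeline(VALIDATION_INDEX)
print(pipeline.index_name)  # crag-mm-2025/image-search-index-validation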


How to use private model
About 1 month ago
ChatGPT says we can add them as collaborators under Settings. What is AIcrowd’s HF username?

Where is the Starter Kit for submissions?
About 2 months ago
Hi everyone, I see that many teams have already submitted to the leaderboard. Where can I find the “For up to date instructions for your challenge, please refer to the starter kit provided by challenge organisers.” and “git clone ”?
Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs

Note for our final evaluation
10 months ago
@yilun_jin, sometimes the exact same code will succeed one time and fail another time. For example, we submitted the exact same code [here] and [here]. The first succeeded and the second failed. During the re-run, what happens if code fails that has previously succeeded? Will the admins run it a second time?
Also, can you tell us why the second link above failed?
When we select our final 2 submissions for each track, should we just select our best-scoring submission twice in case it fails the first time it is re-run?

All Submissions Are Failing
11 months ago
Our team’s last 6 submissions failed. And when I look at the list of submissions from the other teams in the past 4 hours, all other teams failed too. Is there a problem with the AIcrowd server?
Here are the links of our team’s last two failures [here] and [here]
Can an admin please investigate? Thank you.

Push gitlab and cannot find issue
11 months ago
The same thing has just happened to me. I have created 5 new tags. They all appear in my GitLab but none appear in my issues.
They are tags submission-200, submission-202, submission-203, submission-204, submission-205. Some are duplicates of each other because I tried submitting the same thing twice without success.
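For reference, the tag-based submission flow being described is just git tagging and pushing (the remote name aicrowd and the tag number are assumptions; use whatever your starter kit's README says):

```shell
# Create an annotated tag for the next submission attempt...
git tag -am "submission-206" submission-206
# ...and push it to the AIcrowd GitLab remote to trigger evaluation.
git push aicrowd submission-206
```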

All Submissions "Waiting In Queue" for 12 Hours
11 months ago
FYI, all submissions (from all teams) have been “waiting in queue” for the past 12 hours. Perhaps an admin can investigate. Thanks.

Submission stuck on "evaluation initiated"
11 months ago
The following two submissions [here] and [here] are stuck with the label “evaluation initiated” even though they have failed.
Can an admin switch the GitLab label to failed? As is, they are using up 2 submissions from our quota. Thanks.

Submission Failed - Please Tell Us When Submission Works Again
11 months ago
Yes, this is not fixed. I just submitted and got:
Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists..
The GitLab issue is [here]
For the past 2 days, no team has been able to submit to track 5.
Please fix this issue and let us know when it is fixed. Thank you

Submissions fail
11 months ago
I am also seeing weird submission behavior today. I posted a discussion describing the errors I have been seeing today [here].

Submission Failed - Please Tell Us When Submission Works Again
11 months ago
Hi, for the past 4 hours I have been receiving “Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists..” when submitting to track 5. An example GitLab issue (for admins to review) is [here].
I have tried 3 times and received 3 “failed” submissions. I do not want to try any more because I do not want to use up my failed-submission quota. Can an admin tell us when submissions are working for track 5 again? Thanks.

Track 2 LB Doesn't Show Retrieval Score
11 months ago
Hi, can admins @yilun_jin fix the track 2 leaderboard webpage to show each team’s retrieval score? Thank you.

Phase 2 launching!
12 months ago
I notice that the AIcrowd website says “Round 2: 21 days left”, which implies that phase 2 ends on June 15th. Is this the correct end of phase 2?
When is Deadline to Team up?
4 days ago
@yilun_jin8 @jyotish Hi, do either of you know the answers to these 2 questions?