Activity
Challenges Entered
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
Status | Submission ID |
---|---|
graded | 282712 |
failed | 281820 |
Improve RAG with Real-World Benchmarks
Latest submissions
Revolutionise E-Commerce with LLM!
Latest submissions
Status | Submission ID |
---|---|
graded | 270741 |
graded | 270740 |
graded | 270655 |
Teams
- NVIDIA-Merlin | Amazon KDD Cup '23: Multilingual Recommendation Challenge
- Team_NVIDIA | Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs
Meta CRAG-MM Challenge 2025

Does "search_pipeline" source change during LB submission
4 days ago
I have noticed that you just (14 hours ago) updated the HuggingFace web search vector database from 113k entries to 647k entries. Is the new database similar to the LB database?
For us to tune our models during local validation, we need a local validation database similar to what our models will see during LB submission. Is the current (newly updated) web search database similar to the LB database? And is the image search validation database similar to the LB image search database?
=========
Let me clarify my question. (1) For validation, we have 647k entries in the web search database to help us answer 1548 validation queries, i.e. roughly 418 database entries per validation question (see the quick calculation below). Is this the same ratio that our models will see during the LB submission web search?
(2) Furthermore, a certain percentage of validation queries have their answer contained inside the web search vector database (with the rest of the vector database being noise). During LB submission, does the same percentage of answers versus noise exist in the LB vector database?
And lastly, can you answer these 2 questions for image search? Thank you!
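As a side note, here is a quick back-of-the-envelope check of the 418 figure quoted in (1). The entry and query counts come from the post itself; the snippet is only a sanity check, not part of any official tooling.

```python
# Sanity check of the ratio quoted above: web search entries per validation query.
web_search_entries = 647_000   # updated validation web search index size (from the post)
validation_queries = 1_548     # number of validation queries (from the post)

print(f"{web_search_entries / validation_queries:.0f} entries per query")  # ~418
```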

Does "search_pipeline" source change during LB submission
11 days ago
In the old evaluation script, the agent defined the search pipeline as "crag-mm-2025/image-search-index-validation"
which means that the same vector database is used for both local validation and LB submission.
I see the new starter kit changed this. My question is: does our submission use a different search pipeline, or does the submission also use "crag-mm-2025/image-search-index-validation"?
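Purely as an illustration of what "defining the search pipeline" against a fixed index means, here is a hypothetical sketch. The `SearchPipeline`-style class, the `IMAGE_SEARCH_INDEX` variable, and the helper itself are placeholders, not the starter kit's actual API; the only real identifier is the index name quoted above.

```python
import os

# Hypothetical illustration only: the helper and environment variable names are
# placeholders, not the starter kit's actual API.
DEFAULT_IMAGE_INDEX = "crag-mm-2025/image-search-index-validation"

def build_image_search_pipeline(search_pipeline_cls):
    # Hard-coding DEFAULT_IMAGE_INDEX would mean local validation and LB submission
    # query the same vector database; reading the name from the environment lets the
    # evaluator swap in a different index at submission time.
    index_name = os.environ.get("IMAGE_SEARCH_INDEX", DEFAULT_IMAGE_INDEX)
    return search_pipeline_cls(index_name)
```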


How to use private model
13 days ago
ChatGPT says we can add them as collaborators under settings. What is AIcrowd's HF username?

Where is the Starter Kit for submissions?
28 days ago
Hi everyone, I see that many teams have already submitted to the leaderboard. Where can I find the “For up to date instructions for your challenge, please refer to the starter kit provided by challenge organisers.” and “git clone ”?
Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs

Note for our final evaluation
10 months ago
@yilun_jin, sometimes the exact same code will succeed one time and fail another time. For example, we submitted the exact same code [here] and [here]. The first succeeded and the second failed. During the re-run, what happens if code that previously succeeded fails? Will the admins run it a second time?
Also, can you tell us why the second link above failed?
When we select our final 2 submissions for each track, should we just select our best scoring submission twice in case it fails the first time it is re-run?

All Submissions Are Failing
10 months ago
Our team’s last 6 submissions failed. And when I look at the list of submissions from the other teams in the past 4 hours, all of their submissions failed too. Is there a problem with the AIcrowd server?
Here are the links to our team’s last two failures: [here] and [here].
Can an admin please investigate? Thank you.

Push gitlab and cannot find issue
10 months ago
The same thing has just happened to me. I have created 5 new tags. They all appear in my GitLab but none appear in my issues.
They are tags submission-200, submission-202, submission-203, submission-204, and submission-205. Some of them are duplicates of each other because I tried submitting the same thing twice without success.

All Submissions "Waiting In Queue" for 12 Hours
10 months ago
FYI, all submissions (from all teams) have been “waiting in queue” for the past 12 hours. Perhaps an admin can investigate. Thanks.

Submission stuck on "evaluation initiated"
10 months ago
The following two submissions [here] and [here] are stuck with the label “evaluation initiated” even though they have failed.
Can an admin switch the GitLab label to “failed”? As it stands, they are using up 2 of our submission quota. Thanks.

Submission Failed - Please Tell Us When Submission Works Again
11 months ago
Yes, this is not fixed. I just submitted and got
Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists..
The GitLab issue is [here]
For the past 2 days, no team has been able to submit to track 5.
Please fix this issue and let us know when it is fixed. Thank you

Submissions fail
11 months ago
I am also seeing weird submission behavior today. I posted a discussion describing the errors I have been seeing [here].

Submission Failed - Please Tell Us When Submission Works Again
11 months ago
Hi, for the past 4 hours I have been receiving “Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists..” when submitting to track 5. An example GitLab issue (for admins to review) is [here].
I have tried 3 times and received 3 “failed” submissions. I do not want to try anymore because I do not want to use up my failed submission quota. Can an admin tell us when submissions are working for track 5 again? Thanks.

Track 2 LB Doesn't Show Retrieval Score
11 months ago
Hi, can an admin (@yilun_jin) fix the track 2 leaderboard webpage to show each team’s retrieval score? Thank you.

Phase 2 launching!
11 months ago
I notice that the AIcrowd website says “Round 2: 21 days left”, which implies that phase 2 ends on June 15th. Is this the correct end date for phase 2?

Another Frozen Evaluation
12 months ago
Thank you for fixing our previous frozen evaluation.
We have another frozen evaluation here. The GitLab issue page shows that it failed, but the AIcrowd website is still showing that the submission is being evaluated.
As such, we cannot submit again to this track because the AIcrowd website thinks that a submission is in progress. Can an admin @yilun_jin update the AIcrowd website to acknowledge that our submission failed, thus allowing us to make a new submission?
Thank you.

Our Evaluation is Frozen
12 months ago
Hi. Our submission to track 1 [here] has frozen. It shows 97% completed and it appears to be within all time limits. There has been no update for the past 3 hours.
Can an admin @yilun_jin please unfreeze our submission and post the results? Thank you

Are some errors caused by AIcrowd server and not submission code?
About 1 year ago
Hi. Can an admin please solve the following problem and provide a score for our last submission?
We just submitted the other code of ours that failed yesterday, and this time the AIcrowd GitLab issue says the error was caused by the AIcrowd server. The server could not load sentence-transformers/all-MiniLM-L6-v2 to evaluate the score. Below is a copy and paste of the reason for failure:
Evaluate Scores: OSError: We couldn’t connect to ‘https://huggingface.co’ to load this file, couldn’t find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at ‘https://huggingface.co/docs/transformers/installation#offline-mode’.
Our code successfully produced predictions for all questions within all time limits, and then the AIcrowd server failed to load the sentence transformer to compute our LB score.
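For context, a common mitigation for this class of failure (not something the organizers have confirmed they use) is to pre-download the scoring model to a local path and load it offline, so the grader never needs to reach huggingface.co. A minimal sketch, assuming the standard sentence-transformers API:

```python
from sentence_transformers import SentenceTransformer

# While network access is available, download the model once and save a local copy.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
model.save("./models/all-MiniLM-L6-v2")

# In the offline evaluation environment, disable Hub lookups and load the local copy:
#   export HF_HUB_OFFLINE=1
#   export TRANSFORMERS_OFFLINE=1
offline_model = SentenceTransformer("./models/all-MiniLM-L6-v2")
```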

Are some errors caused by AIcrowd server and not submission code?
About 1 year ago
I just submitted the exact same code, to the exact same track, that failed earlier today. This time it succeeded. Does this mean that we need to repeatedly submit our failed code to the AIcrowd server?
What is causing the inconsistent behavior of the AIcrowd server? Do failures that occur because of the AIcrowd server count toward our weekly track 1-4 limit of 20 and our weekly track 5 limit of 3?

Questions about the leaderboard
Yesterday
Thanks for improving the leaderboard. There are still 5 things wrong with it. Can you please fix the following 5 things? Thanks.
1. Multi-source Augmentation (task 2) ranking is being determined by “Ego Samples” when ranking should be determined by “All Samples”. Furthermore, when we click to see the multi-source augmentation LB, we should see “All Samples” first by default.
2. Multi-source Augmentation (task 2) ranking is being sorted by “Accuracy” when ranking should be determined by “Truthfulness”.
3. Multi-turn QA (task 3) ranking is being determined by “Ego Samples” when ranking should be determined by “All Samples”. Furthermore, when we click to see the multi-turn QA LB, we should see “All Samples” first by default.
4. Multi-turn QA (task 3) “Ego Samples” is displaying all scores as NaN.
5. The top score on Single-source Augmentation (task 1) incorrectly computes truthfulness as 0.889 when their hallucination rate is 0.219 and their accuracy is 0.108, i.e. their truthfulness should be -0.111 (see the quick check below the post). Other teams’ truthfulness scores were updated yesterday but this score was not updated.
Thanks for fixing these 5 issues!
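For reference, the arithmetic behind item 5, assuming (as the post implies) that truthfulness is computed as accuracy minus hallucination rate:

```python
# Figures quoted in item 5; truthfulness assumed to be accuracy minus hallucination rate.
accuracy = 0.108
hallucination = 0.219

truthfulness = accuracy - hallucination
print(f"truthfulness = {truthfulness:.3f}")  # -0.111, not the 0.889 shown on the leaderboard
```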