Chris_Deotte
Chris Deotte

Organization: Nvidia
Location: US



Challenges Entered

Improve RAG with Real-World Benchmarks | KDD Cup 2025

Generating answers using image-linked data

Teams
  • NVIDIA-Merlin, Amazon KDD Cup '23: Multilingual Recommendation Challenge
  • Team_NVIDIA, Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs

Meta CRAG - MM Challenge 2025

Questions about the leaderboard

Yesterday

Thanks for improving the leaderboard. There are still 5 things wrong with the leaderboard. Can you please fix the following 5 things? Thanks.

  1. Multi-source Augmentation (task 2) ranking is being determined by “Ego Samples” when ranking should be determined by “All Samples”. Furthermore, when we click to see the multi-source augmentation LB, we should see “All Samples” first by default.

  2. Multi-source Augmentation (task 2) ranking is being sorted by “Accuracy” when ranking should be determined by “Truthfulness”.

  3. Multi-turn QA (task 3) ranking is being determined by “Ego Samples” when ranking should be determined by “All Samples”. Furthermore, when we click to see the multi-turn QA LB, we should see “All Samples” first by default.

  4. Multi-turn QA (task 3) “Ego Samples” is displaying all scores as NaN.

  5. The top score on Single Source Augmentation (task 1) incorrectly computes truthfulness as 0.889 when their hallucination rate is 0.219 and accuracy is 0.108 (i.e. their truthfulness should be -0.111; other teams’ truthfulness scores were updated yesterday but this score was not).

Thanks for fixing these 5 issues!
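The arithmetic behind item 5 can be checked directly, assuming the leaderboard defines truthfulness as accuracy minus hallucination rate (that definition is an assumption here, inferred from the numbers in the post):

```python
# Hypothetical check of item 5. Assumption: the leaderboard's
# truthfulness metric is accuracy minus hallucination rate.
accuracy = 0.108
hallucination = 0.219
truthfulness = accuracy - hallucination
print(round(truthfulness, 3))  # → -0.111, not the 0.889 shown on the LB
```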

Does "search_pipeline" source change during LB submission

4 days ago

I have noticed that you just (14 hours ago) updated the HuggingFace websearch vector database from 113k entries to 647k entries. Is the new database similar to the LB database?

For us to tune our models during local validation, we need a local validation database similar to what our models will see during LB submission. Is the current (newly updated) websearch database similar to the LB database? And is the local image search validation database similar to the LB image search database?

=========
Let me clarify my question. (1) For validation we have 647k entries in web search database to help us answer 1548 validation queries. So we have 418 database entries per validation question. Is this the same ratio that our models will see during LB submission web search?

(2) Furthermore, a certain percentage of validation queries have their answer contained inside the web search vector database (with the rest of vector database being noise). During LB submission, does the same percentage of answers and noise exist in the LB vector database?

And lastly, can you answer these 2 questions for image search? Thank you!
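The ratio in question (1) can be sanity-checked from the numbers quoted above:

```python
# Sanity check on the stated ratio: 647k web-search entries
# serving 1548 validation queries.
entries = 647_000
queries = 1_548
print(round(entries / queries))  # → 418 database entries per query
```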

Does "search_pipeline" source change during LB submission

11 days ago

In the old evaluation script, the agent defined the search pipeline as "crag-mm-2025/image-search-index-validation" which means that the same vector database is used for both local validation and LB submission.

I see the new starter kit changed this. My question is: Does our submission use a different search pipeline or does submission also use "crag-mm-2025/image-search-index-validation"?

How to use private model

13 days ago

ChatGPT says we can add them as collaborators under settings. What is AIcrowd’s HF username?

Where is the Starter Kit for submissions?

28 days ago

Hi everyone, I see that many teams have already submitted to the leaderboard. Where can I find the page that says “For up to date instructions for your challenge, please refer to the starter kit provided by challenge organisers.” and the “git clone ” command?

Amazon KDD Cup 2024: Multi-Task Online Shopping Challenge for LLMs

Note for our final evaluation

10 months ago

@yilun_jin , Sometimes the exact same code will succeed one time and fail another time. For example we submitted the exact same code [here] and [here]. The first succeeded and the second failed. During re-run what happens if code fails that has previously succeeded? Will the admins run it a second time?

Also, can you tell us why the second link above failed?

When we select our final 2 submissions for each track, should we just select our best scoring submission twice in case it fails the first time it is re-run?

All Submissions Are Failing

10 months ago

Our team’s last 6 submissions failed. And when I look at the list of submissions from the other teams in the past 4 hours, all other teams’ submissions failed too. Is there a problem with the AIcrowd server?

Here are the links of our team’s last two failures [here] and [here]

Can an admin please investigate? Thank you.

Push gitlab and cannot find issue

10 months ago

The same thing has just happened to me. I have created 5 new tags. They all appear in my GitLab but none appear in my issues.

They are tags submission-200, submission-202, submission-203, submission-204, submission-205. Some of them are duplicates of each other because I tried submitting the same thing twice without success.

All Submissions "Waiting In Queue" for 12 Hours

10 months ago

FYI, all submissions (from all teams) have been “waiting in queue” for the past 12 hours. Perhaps an admin can investigate. Thanks.

Submission stuck on "evaluation initiated"

10 months ago

The following two submissions [here] and [here] are stuck with label “evaluation initiated” even though they have failed.

Can an admin switch the GitLab label to failed? Because as is, they are using 2 submission quotas. Thanks.

Submission Failed - Please Tell Us When Submission Works Again

11 months ago

Yes, this is not fixed. I just submitted and got
Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists..
The GitLab issue is [here]

For the past 2 days, no team has been able to submit to track 5.

Please fix this issue and let us know when it is fixed. Thank you

Submissions fail

11 months ago

I am also seeing weird submission behavior today. I posted a discussion describing the errors I have been seeing today [here]

Submission Failed - Please Tell Us When Submission Works Again

11 months ago

Hi, for the past 4 hours, I have been receiving " Submission failed : Failed to communicate with the grader. Please resubmit again in a few hours if this issue persists.." when submitting to track 5. An example GitLab issue (for admins to review) is [here].

I have tried 3 times and received 3 “failed” submissions. I do not want to try anymore because I do not want to use up my failed submission quota. Can an admin tell us when submissions are working for track 5 again? Thanks.

Track 2 LB Doesn't Show Retrieval Score

11 months ago

Hi, can admins @yilun_jin fix the track 2 leaderboard webpage to show each team’s retrieval score? Thank you.

Phase 2 launching!

11 months ago

I notice that the AIcrowd website says “Round 2: 21 days left”, which implies that phase 2 ends on June 15th. Is this the correct end of phase 2?

Another Frozen Evaluation

12 months ago

Thank you for fixing our previous frozen evaluation.

We have another frozen evaluation here. The GitLab issue page shows that it failed, but the AIcrowd website is still showing that the submission is being evaluated.

As such, we cannot submit again to this track because the AIcrowd website thinks that a submission is in progress. Can an admin @yilun_jin update the AIcrowd website to acknowledge that our submission failed thus allowing us to make a new submission?

Thank you.

Our Evaluation is Frozen

12 months ago

Hi. Our submission to track 1 [here] has frozen. It shows 97% completed and it appears to be within all time limits. There has been no update for the past 3 hours.

Can an admin @yilun_jin please unfreeze our submission and post the results? Thank you

Are some errors caused by AIcrowd server and not submission code?

About 1 year ago

Hi. Can an admin please solve the following problem and provide a score for our last submission?

We just submitted our other code that failed yesterday and this time the AIcrowd GitLab issue says the error is caused by AIcrowd server. The server could not load the sentence-transformers/all-MiniLM-L6-v2 to evaluate the score. Below is copy and paste from reason for failure.

Evaluate Scores: OSError: We couldn’t connect to ‘https://huggingface.co’ to load this file, couldn’t find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at ‘https://huggingface.co/docs/transformers/installation#offline-mode’.

Our code successfully predicted all questions within all time limits, and then the AIcrowd server failed to load the sentence transformer to compute our LB score.
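For reference, the offline mode that the error message points to is controlled by environment variables (a sketch only; it assumes the model was already cached on the evaluation machine, and uses the environment variables documented by Hugging Face):

```python
# Sketch of the offline mode referenced by the error message.
# Assumption: sentence-transformers/all-MiniLM-L6-v2 is already in the
# local cache; with these variables set, the loaders read only from
# that cache instead of contacting huggingface.co.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: no network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: cache-only loading
print(os.environ["TRANSFORMERS_OFFLINE"])  # → 1
```

With these set before evaluation, a missing cache would fail immediately with a clear error rather than on a flaky connection to huggingface.co.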

Are some errors caused by AIcrowd server and not submission code?

About 1 year ago

I just submitted the exact same code to the exact same track that failed earlier today. This time it succeeded. Does this mean that we need to repeatedly submit our failed code to AIcrowd server?

What is causing the inconsistent behavior of the AIcrowd server? Do fails that occur because of the AIcrowd server count toward our weekly track 1-4 limit of 20 and weekly track 5 limit of 3?

Earned a BA in mathematics then worked as a graphic artist, photographer, carpenter, and teacher. Earned a PhD in computational science and mathematics with a thesis on optimizing parallel processing. Now work as a data scientist and researcher.