
Organization
Location
Badges
Activity
Challenge Categories
Challenges Entered
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
Status | Submission ID
---|---
graded | 289797
graded | 289788
graded | 289778
Improve RAG with Real-World Benchmarks
Latest submissions
Status | Submission ID
---|---
graded | 267130
graded | 267129
graded | 267099
Amazon KDD Cup 2022
Latest submissions
Testing RAG Systems with Limited Web Pages
Latest submissions
Status | Submission ID
---|---
graded | 266952
graded | 266951
graded | 266273
Enhance RAG systems With Multiple Web Sources & Mock API
Latest submissions
Status | Submission ID
---|---
graded | 267130
graded | 267129
failed | 266263
Generating answers using image-linked data
Latest submissions
Status | Submission ID
---|---
graded | 289797
graded | 289693
graded | 289626
Meta CRAG - MM Challenge 2025

Why did 289384, 289471 fail?
6 days ago
It has already reached 100%. And 289697 shows "Step has exceeded its deadline".

Why did 289428 fail?
7 days ago
ConnectionError: (MaxRetryError('HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/datasets/crag-mm-2025/crag-mm-single-turn-debug-private/revision/b5ff0aaa05fab0256d77682b4b7da582c0660a6b (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f7f00af3e50>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)"))'), '(Request ID: 7c73f288-c699-438b-9794-be08cad15999)') Check the submission page for more details.

Suggestion: Make Evaluation Prompts More Flexible
26 days ago
Moreover, I believe the evaluation prompt should be made public. Anyone who wants to "hack" the prompt does not actually need to know its exact content; keeping it secret only widens the gap between local testing and server-side evaluation results.

Suggestion: Make Evaluation Prompts More Flexible
26 days ago
I think the current evaluation prompt is too strict: it pushes everyone to answer "I don't know" frequently just to keep their score above 0. In reality, many answers could be considered partially correct, and human evaluators would take that into account. Under the current setup, however, the top-10 models do not attempt partially correct answers, and they might actually perform worse in human evaluation than strategies scoring below 0 that do. Yet those strategies never even reach human review. I suggest the organizers relax the evaluation prompt to at least allow some score differentiation.

Why did Submission #285113 fail?
About 1 month ago
Evaluation failed with exit code 1. I would like to be able to see the error message.

📢 Dataset Release: CRAG-MM v0.1.1 🚀
2 months ago
In the current CragImageKG file (…/cragmm_search/image_search_mock_api/image_kg.py, cragmm-search-pipeline==0.2.10), the field read in the get_image_url function should be img_url; otherwise it causes an error.
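A minimal sketch of the kind of field-name fix described above. The class shape, constructor, and record layout here are assumptions for illustration; the actual CragImageKG in cragmm-search-pipeline may differ.

```python
# Hypothetical sketch of the reported fix: get_image_url must read the
# "img_url" field of a KG record. Everything else here (constructor,
# record layout) is assumed for illustration only.

class CragImageKG:
    def __init__(self, entries: dict):
        # entries maps an image id to a record dict loaded from the KG file
        self.entries = entries

    def get_image_url(self, image_id: str) -> str:
        record = self.entries[image_id]
        # The KG records store the link under "img_url"; reading any other
        # field name (e.g. "image_url") would raise a KeyError.
        return record["img_url"]


kg = CragImageKG({"img-001": {"img_url": "https://example.com/cat.jpg"}})
print(kg.get_image_url("img-001"))  # https://example.com/cat.jpg
```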

📢 Dataset Release: CRAG-MM v0.1.1 🚀
2 months ago
The current rag-agent does not differentiate between task1 and task2. How should UnifiedSearchPipeline be used specifically for task1?
Meta Comprehensive RAG Benchmark: KDD Cup 2024

How exactly is the ten-submissions-per-week limit counted?
About 1 year ago
From my testing, if an error occurs in the build environment, the submission is not deducted from the quota, but it is still recorded.

‼️ ⏰ Select Submission ID before 20th June, 2024 23:55 UTC
About 1 year ago
(post deleted by author)

🚨 IMP: Phase 2 Announcement
About 1 year ago
Same problem here. I also emailed help@aicrowd.com but got no response.

Has phase-2 started?
About 1 year ago
I don't know. I saw some successful submissions, so I tried submitting as well; it received a score, but the leaderboard did not update.

Has phase-2 started?
About 1 year ago
Hi, I have the same question and have not received any message either.

Submission failed
About 1 year ago
Submission failed: "You have exceeded the allowed number of parallel submissions. Please wait until your other submission(s) are graded."
I have no other submissions in progress, yet it still failed.
Meta KDD Cup 24 - CRAG - Retrieval Summarization

About Test Set Leakage in Round 1
About 1 year ago
In fact, the Round 1 test set is the dataset that was given to us, so there is no leakage problem.
Why did 289384, 289471 fail?
4 days ago
Will you consider resubmitting 289697? @yilun_jin8 @jyotish