
Challenges Entered
Create Context-Aware, Dynamic, and Immersive In-Game Dialogue
Latest submissions
| Status | Submission |
|---|---|
| graded | 291885 |
| graded | 291602 |
| graded | 291601 |
Improve RAG with Real-World Benchmarks | KDD Cup 2025
Latest submissions
| Status | Submission |
|---|---|
| graded | 286148 |
| failed | 286122 |
| failed | 286113 |
Improve RAG with Real-World Benchmarks
Latest submissions
Generating answers using image-linked data
Latest submissions
| Status | Submission |
|---|---|
| failed | 286113 |
| graded | 286062 |
| failed | 285972 |
Meta CRAG - MM Challenge 2025
📢 Submit Your Technical Report and Poster by July 25 **Submission Link Updated**
5 months ago
I am not sure whether a virtual presentation will be an option, but last year we invited those who could not attend on site to submit a video presentation.
Any updates on the results?
6 months ago
Hi @aerdem4
As far as I know, the organizers from Meta have compiled a list of potential winners and are awaiting final confirmation from their leadership. It should not take too long (e.g. one day or so).
Based on my experience from last year, we will most likely do both: an update to the final leaderboard and a post in the discussion forum. However, since last year's challenge did not involve human evaluation, I am not sure whether the human evaluation results will be published.
🚨 Submission Selection Deadline: 23rd June 2025, 12:00 UTC (noon)
6 months ago
I think it now shows "graded"?
Why is Submission #289819 finished but the score not updated on the leaderboard?
6 months ago
I think it now shows "graded".
Why did 289384 and 289471 fail?
6 months ago
From the logs, it seems that both 289384 and 289471 failed due to timeouts. I cannot tell from the logs why 289697 failed.
Regarding the re-execution of 289837 and 289855
6 months ago
Hi,
From the logs, it seems that both failed because you returned None for some questions. However, the failure may also have been caused by subtler errors that I am not sure of.
2025-06-18 14:27:26.258   File "/home/aicrowd/starter_kit/local_evaluation.py", line 575, in truncate_agent_responses
2025-06-18 14:27:26.258     encodings = self.tokenizer.encode_batch(agent_responses)
2025-06-18 14:27:26.258                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-18 14:27:26.258 TypeError: argument 'input': 'NoneType' object cannot be converted to 'Sequence'
Regarding the second question, we have added an allowance of 10 additional failed submissions per week. For example, if a team has submitted 10 failed submissions and 5 successful ones, it can still submit 5 more.
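The TypeError in the log above is raised when a None reaches the tokenizer's `encode_batch` call. A minimal sketch of a guard an agent could apply to its batch before returning it is given here. The helper name `sanitize_responses` and the fallback string are assumptions for illustration and are not taken from the starter kit.

```python
from typing import Any, List

# Assumed refusal text; the starter kit may expect a different wording.
FALLBACK = "I don't know"


def sanitize_responses(responses: List[Any], fallback: str = FALLBACK) -> List[str]:
    """Return a list of the same length in which every entry is a str."""
    cleaned: List[str] = []
    for response in responses:
        if isinstance(response, str):
            cleaned.append(response)
        elif response is None:
            # A None here is what triggers the evaluator's TypeError.
            cleaned.append(fallback)
        else:
            # Coerce any other type (e.g. numbers) to text.
            cleaned.append(str(response))
    return cleaned
```

Calling such a helper on the final list of answers keeps the batch length unchanged, so responses still line up with their questions.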
Submission Status Change From "Generating" to "Prepare Generate" and Stuck
6 months ago
I think 289776 eventually failed due to a timeout.
289706 somehow returned None as an answer and caused the evaluator to fail:
2025-06-18 12:41:05.661 __main__.AIcrowdError: Error from evaluator: argument 'input': 'NoneType' object cannot be converted to 'Sequence'
289611 somehow never started inference. We will re-run it, and it will still be counted as valid for R2.
289670 and 289549 failed in the same way as 289706.
Why did submissions #289097, #289035 and #289096 fail?
6 months ago
289096 succeeded and was correctly graded.
289035 failed during score calculation. We will re-grade it.
289097 failed due to network errors (no models were successfully downloaded).
Why did submission #289148 fail?
6 months ago
I think it failed because of a network error, as all Hugging Face model and data downloads failed.
We will re-queue this submission, and it will be considered valid for R2 (if it passes).
Why Submission #289091 Failed?
6 months ago
Both failed due to timeout. Sorry for the late reply.
Important Update on Missing/Refusal Rate
6 months ago
Hi everyone in this thread,
The organizers will not provide a hard limit on the missing rate, because doing so would lead to aggressive overfitting to that limit.
Please consider building a "useful" real-world question-answering model with a reasonable answer rate instead of one that refuses questions. This is the main message from the organizers.
Why did Submission #288785 fail?
6 months ago
2025-06-16 21:39:42.812 [rank0]: File "/aicrowd-source/agents/batch_yanshi.py", line 180, in batch_generate_response
2025-06-16 21:39:42.812 [rank0]: ress, search_results, is_search = self.frist_time_get_answer(queries, images,message_histories)
2025-06-16 21:39:42.812 [rank0]: File "/aicrowd-source/agents/batch_yanshi.py", line 154, in frist_time_get_answer
2025-06-16 21:39:42.812 [rank0]: content = doc_item['page_snippet'][:1000]
2025-06-16 21:39:42.812 [rank0]: KeyError: 'page_snippet'
This is the error for 288944.
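The KeyError above indicates that at least one search result had no `page_snippet` key. One defensive way to read that field is sketched here, assuming the search results are plain dictionaries. The helper name `snippet_of` is hypothetical, and the 1000-character cap simply mirrors the slice in the traceback.

```python
from typing import Any, Mapping


def snippet_of(doc_item: Mapping[str, Any], limit: int = 1000) -> str:
    """Return the snippet text, or an empty string when it is missing or None."""
    # .get() avoids the KeyError; "or ''" also covers an explicit None value.
    snippet = doc_item.get("page_snippet") or ""
    return str(snippet)[:limit]
```

With a lookup like this, a result that lacks a snippet contributes an empty string to the prompt instead of aborting the whole batch.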
Why did Submission #288794 fail?
6 months ago
2025-06-16 00:58:06.918
ValueError: The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (8016). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
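The error message names two remedies: lowering `max_model_len` or raising `gpu_memory_utilization`. The sketch below shows the first remedy as plain arithmetic, capping the requested context length at the KV cache capacity reported in the log. The helper name `engine_kwargs` is hypothetical, the 0.9 default is an assumption, and the real capacity depends on the model and GPU, so 8016 is only the value from this particular run.

```python
from typing import Any, Dict


def engine_kwargs(
    requested_len: int,
    kv_cache_tokens: int,
    gpu_memory_utilization: float = 0.9,  # assumed default, tune per GPU
) -> Dict[str, Any]:
    """Build engine keyword arguments whose max_model_len fits the KV cache."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return {
        # Never ask for more tokens than the KV cache can hold.
        "max_model_len": min(requested_len, kv_cache_tokens),
        "gpu_memory_utilization": gpu_memory_utilization,
    }


# Values taken from the log line above: 8192 requested, 8016 available.
kwargs = engine_kwargs(requested_len=8192, kv_cache_tokens=8016)
```

The resulting dictionary would then be passed as keyword arguments when the engine is initialized. Leaving some headroom below the reported capacity may be safer, since the capacity can vary between runs.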
Commonsense Persona-Grounded Dialogue Chall-0431ae
Regarding the final ranking method
6 months ago
- I don't think so. Submissions will be judged only on the ratings (and human evals). In addition, ties would be very rare, so I don't think this could be used to break ties.
- In previous challenges, before the final evaluation, we sent out a form asking participants to select submissions for final evaluation (e.g. 2 submissions). Most likely, we will do the same for this one.
[main page leaderboard ranks]
6 months ago
Hi,
For the first two questions, I cannot answer them at the moment.
For the third question, I don't think so. We will not award prizes according to the combined results.
Access to the OPEN_AI or GPU resource
6 months ago
Hi,
If you submit to the API track, you can assume that the OpenAI API key is already set in OPENAI_API_KEY, and you can directly initialize an OpenAI client.
Similarly, if you submit to the GPU track, you can directly call xxx.cuda() to use the GPU.
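A minimal sketch of both cases follows, assuming that OPENAI_API_KEY is an environment variable and that the `openai` (version 1.0 or later) and `torch` packages are installed in the submission image. The helper names are hypothetical. The imports are done lazily so that the same file can load on either track.

```python
import os


def get_openai_api_key() -> str:
    """Read the key the evaluator is said to provide; fail loudly if absent."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


def make_openai_client():
    """API track: build a client from the provided key."""
    from openai import OpenAI  # lazy import, only needed on the API track

    return OpenAI(api_key=get_openai_api_key())


def pick_device() -> str:
    """GPU track: return "cuda" when a GPU is visible, otherwise "cpu"."""
    try:
        import torch  # lazy import, only needed on the GPU track
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

On the GPU track, a model could then be moved with `model.to(pick_device())`, which is equivalent to `model.cuda()` when a GPU is available.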
[number of submissions per day and reset time]
6 months ago
Hi,
The number of submissions is counted per team, aggregated over all team members.
The limit refreshes on a rolling basis: each submission's quota slot is restored one day after that submission.
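The rolling refresh can be sketched as counting the team's submissions made within the last 24 hours. The function name and the `daily_limit` parameter are hypothetical, since the actual limit is not stated in this post.

```python
from datetime import datetime, timedelta
from typing import Iterable

WINDOW = timedelta(days=1)  # a slot returns one day after its submission


def remaining_quota(
    submission_times: Iterable[datetime], now: datetime, daily_limit: int
) -> int:
    """Count how many submissions the whole team can still make right now."""
    # A submission occupies a slot until exactly one day has passed.
    used = sum(1 for t in submission_times if now - WINDOW < t <= now)
    return max(daily_limit - used, 0)
```

For example, with a hypothetical limit of 5 and team submissions made 25, 23 and 1 hours ago, only the last two still occupy slots, so 3 submissions remain.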
Task-Oriented Dialogue (Task 1)
[task1 failed]
6 months ago
I think it was a transient network error during evaluation. We will trigger a resubmission to see whether that solves the problem.
📢 Submit Your Technical Report and Poster by July 25 **Submission Link Updated**
5 months ago
Hi @tereka, I am not sure about the presentation arrangements, but I think the process will stay similar to last year: winners will be guaranteed a presentation slot, while other teams will be selected (e.g. according to available time slots).
The confusion between Tasks 2 and 3 has been raised with the relevant organizers.