yilun_jin8


Challenges Entered

Create Context-Aware, Dynamic, and Immersive In-Game Dialogue

Latest submissions

graded 291885
graded 291602
graded 291601

Improve RAG with Real-World Benchmarks | KDD Cup 2025

Latest submissions

graded 286148
failed 286122
failed 286113

Improve RAG with Real-World Benchmarks

Latest submissions

No submissions made in this challenge.

Generating answers using image-linked data

Latest submissions

failed 286113
graded 286062
failed 285972
Participant Rating
yilun_jin8 has not joined any teams yet...

Meta CRAG - MM Challenge 2025

📢 Submit Your Technical Report and Poster by July 25 **Submission Link Updated**

5 months ago

Hi @tereka, I am not sure about the presentation conditions, but I think the process will stay similar to last year: winning teams will be guaranteed a presentation, while the remaining teams will be selected (e.g. according to available time slots).

The confusion between tasks 2 and 3 has been raised with the relevant organizers.

📢 Submit Your Technical Report and Poster by July 25 **Submission Link Updated**

5 months ago

I am not sure whether virtual presentation will be an option, but last year we invited those who could not attend on site to submit a video presentation.

Any updates on the results?

6 months ago

Hi @aerdem4

As far as I know, the organizers from Meta have compiled a list of potential winners and are waiting for final confirmation from their leaders. It should not take too long (e.g. one day or so).

Based on my experience from last year, we will most likely do both: update the final leaderboard and post in the discussion forum. However, since last year's challenge did not involve human evaluation, I am not sure whether the results of human evaluation will be published.

🚨 Submission Selection Deadline: 23rd June 2025, 12:00 UTC (noon)

6 months ago

I think it now shows ‘graded’?

Why Submission #289819 is finished but the score not update on LeaderBoard?

6 months ago

I think it now shows ‘graded’.

Why did 289384, 289471 fail?

6 months ago

From the logs, it seems that both 289384 and 289471 failed due to timeouts. The logs do not tell me why 289697 failed.

Regarding the re-execution of 289837 and 289855

6 months ago

Hi,

From the logs, it seems that both failed because you returned None for some questions, which the evaluator cannot handle. However, the None values may themselves be caused by a subtler error that I cannot determine from the logs.

2025-06-18 14:27:26.258   File "/home/aicrowd/starter_kit/local_evaluation.py", line 575, in truncate_agent_responses
2025-06-18 14:27:26.258     encodings = self.tokenizer.encode_batch(agent_responses)
2025-06-18 14:27:26.258                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-18 14:27:26.258 TypeError: argument 'input': 'NoneType' object cannot be converted to 'Sequence'
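One way to guard against this kind of failure is to normalize the agent's outputs before returning them, so the evaluator never receives None. The following is only a minimal sketch: `sanitize_responses` and the fallback text are hypothetical names chosen for illustration and are not part of the starter kit.

```python
from typing import List, Optional

# Assumed fallback text; choose whatever refusal string suits your agent.
FALLBACK_ANSWER = "I don't know"


def sanitize_responses(responses: List[Optional[str]]) -> List[str]:
    """Return a list of strings, replacing None or non-string entries with a fallback."""
    cleaned = []
    for response in responses:
        if isinstance(response, str):
            cleaned.append(response)
        else:
            # None (or any other non-string value) would break tokenization downstream.
            cleaned.append(FALLBACK_ANSWER)
    return cleaned
```

Calling such a helper on the final list of answers just before the agent returns them keeps one bad question from failing the whole submission.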

Regarding the second question, we have added an additional allowance of 10 failed submissions per week. For example, if a team has submitted 10 failed submissions and 5 successful ones, they can still submit 5 more.
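The arithmetic in that example can be sketched as follows. Note the assumptions: the base weekly quota of 10 is inferred from the example above rather than stated, and the rule that failed submissions beyond the allowance count against the quota is a guess. `remaining_submissions` is a hypothetical helper, not an AIcrowd API.

```python
def remaining_submissions(
    successful: int,
    failed: int,
    weekly_quota: int = 10,       # assumption: inferred from the example, not stated
    failed_allowance: int = 10,   # the extra failed submissions allowed per week
) -> int:
    """Estimate how many submissions a team has left this week."""
    # Assumption: only failures beyond the allowance consume the regular quota.
    failed_over_allowance = max(0, failed - failed_allowance)
    return max(0, weekly_quota - successful - failed_over_allowance)
```

Under these assumptions, a team with 5 successful and 10 failed submissions has 5 left, which matches the example in the reply.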

Submission Status Change From "Generating" to "Prepare Generate" and Stuck

6 months ago

I think 289776 eventually failed due to a timeout.
289706 somehow returned None as an answer, which caused the evaluator to fail:
2025-06-18 12:41:05.661 __main__.AIcrowdError: Error from evaluator: argument 'input': 'NoneType' object cannot be converted to 'Sequence'
289611 somehow never started inference. We will re-run it, and it will still be counted as valid for R2.
289670 and 289549 failed in the same way as 289706.

Why submission #289097 #289035 #289096 failed?

6 months ago

289096 succeeded and was correctly graded.
289035 failed during score calculation. We will re-grade it.
289097 failed due to network errors (no models were successfully downloaded).

Why submission #289148 failed

6 months ago

I think it failed because of a network error, as all Hugging Face models and data failed to download.

We will re-queue this submission, and it will be considered valid for R2 (if it passes).

Why Submission #289091&#289088 Failed?

6 months ago

Both failed due to timeouts. Sorry for the late reply.

Important Update on Missing/Refusal Rate

6 months ago

Hi everyone in this thread,

The organizers will not provide a hard limit on the missing rate, because doing so would lead to aggressive overfitting to that limit.

Please consider building a ‘useful’ real-world question-answering model with a reasonable answer rate instead of refusing everything. This is the main message from the organizers.

Why submission #289017 failed

6 months ago

Replied under your submission.

Why did Submission #288785 fail?

6 months ago

2025-06-16 21:39:42.812 [rank0]:   File "/aicrowd-source/agents/batch_yanshi.py", line 180, in batch_generate_response
2025-06-16 21:39:42.812 [rank0]:     ress, search_results, is_search = self.frist_time_get_answer(queries, images,message_histories)
2025-06-16 21:39:42.812 [rank0]:   File "/aicrowd-source/agents/batch_yanshi.py", line 154, in frist_time_get_answer
2025-06-16 21:39:42.812 [rank0]:     content = doc_item['page_snippet'][:1000]
2025-06-16 21:39:42.812 [rank0]: KeyError: 'page_snippet'

This is the error for 288944.
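A KeyError like this can be avoided by not assuming that every search result carries a `page_snippet` field. The sketch below is only illustrative: `extract_snippets` is a hypothetical helper, and the only detail taken from the traceback is the `page_snippet` key and the 1000-character cut.

```python
from typing import Any, Dict, List


def extract_snippets(search_results: List[Dict[str, Any]], max_chars: int = 1000) -> List[str]:
    """Collect truncated snippets, tolerating results without a 'page_snippet' key."""
    snippets = []
    for doc_item in search_results:
        # .get() returns None instead of raising KeyError; fall back to an empty string.
        snippet = doc_item.get("page_snippet") or ""
        snippets.append(snippet[:max_chars])
    return snippets
```

Whether to keep an empty snippet or skip the result entirely is a design choice that depends on how the snippets are used later in the prompt.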

Why Submission #288794 is failed?

6 months ago

2025-06-16 00:58:06.918
ValueError: The model’s max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (8016). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
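The error message itself suggests the fix: lower `max_model_len` or raise `gpu_memory_utilization` when creating the vLLM engine. A minimal sketch follows, assuming the agent builds its engine through vLLM's `LLM` class. The helper names and the specific values are illustrative assumptions; the right numbers depend on the model and the evaluation GPU.

```python
from typing import Any, Dict


def build_engine_kwargs(
    model: str,
    max_model_len: int = 8000,            # assumed value, below the 8016 tokens reported in the log
    gpu_memory_utilization: float = 0.9,  # assumed value; a higher fraction leaves more room for KV cache
) -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM that keep the context within the KV cache."""
    return {
        "model": model,
        "max_model_len": max_model_len,
        "gpu_memory_utilization": gpu_memory_utilization,
    }


def create_engine(**engine_kwargs: Any):
    """Create the engine; vLLM is imported lazily so the helper above works without it."""
    from vllm import LLM

    return LLM(**engine_kwargs)
```

For instance, `create_engine(**build_engine_kwargs("your-org/your-model"))` would initialize the engine with the reduced context length, where the model name is a placeholder.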

Commonsense Persona-Grounded Dialogue Chall-0431ae

Regarding the final ranking method

6 months ago

  1. I don’t think so. Submissions will only be judged according to the ratings (and human evals). In addition, ties would be very rare, so I don’t think this would be used to break ties.
  2. For previous challenges, before the final evaluation, we sent out a form asking participants to select submissions for final evaluation (e.g. 2 submissions). Most likely, we will do the same for this one.

[main page leaderboard ranks]

6 months ago

Hi,

For the first two questions, I cannot answer them at the moment.

For the third question, I don’t think so. We will not award prizes according to the combined results.

Access to the OPEN_AI or GPU resource

6 months ago

Hi,

If you submit to the API track, you can assume that the OpenAI API key is already set in OPENAI_API_KEY, and you can directly initialize an OpenAI client.
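A minimal sketch of what this could look like with the official `openai` Python package, assuming `OPENAI_API_KEY` is exposed as an environment variable. `get_api_key` and `make_client` are hypothetical helper names; the explicit check is only there to fail early with a clear message.

```python
import os


def get_api_key() -> str:
    """Read the key from the environment and fail early if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set in the environment")
    return key


def make_client():
    """Create an OpenAI client; the package is imported lazily."""
    from openai import OpenAI

    # OpenAI() also reads OPENAI_API_KEY by itself; passing it explicitly is optional.
    return OpenAI(api_key=get_api_key())
```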

Similarly, if you submit to the GPU track, a GPU is available and you can call xxx.cuda() directly to use it.
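For code that should also run on a local machine without a GPU, a small device check can be used instead of calling `.cuda()` unconditionally. This is a sketch assuming PyTorch; `select_device` is a hypothetical helper.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to move to a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A model or tensor can then be moved with `model.to(select_device())`, which is equivalent to `model.cuda()` when a GPU is present.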

[number of submissions per day and reset time]

6 months ago

Hi,

The number of submissions is counted per team, aggregated over all team members.

The limit refreshes on a rolling basis: each submission frees up its quota slot one day after it was made.
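The rolling window can be sketched as follows. The daily limit is not stated in this reply, so it is left as a parameter, and `submissions_available` is a hypothetical helper rather than anything provided by the platform.

```python
from datetime import datetime, timedelta
from typing import List


def submissions_available(
    submission_times: List[datetime],
    now: datetime,
    daily_limit: int,  # not stated in the reply; supply your challenge's limit
) -> int:
    """Count how many submissions a team can still make under a rolling 24-hour window."""
    window_start = now - timedelta(days=1)
    # Only submissions made within the last 24 hours still occupy a quota slot.
    recent = [t for t in submission_times if t > window_start]
    return max(0, daily_limit - len(recent))
```

In other words, there is no fixed reset time: a slot used at 15:00 becomes available again at 15:00 the next day.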

Task-Oriented Dialogue (Task 1)

[task1 failed]

6 months ago

I think it was a transient network error during evaluation. We will trigger a resubmission to see whether that solves the problem.

yilun_jin8 has not provided any information yet.