AIcrowd | yilun_jin8 | Participants

Challenges Entered

Completed

Commonsense Persona-Grounded Dialogue Challenge 2025

Sony Group Corporation

Create Context-Aware, Dynamic, and Immersive In-Game Dialogue

Latest submissions

graded	291885	Mon, 30 Jun 2025 16:34:04
graded	291602	Sun, 29 Jun 2025 13:50:41
graded	291601	Sun, 29 Jun 2025 13:50:36

Completed

Latest submissions

graded	286148	Thu, 29 May 2025 00:10:43
failed	286122	Wed, 28 May 2025 15:43:12
failed	286113	Wed, 28 May 2025 14:16:10

Completed

Meta Comprehensive RAG Benchmark: KDD Cup 2024

Latest submissions

No submissions made in this challenge.

Completed

Latest submissions

failed	286113	Wed, 28 May 2025 14:16:10
graded	286062	Wed, 28 May 2025 05:03:20
failed	285972	Mon, 26 May 2025 15:12:39

yilun_jin8 has not joined any teams yet...

Meta CRAG - MM Challenge 2025

📢 Submit Your Technical Report and Poster by July 25 Submission Link Updated

5 months ago

Hi @tereka, I am not sure about the presentation conditions, but I think the process will stay similar to last year: winners will be guaranteed a presentation, while teams that did not win will be selected (e.g. according to available time slots).

The confusion between task 2 and task 3 has been raised with the relevant organizers.

📢 Submit Your Technical Report and Poster by July 25 Submission Link Updated

5 months ago

I am not sure whether virtual presentation will be an option, but last year we invited those who could not come on site to submit a video presentation.

Any updates on the results?

6 months ago

Hi @aerdem4

As far as I know, the organizers from Meta have compiled a list of potential winners and are waiting for final confirmation from their leaders. It should not take too long (e.g. one day or so).

Based on my experience from last year, we will most likely do both: update the final leaderboard and post in the discussion forum. However, since last year's challenge did not involve human evaluation, I am not sure whether the human evaluation results will be published.

🚨 Submission Selection Deadline: 23rd June 2025, 12:00 UTC (noon)

6 months ago

I think it now shows ‘graded’?

Why Submission #289819 is finished but the score not update on LeaderBoard?

6 months ago

I think it now shows ‘graded’.

Why did 289384, 289471 fail?

6 months ago

From the logs, it seems that both 289384 and 289471 failed due to timeouts. The logs do not tell me what went wrong with 289697.

Regarding the re-execution of 289837 and 289855

6 months ago

Hi,

From the logs, it seems that both failed because you returned None for some questions, which made the evaluator fail. However, the None itself may be caused by a more subtle error that I cannot identify from the logs.

2025-06-18 14:27:26.258    File "/home/aicrowd/starter_kit/local_evaluation.py", line 575, in truncate_agent_responses
2025-06-18 14:27:26.258      encodings = self.tokenizer.encode_batch(agent_responses)
2025-06-18 14:27:26.258                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-06-18 14:27:26.258  TypeError: argument 'input': 'NoneType' object cannot be converted to 'Sequence'

Regarding the second question, we have added an extra allowance of 10 failed submissions per week. For example, if a team has submitted 10 failed submissions and 5 successful ones, it can still submit 5 more.
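The arithmetic in that example can be sketched as follows. This is only an illustration: the weekly limit of 10 is an assumption inferred from the example (10 failed + 5 successful leaves 5), not a number stated in the post, and `remaining_submissions` is a hypothetical helper, not part of the platform.

```python
def remaining_submissions(successful: int, failed: int,
                          weekly_limit: int = 10,      # assumed, inferred from the example
                          failed_allowance: int = 10   # extra failed submissions per week
                          ) -> int:
    """Submissions still available this week under the assumed rule."""
    # Failed runs only count against the limit once the allowance is used up.
    counted_failures = max(0, failed - failed_allowance)
    return max(0, weekly_limit - successful - counted_failures)


# The case from the post: 10 failed and 5 successful submissions.
print(remaining_submissions(successful=5, failed=10))  # 5
```

Under this reading, an 11th failed submission would start to consume the regular quota.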

Submission Status Change From "Generating" to "Prepare Generate" and Stuck

6 months ago

I think 289776 eventually failed due to a timeout.
289706 returned None for some answer, which caused the evaluator to fail:
2025-06-18 12:41:05.661 __main__.AIcrowdError: Error from evaluator: argument 'input': 'NoneType' object cannot be converted to 'Sequence'
289611 never started inference for some reason. We will re-run it, and it will still count as valid for R2.
289670 and 289549 failed in the same way as 289706.
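One way to avoid the None failure above is to sanitize the answers before returning them to the evaluator. The sketch below is a minimal illustration, not the starter kit's code: `sanitize_responses` and `FALLBACK_ANSWER` are hypothetical names, and the fallback text is an assumption.

```python
from typing import List, Optional

# Hypothetical placeholder returned when the agent produced no usable answer.
FALLBACK_ANSWER = "I don't know"


def sanitize_responses(responses: List[Optional[str]]) -> List[str]:
    """Replace None, non-string, or blank answers so every item is a string."""
    cleaned = []
    for response in responses:
        if isinstance(response, str) and response.strip():
            cleaned.append(response)
        else:
            # None (or any non-string value) would break batch tokenization.
            cleaned.append(FALLBACK_ANSWER)
    return cleaned


print(sanitize_responses(["Paris", None, ""]))
```

Calling this as the last step of the batch generation method guarantees that the evaluator only receives strings, although it does not fix whatever produced the None in the first place.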

Why submission #289097 #289035 #289096 failed?

6 months ago

289096 succeeded and was correctly graded.
289035 failed during score calculation. We will re-grade it.
289097 failed due to network errors (no model was successfully downloaded).

Why submission #289148 failed

6 months ago

I think it failed because of a network error, as all Hugging Face model and data downloads failed.

We will re-queue this submission, and it will be considered valid for R2 (if it passes).

Why Submission #289091&#289088 Failed?

6 months ago

Both failed due to timeout. Sorry for the late reply.

Important Update on Missing/Refusal Rate

6 months ago

Hi everyone in this thread,

The organizers will not provide a hard limit on the missing rate, because doing so would lead to aggressive overfitting to that limit.

Please consider building a ‘useful’ real-world question-answering model with a reasonable answer rate instead of refusing everything. This is the main message from the organizers.

Why submission #289017 failed

6 months ago

Replied under your submission.

Why did Submission #288785 fail?

6 months ago

2025-06-16 21:39:42.812  [rank0]:   File "/aicrowd-source/agents/batch_yanshi.py", line 180, in batch_generate_response
2025-06-16 21:39:42.812  [rank0]:     ress, search_results, is_search = self.frist_time_get_answer(queries, images,message_histories)
2025-06-16 21:39:42.812  [rank0]:   File "/aicrowd-source/agents/batch_yanshi.py", line 154, in frist_time_get_answer
2025-06-16 21:39:42.812  [rank0]:     content = doc_item['page_snippet'][:1000]
2025-06-16 21:39:42.812  [rank0]: KeyError: 'page_snippet'

This is the error for 288944.
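A KeyError like the one in this log can be avoided by not assuming that every search result carries a `page_snippet` field. The following is a minimal sketch under that assumption; `extract_snippets` is a hypothetical helper, and only the `page_snippet` key and the 1000-character cut come from the traceback.

```python
from typing import Any, Dict, List


def extract_snippets(search_results: List[Dict[str, Any]],
                     max_chars: int = 1000) -> List[str]:
    """Collect truncated snippets, skipping results without a usable one."""
    snippets = []
    for doc_item in search_results:
        # .get() returns None instead of raising KeyError when the key is absent.
        snippet = doc_item.get("page_snippet") or ""
        if snippet:
            snippets.append(snippet[:max_chars])
    return snippets


results = [{"page_snippet": "abc"}, {"page_name": "no snippet here"}]
print(extract_snippets(results))
```

Skipping such results is one design choice; falling back to another field would also work if the search API provides one.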

Why Submission #288794 is failed?

6 months ago

2025-06-16 00:58:06.918
ValueError: The model’s max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (8016). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
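The error message itself names the two knobs, `max_model_len` and `gpu_memory_utilization`. A minimal sketch of the first option follows; `fit_max_model_len` is a hypothetical helper, the 8016 figure is simply copied from the log above, and the utilization value is illustrative. The vLLM call is left as a comment because it needs a GPU and a model to run.

```python
def fit_max_model_len(requested_len: int, kv_cache_tokens: int) -> int:
    """Largest context length that does not exceed the KV cache capacity."""
    return min(requested_len, kv_cache_tokens)


# Values taken from the error message: requested 8192, cache holds 8016 tokens.
engine_kwargs = {
    "max_model_len": fit_max_model_len(8192, 8016),
    "gpu_memory_utilization": 0.9,  # illustrative; raising it enlarges the KV cache
}
print(engine_kwargs)

# Assumed usage with vLLM (not executed here):
# from vllm import LLM
# llm = LLM(model="<your-model>", **engine_kwargs)
```

Since the cache capacity depends on the GPU and the memory setting, leaving some margin below the reported number is probably safer than matching it exactly.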

Commonsense Persona-Grounded Dialogue Challenge 2025

Regarding the final ranking method

6 months ago

I don’t think so. Submissions will only be judged according to the ratings (and human evals). In addition, ties would be very rare, so I don’t think this could be used to break ties.
In previous challenges, before the final evaluation, we sent out a form asking participants to select the submissions to be evaluated (e.g. 2 submissions). Most likely, we will do the same for this one.

[main page leaderboard ranks]

6 months ago

Hi,

For the first two questions, I cannot answer them at the moment.

For the third question, I don’t think so. Awards will not be based on the combined results.

Access to the OPEN_AI or GPU resource

6 months ago

Hi,

If you submit to the API track, you can assume that the OpenAI API key is already set in OPENAI_API_KEY, and you can directly initialize an OpenAI client.

Similarly, if you submit to the GPU track, you can directly call xxx.cuda() to use the GPU.
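A minimal sketch of both cases, assuming that OPENAI_API_KEY is an environment variable and that the `openai` (v1 or later) and `torch` packages are installed in the submission image. The helper names are hypothetical, and the imports are done lazily so the file still loads on a track where one of the packages is missing.

```python
import os


def get_openai_api_key() -> str:
    """Read the key that the API track is said to provide."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


def make_openai_client():
    """Create an OpenAI client (API track)."""
    from openai import OpenAI  # imported lazily; only needed on the API track
    return OpenAI(api_key=get_openai_api_key())


def pick_device() -> str:
    """Return "cuda" when a GPU is visible (GPU track), otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

With the device string in hand, `model.to(pick_device())` has the same effect as `model.cuda()` on the GPU track while still running on a CPU-only machine for local testing.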

[number of submissions per day and reset time]

6 months ago

Hi,

The number of submissions is counted per team, so the limit applies to all members combined.

The limit refreshes on a rolling basis: each slot is restored one day after the submission that used it.
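The rolling refresh can be sketched as a 24-hour sliding window over the team's submission times. This is an illustration of the rule as described, not the platform's implementation; `rolling_quota` is a hypothetical helper and `daily_limit` is a placeholder, since the actual limit is not stated here.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def rolling_quota(submitted_at: List[datetime], now: datetime,
                  daily_limit: int) -> Tuple[int, Optional[datetime]]:
    """Return (remaining submissions, time when the next slot is restored)."""
    window = timedelta(days=1)
    # Only submissions from the last 24 hours still occupy a slot.
    active = sorted(t for t in submitted_at if t <= now and now - t < window)
    remaining = max(0, daily_limit - len(active))
    # The oldest active submission is the first one to free its slot.
    next_refresh = active[0] + window if active else None
    return remaining, next_refresh


now = datetime(2025, 6, 1, 12, 0)
team_submissions = [
    datetime(2025, 5, 31, 11, 0),  # older than one day, slot already restored
    datetime(2025, 5, 31, 13, 0),
    datetime(2025, 6, 1, 9, 0),
]
print(rolling_quota(team_submissions, now, daily_limit=3))
```

Because the count is per team, `team_submissions` should contain the submissions of all members, not just your own.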

Task-Oriented Dialogue (Task 1)

[task1 failed]

6 months ago

I think it was a transient network error during evaluation. We will re-trigger the submission to see whether that solves the problem.

yilun_jin8 has not provided any information yet.

