good-good-study

1 Follower · 0 Following

Location: CN




Challenges Entered

Improve RAG with Real-World Benchmarks

Latest submissions

No submissions made in this challenge.

Interactive embodied agents for Human-AI collaboration

Latest submissions

No submissions made in this challenge.

Meta Comprehensive RAG Benchmark: KDD Cup 2

About submission times

29 days ago

Each team can submit up to 15 times for all 3 tracks together over the challenge

Does this mean we can only submit 5 times per track if we want to participate in all three tracks? That seems unreasonable.

ESCI Challenge for Improving Product Search

My solution (good-good-study, day-day-up)

Over 1 year ago

  1. I didn’t consider these situations too much; I just separated the text on whitespace characters and punctuation (for Japanese, I used MeCab as a word-segmentation tool). I don’t think this has much impact: we only use the tokens as model input, so they don’t need to be very accurate.

  2. I am more familiar with TF-IDF, so I chose TF-IDF. I haven’t studied or tested the effect of BM25 in detail.

  3. One of the advantages of TF-IDF is its speed, so I can run keyword extraction inside the PyTorch Dataset and DataLoader. More complex keyword-extraction methods might bring better results, but at a huge computational cost. I also considered pre-training an embedding for each query and its corresponding product set and using it as a query feature, but the results were not good.

My solution (good-good-study, day-day-up)

Almost 2 years ago

Basic Solution

All my models are based on InfoXLM-large. I concatenated the training sets of Task 1 and Task 2 into a new training set after de-duplication; all three tasks then use the same model trained on this new training set. Finally, I used 8 models on Task 1 and 4 models on Tasks 2 and 3.

The output of the model can be submitted to the different tasks after different post-processing:

  • Task 1: order the products by \hat P_{exact} + 0.1 \hat P_{substitute} + 0.01 \hat P_{complement}. The class weights are the gains of the four labels.

  • Task 2: take the label with the highest predicted probability as the prediction result.

  • Task 3: predict “substitute” when \hat P_{substitute} is greater than 0.5.
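Assuming the model outputs probabilities in the order (exact, substitute, complement, irrelevant), the three rules above could look like the following sketch. The product IDs, probabilities, and function names are made-up illustrations, not the author's actual code:

```python
# Turning per-product probabilities into the three task outputs.
# Assumed label order: (exact, substitute, complement, irrelevant).
LABELS = ["exact", "substitute", "complement", "irrelevant"]

def task1_rank(products):
    """Order products by P_exact + 0.1 * P_substitute + 0.01 * P_complement."""
    score = lambda p: p["probs"][0] + 0.1 * p["probs"][1] + 0.01 * p["probs"][2]
    return sorted(products, key=score, reverse=True)

def task2_label(probs):
    """Take the label with the highest predicted probability."""
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]

def task3_is_substitute(probs, threshold=0.5):
    """Predict 'substitute' when P_substitute exceeds the threshold."""
    return probs[1] > threshold

products = [
    {"id": "A", "probs": [0.10, 0.70, 0.10, 0.10]},
    {"id": "B", "probs": [0.60, 0.20, 0.10, 0.10]},
]
print([p["id"] for p in task1_rank(products)])    # B scores 0.621, A scores 0.171
print(task2_label(products[0]["probs"]))          # "substitute"
print(task3_is_substitute(products[0]["probs"]))  # True
```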

Keywords of Query

The query is short text, which makes it hard to understand its meaning. Therefore, I treat the titles of all products corresponding to a query as one document and use TF-IDF to extract keywords. I also extract keywords from product_bullet_point and product_description for each query. The extracted keywords can then serve as query features and be appended to the input text.
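The keyword-extraction idea can be sketched as follows. The queries and titles are toy examples, and the hand-rolled TF-IDF below is only a stand-in (the write-up doesn't say which implementation was actually used):

```python
# Treat all product titles for one query as a single document, score terms
# with TF-IDF against the other query-documents, and keep the top terms.
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """docs: {query: [title, ...]}; returns {query: top-k keyword list}."""
    tokenized = {q: " ".join(ts).lower().split() for q, ts in docs.items()}
    n_docs = len(tokenized)
    df = Counter()                      # document frequency per term
    for tokens in tokenized.values():
        df.update(set(tokens))
    keywords = {}
    for q, tokens in tokenized.items():
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n_docs / df[t])
                  for t, c in tf.items()}
        keywords[q] = [t for t, _ in
                       sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
    return keywords

docs = {
    "iphone case": ["iphone 13 case slim", "iphone case leather"],
    "running shoes": ["running shoes men", "trail running shoes"],
}
print(tfidf_keywords(docs))
```

In the real pipeline this step would run inside the Dataset/DataLoader, and the resulting keywords would be concatenated into the model's input text.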

In addition, I also add the brand and color names of all products sharing the same query to the model input. (Intuitively, if a word in the query represents a brand but we don’t point that out explicitly, it will hurt predictions for products that are not of that brand.)

This idea gives a large gain for the Task 2 and Task 3 models; I got an improvement of more than 0.01 from it on Task 2. With this idea, my single-model Task 2 score on the public leaderboard is 0.821 (without post-processing).

But I didn’t get much gain on Task 1. I think this is because Task 1 focuses on ordering different products for the same query, so query features matter less and product features matter more.

Self Distillation

I obtain prediction probabilities for the whole training set through 10-fold cross-validation, take the mean of the predicted probability and the true-label (one-hot) probability as a soft label, and then use this soft label for model training. For example, if the predicted probability for a sample is (0.4, 0.3, 0.2, 0.1) and the true label is 0, the soft label is (0.7, 0.15, 0.1, 0.05).
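A minimal sketch of this soft-label construction, reproducing the worked example above (`soft_label` is a hypothetical helper name):

```python
# Average the cross-validated prediction probability with the one-hot true label.
def soft_label(pred_probs, true_label):
    one_hot = [1.0 if i == true_label else 0.0 for i in range(len(pred_probs))]
    return [(p + t) / 2 for p, t in zip(pred_probs, one_hot)]

# The example from the text: prediction (0.4, 0.3, 0.2, 0.1), true label 0.
print(soft_label([0.4, 0.3, 0.2, 0.1], true_label=0))
# matches (0.7, 0.15, 0.1, 0.05) up to float rounding
```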

Such an approach significantly enhances the robustness of the model and mitigates the impact of noisy data. With this approach, my single-model Task 2 score on the public leaderboard is 0.824 (without post-processing).

However, this approach reduces the benefit of model ensembling. Using four models only improves the result to 0.826; with many models, this method does not seem to bring significant gains.

Post Processing

In the last few days of the competition, I found that the threshold has a great impact on Task 3, and further found that the Task 2 score can also be significantly improved by increasing the probability of a particular label. After exploring, I think there may be two parts of labeled data: one part is the Task 1 data, while all of the data is used for Tasks 2 and 3. After the leaked samples are removed from the Task 2 test set, the distribution of the dataset changes significantly, so the score can be improved through post-processing. After discovering this, I improved my Task 2 score to 0.830 with simple post-processing rules.

Later, I used a LightGBM model to replace the manually designed post-processing rules, adding the sample index as a feature (the data is not shuffled, which is a small leak) and a feature indicating whether the sample appeared in the Task 1 public test set. This improved my score to 0.832.

External data

I crawled product titles and reviews in English, Spanish, Japanese, and Chinese, as well as product images, from Amazon. But perhaps because I used them the wrong way, I only got a gain of 0.001 from the crawled titles. I may publish this data later; you are welcome to explore how review and image data can help improve search ranking.

Model acceleration

  • PyTorch AMP (automatic mixed precision)

  • I read 1024 samples from the DataLoader at a time and sort them by the number of non-padded tokens; the 1024 samples are then split into 16 pieces. In this way, shorter texts get shorter prediction times.

  • For model ensembling, not all models need to make complete predictions. For example, suppose we have four models and the mean prediction probability of the first three is (0.7, 0.1, 0.1, 0.1). Then the fourth model does not need to predict this sample, because its output cannot change the final prediction. Even if the fourth model outputs (0.0, 1.0, 0.0, 0.0) for this sample, the mean over the four models is still (0.525, 0.325, 0.075, 0.075), and the final prediction is still the first label. Based on this idea, we can skip many unnecessary predictions for the third and fourth models.
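The length-sorted micro-batching trick above can be sketched like this. The sample lengths and the helper names (`length_sorted_pieces`, `padded_cost`) are illustrative, and the cost model simply counts tokens after padding each piece to its own longest sequence:

```python
# Pull a large chunk of tokenized samples, sort by unpadded length, and split
# into pieces so each piece is padded only to its own maximum length.
def length_sorted_pieces(samples, n_pieces=16):
    ordered = sorted(samples, key=len)
    size = (len(ordered) + n_pieces - 1) // n_pieces
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def padded_cost(piece):
    """Tokens actually processed: every sequence padded to the piece max."""
    return len(piece) * max(len(s) for s in piece) if piece else 0

samples = [[1] * n for n in [5, 40, 7, 33, 8, 21, 3, 18]]
pieces = length_sorted_pieces(samples, n_pieces=4)
print(sum(padded_cost(p) for p in pieces))          # 148 tokens with sorting
print(len(samples) * max(len(s) for s in samples))  # 320 tokens in one padded batch
```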
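The ensemble early-exit check from the last bullet can be sketched as follows. `can_skip_rest` is a hypothetical helper name, and the bound assumes the worst case in which every remaining model puts all of its probability mass on a rival class:

```python
# After k of n models have voted, skip the remaining models when even an
# adversarial (all-or-nothing) vote could not change the argmax of the mean.
def can_skip_rest(partial_probs, n_models):
    """partial_probs: probability vectors from the models run so far."""
    k = len(partial_probs)
    remaining = n_models - k
    sums = [sum(col) for col in zip(*partial_probs)]
    best = max(range(len(sums)), key=sums.__getitem__)
    # Worst case: each remaining model gives probability 1.0 to a rival class
    # and 0.0 to the current leader.
    rival_best = max(s + remaining for i, s in enumerate(sums) if i != best)
    return sums[best] > rival_best

# The example from the text: three models averaging (0.7, 0.1, 0.1, 0.1).
three = [[0.7, 0.1, 0.1, 0.1]] * 3
print(can_skip_rest(three, n_models=4))  # True: 2.1 > 0.3 + 1.0
```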

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

@mohanty I noticed some new submissions after the competition deadline. Has the deadline been extended again?

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

The submission pipeline isn’t broken; we just need to wait in the queue for a few hours. Please don’t extend it again.

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

Although I know we cannot change your decision, I still want to tell you how it hurt many participants who have worked very hard from the beginning. We followed all the rules, tried to submit, and waited for the deadline. Now, in the last two days of the competition, you tell us we need to keep fighting for another six days. It’s like telling a marathon runner who has run 40 km that the finish line is at 50 km. Although the final ranking may not change much, we have to spend a lot of extra energy.

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

Totally agree. I felt tormented when I saw this news; both the extended schedule and the increased timeout are bad news for me.

[Updated] Customize Dockerfile for both phase

Almost 2 years ago

I tested torch-1.12.0+cu113 and torch-1.12.0+cu116 on my machine with the 450 driver; both can use the GPU normally.

In [1]: import torch

In [2]: !nvidia-smi | head -n 4
Wed Jul  6 11:06:06 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+

In [3]: torch.version.cuda
Out[3]: '11.6'

In [4]: torch.zeros((2, 2)).cuda()
Out[4]:
tensor([[0., 0.],
        [0., 0.]], device='cuda:0')

[Updated] Customize Dockerfile for both phase

Almost 2 years ago

Thank you for exploring and sharing. I have some comments that may be helpful for others.

Exactly: I find the 450 driver can support all CUDA 11.x versions. And if you only use PyTorch, you don’t need to install CUDA yourself, since PyTorch bundles its own CUDA runtime (that is why the PyTorch wheel file is so large).

🚀 Code Submission Round Launched 🚀

Almost 2 years ago

pandas 1.4.x only supports Python 3.8+. Your problem is probably that the Python version in the Docker image is lower than 3.8.

🚀 Code Submission Round Launched 🚀

Almost 2 years ago

@mohanty @shivam Does the 30-minute time constraint mean we have 30 minutes to run our prediction code? My submissions on Task 2 always fail without any error message. I think this may be caused by a timeout, but the time between the failure and the log line aicrowd_evaluations.evaluator.client:register:168 - connected to evaluation server is always around 27 minutes, which is less than 30.

🚀 Code Submission Round Launched 🚀

Almost 2 years ago

@xuange_cui I have met a similar problem before; after I disabled debug mode, I could submit normally.

Is there a limit on the total number of submissions when merging teams?

Almost 2 years ago

Hi @shivam, I still get a "no submission slots remaining for today" error. Is this because I just created a team?

EDIT: I resubmitted it again. And this time it’s OK.

Is there a limit on the total number of submissions when merging teams?

Almost 2 years ago

Hi shivam, I got a "Submission failed: The participant has no submission slots remaining for today." error on my third code submission today. Don’t we have 5 submissions each day?

🚀 Code Submission Round Launched 🚀

Almost 2 years ago

Hi, what is the exact size of the private dataset for each task?

F1 Score for ranking task 2 and 3

Almost 2 years ago

For multiclass classification, micro-F1, micro-precision, micro-recall, and accuracy are always identical, since we always predict one and only one label for each sample.
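A quick numerical check of this claim, with hand-rolled micro-averaged metrics (the labels and predictions are arbitrary toy values):

```python
# For single-label multiclass predictions, every mismatch counts once as a
# false positive (under its predicted class) and once as a false negative
# (under its true class), so micro-P = micro-R = micro-F1 = accuracy.
def micro_metrics(y_true, y_pred):
    classes = set(y_true) | set(y_pred)
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = sum(p == c and t != c for c in classes for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for c in classes for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp / len(y_true)
    return precision, recall, f1, accuracy

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 1, 1, 0]
print(micro_metrics(y_true, y_pred))  # all four values ≈ 0.667
```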

🚀 Datasets Released & Submissions Open 🚀

About 2 years ago

Just click the download button and copy the link from the browser’s downloads page; it should be an AWS link.
