wufanyou
FANYOU WU

Organization

Amazon

Location

Seattle, US



Challenges Entered

Improve RAG with Real-World Benchmarks

Latest submissions

graded 254612
failed 254609
failed 254607

What data should you label to get the most value for your money?

Latest submissions

graded 173244

Latest submissions

graded 195825
graded 195824
graded 195801

3D Seismic Image Interpretation by Machine Learning

Latest submissions

No submissions made in this challenge.

Play in a realistic insurance market, compete for profit!

Latest submissions

No submissions made in this challenge.

Multi Agent Reinforcement Learning on Trains.

Latest submissions

No submissions made in this challenge.

Evaluating RAG Systems With Mock KGs and APIs

Latest submissions

No submissions made in this challenge.

Enhance RAG systems With Multiple Web Sources & Mock API

Latest submissions

No submissions made in this challenge.
Participant Rating

  • TLab: Seismic Facies Identification Challenge
  • ETS-Lab: ESCI Challenge for Improving Product Search
  • ETSLab: Meta Comprehensive RAG Benchmark: KDD Cup 2024

Meta Comprehensive RAG Benchmark: KDD Cup 2024

Phase 1 has released the dataset, so how will a cut-off be applied to limit Phase 2?

8 days ago

Hi organizers,

It appears that there are now (April 30th) two teams in Track 1 using the public test set [1] to obtain a nearly full score (~0.98). I am wondering, in this scenario, how a cut-off can be applied in Phase 2? Every participant just needs to upload the public test set to obtain a similar near-full score. Is there still a potential cut-off?

[1] What does `split` field mean? - #3 by graceyx.yale

Best
Fanyou

Regarding the maximum number of response tokens for Llama 3

13 days ago

@aicrowd_team Yes. I understand that the code already includes this tokenizer. But Llama 3 has a different vocab size (128K vs. 32K); in some cases it will produce fewer output tokens than Llama 2 for the same output text. In terms of model performance, Llama 3 is better (according to the report) and I foresee people using it. So I suggest replacing the current tokenizer used for truncating predictions with Llama 3's.
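
As a rough, unofficial sketch of the difference (the model ids are the gated meta-llama repos on Hugging Face, and the 75-token limit is only a placeholder, not the official cut-off):

```python
from transformers import AutoTokenizer

# Gated repos: requires accepting the Meta license on Hugging Face.
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

def truncate(text: str, tokenizer, max_tokens: int = 75) -> str:
    # Tokenize, keep at most `max_tokens` tokens, decode back to text.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids[:max_tokens])

answer = "The first iPhone was released on June 29, 2007, by Apple."
# The same answer usually spans fewer Llama 3 tokens (128K vocab) than
# Llama 2 tokens (32K vocab), so the two truncations can differ.
print(len(llama2_tok(answer, add_special_tokens=False)["input_ids"]))
print(len(llama3_tok(answer, add_special_tokens=False)["input_ids"]))
print(truncate(answer, llama3_tok))
```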

Regarding the maximum number of response tokens for Llama 3

14 days ago

I want to draw the organizers' attention to the fact that Llama 3 has a larger vocabulary size (128K) compared to Llama 2 (32K). So we need to clearly define in the rules which tokenizer is used to truncate the response (previously the code used the Llama 2 tokenizer).

Best
Fanyou

Are we allowed to use Llama 3?

20 days ago

Hi Organizers,

Meta has introduced Llama 3, and it is available on Hugging Face. I am wondering if we can use it for the competition. The Llama 3 8B model might be a good choice.

Best
Fanyou

Can we use other LLMs at the training stage?

About 1 month ago

Hi Organizers,

I want to understand if we can use other LLMs (not the Llama 2 family) during the training stage, specifically for RLHF and data generation.

Below is the original requirement for models:

This KDD Cup requires participants to use Llama models to build their RAG solution. Specifically, participants can use or fine-tune the following 4 Llama 2 models from https://llama.meta.com/llama-downloads:

  • llama-2-7b
  • llama-2-7b-chat
  • llama-2-70b
  • llama-2-70b-chat

Best
Fanyou

Amazon KDD Cup '23: Multilingual Recommendation Challenge

Eligibility to participate

About 1 year ago

Hi, Organizer.

I am currently an Amazon employee, but I am not affiliated with Amazon Search. I have attended the KDD Cup each year to learn and practice, and I have won several top places in past KDD Cups. I am wondering whether I am eligible to participate and eligible for the prize. If I am eligible to participate but not for the prize, and I am lucky enough to reach a top place, would it be possible to keep my ranking without being granted any cash prize?

The rule is written as:

People who, during the Challenge Period, are directors, officers, employees, interns, and contractors ("Personnel") of Sponsor, its parents, subsidiaries, affiliates, and their respective advertising, promotion and public relations agencies, representatives, and agents (collectively, "Challenge Entities"), immediate family members of such Personnel (parents, siblings, children, spouses, and life partners of each) and members of the households of such Personnel (whether related or not) are ineligible to win a prize in this Challenge. Sponsor reserves the right to verify eligibility and adjudicate any eligibility dispute at any time.

Best
Fanyou Wu

ESCI Challenge for Improving Product Search

👑 Final Winners Announcement 👑

Almost 2 years ago

Hi Mohanty,

Thanks to the whole AIcrowd team and the Amazon Search team for organizing this year's KDD Cup. I have a question about the KDD workshop: do we need to, or could we, submit a paper for the KDD workshop? The workshop paper deadline is also Aug 1st. Would it be possible to finalize the ranking in advance (e.g., 2-3 days before the paper deadline)? I believe the current ranking will not change anymore.

Best
Fanyou

[ETS-Lab] Our solution

Almost 2 years ago

That Task 1 feature will probably work on the private dataset as well; that is why we used it. If you check the product lists in Task 2 and Task 1, you will find a special pattern in the product order: in general, the product list is sorted as training set, private set, and public set. Another reason this feature works is that the product-to-example ratio is close to 1, which means most products are used only once.

There is another way to construct this leak feature: check whether the query-product pair is in the Task 1 public dataset. That one will definitely fail on the private set, as we cannot access that information.

Note that the evaluation service used a V100 equipped with tensor cores, so converting models to ONNX FP16 helps a lot with speed. For example, one unoptimized DeBERTaV3-base model takes about 90 minutes of inference on a single 2080 Ti GPU locally, but online inference takes only 35-40 minutes for 2 DeBERTaV3 models (2 folds).
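
For reference, here is a minimal sketch of that ONNX FP16 conversion, using a generic DeBERTaV3 checkpoint and illustrative file names rather than our exact submission code (assumes torch, transformers, onnx, and onnxconverter-common are installed):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-v3-base"
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=4).eval()
tok = AutoTokenizer.from_pretrained(name)

# Export the cross encoder to ONNX with dynamic batch/sequence axes.
dummy = tok("query [SEP] product title", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "cross_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)

# Cast the weights to FP16 so the V100 tensor cores are actually used.
import onnx
from onnxconverter_common import float16

m = onnx.load("cross_encoder.onnx")
onnx.save(float16.convert_float_to_float16(m), "cross_encoder_fp16.onnx")
```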

[ETS-Lab] Our solution

Almost 2 years ago

Thanks to the AIcrowd team and the Amazon Search team for organizing this extensive competition. The game has finally ended; our team learned a lot here, and we believe this memorable period will help us in the future. Below is a general introduction to our solution for this competition.

General solution

  • We trained 3 cross-encoder models (DeBERTaV3, COCO-LM, and BigBird) for each language, which differ in the pretrained models, training method (e.g., knowledge distillation), and data splitting. In total, six models (2 folds x 3 models) per language are used to produce the initial prediction (4-class probability) for each query-product pair. Using those models alone, the public set score for Task 2 is around 0.816.

  • For Task 1, we used the output 4-class probabilities together with some simple features to train a LightGBM model, computed the expected gain (P_e*1 + P_s*0.1 + P_c*0.01), and sorted each query's product list by this gain (see the sketch after this list). This method is slightly better than using LambdaRank directly in LightGBM.

  • For Task 2 and Task 3, we used LightGBM to fuse those predictions with some important features. The most important features are designed around the potential data leakage from Task 1 and the behavior of the query-product group:

    • The stats (min, median, and max) of the cross-encoder output probabilities grouped by query_id (+0.007 on the Task 2 public leaderboard)
    • The percentage of product_id values that appear in the Task 1 product list, grouped by query_id (+0.006 on the Task 2 public leaderboard)
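
A minimal pandas sketch of the expected-gain ranking and the group-stats feature (toy data and made-up column names, not our actual pipeline):

```python
import pandas as pd

# Toy 4-class outputs: one row per query-product pair.
df = pd.DataFrame({
    "query_id":     [0, 0, 0, 1, 1],
    "product_id":   ["a", "b", "c", "d", "e"],
    "p_exact":      [0.70, 0.20, 0.10, 0.60, 0.30],
    "p_substitute": [0.20, 0.50, 0.20, 0.20, 0.40],
    "p_complement": [0.05, 0.20, 0.30, 0.10, 0.20],
})

# Task 1: expected gain P_e*1 + P_s*0.1 + P_c*0.01, sorted within each query.
df["gain"] = df["p_exact"] + 0.1 * df["p_substitute"] + 0.01 * df["p_complement"]
ranked = df.sort_values(["query_id", "gain"], ascending=[True, False])

# Task 2/3 feature: min/median/max of a probability grouped by query_id.
stats = (df.groupby("query_id")["p_exact"]
           .agg(["min", "median", "max"])
           .add_prefix("p_exact_")
           .reset_index())
df = df.merge(stats, on="query_id")
```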

Small modification to the cross-encoder architecture

  • As the product context has multiple fields (title, brand, and so on), we use neither the CLS token nor mean (max) pooling to get the latent vector of the query-product pair. Instead, we concatenate the hidden states of predefined tokens (query, title, brand, color, etc.). The format is:
    [CLS] [QUERY] <query_content> [SEP] [TITLE] <title_content> [SEP] [BRAND] <brand_content> [SEP] ...
    
    where [TEXT] is a special token and <text_content> is the corresponding text content.
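
A simplified sketch of this pooling (a re-implementation on a generic DeBERTaV3 backbone; the three field tokens below are just a subset for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer

FIELD_TOKENS = ["[QUERY]", "[TITLE]", "[BRAND]"]

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
tok.add_special_tokens({"additional_special_tokens": FIELD_TOKENS})
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
model.resize_token_embeddings(len(tok))  # make room for the new tokens

text = "[QUERY] red shoes [SEP] [TITLE] canvas sneaker [SEP] [BRAND] acme"
enc = tok(text, return_tensors="pt")
hidden = model(**enc).last_hidden_state  # (1, seq_len, dim)

# Gather the final hidden state at each field-token position, concatenate.
field_ids = tok.convert_tokens_to_ids(FIELD_TOKENS)
mask = torch.isin(enc["input_ids"][0], torch.tensor(field_ids))
pooled = hidden[0, mask].reshape(-1)  # (3 * dim,), fed to the class head
```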

Code submission speed up

  1. Pre-process the product tokens and save them as an HDF5 file.
  2. Convert all models to ONNX with FP16 precision.
  3. Pre-sort the product ids by token length to reduce the side impact of zero padding in batches (see the sketch below).
  4. Use a relatively small mini-batch size during inference (batch size = 4).
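
A sketch of items 3 and 4 (illustrative, not our exact submission code): sorting by token length means each small batch pads to a similar length, so almost no compute is wasted on padding.

```python
def batched_by_length(token_lists, batch_size=4, pad_id=0):
    # Yield (original_indices, padded_batch) in length-sorted order.
    # Keeping the original positions lets us un-sort predictions afterwards.
    order = sorted(range(len(token_lists)), key=lambda i: len(token_lists[i]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [token_lists[i] for i in idx]
        max_len = max(len(t) for t in batch)
        yield idx, [t + [pad_id] * (max_len - len(t)) for t in batch]
```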

You can find our training code here and code submission here.

Advertisement

Currently, I am seeking a machine learning engineer or research scientist job in the US. In collaboration with my friend Yang @Yang_Liu_CTH, I have won championships and runner-up places in many competitions, including the championship of the KDD Cup 2020 reinforcement learning track. You can email me directly or visit my personal website for more details.

Best
Dr. Wu, Fanyou
Postdoc @ Purdue University

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

Endless deadline. Let me call it aliveline. :face_vomiting:

Calling on the organizer team to ban using external data in online code submission

Almost 2 years ago

There is another way to ensure fairness: require all teams to publish their external data.

Calling on the organizer team to ban using external data in online code submission

Almost 2 years ago

If manual code review could be done, then I would support banning external data, even though our team might benefit from it. But my point of view still comes from the rules themselves: it is really not a wise idea to change anything at this stage.

Calling on the organizer team to ban using external data in online code submission

Almost 2 years ago

Although our team does not use any external data, we do not support changing the rules any more. Constantly changing the rules makes this competition look like a joke and makes all of us tired!

Please do not change any rule again; I believe the host promised as much in the deadline extension post. @mohanty

Note that, unlike Task 1, which @TransiEnt focuses on, Task 2 required a lot of effort to make the code more efficient. So we applied pre-processing to tokenize all products. Besides, the product id itself is also a feature, and we feed it to the transformers. If the product ids were disputed, all of my models would need to be retrained, and I do not have enough computing resources. So it is impossible to ban a product id here.

The only way to ban external data is to inspect code afterwards, which would be extremely hard for the host to do.

Best
Fanyou

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

I created an unofficial vote on the extension of the deadline and timeouts. Please share your opinion there. I hope the organizers can hear something from the poll.

Unofficial vote on the extension of the deadline and timeouts

Almost 2 years ago

Here is the unofficial poll on the extension of the deadline and timeouts.

  • Agree with the deadline extension
  • Disagree with the deadline extension
  • Agree to increase the timeout to 120 mins
  • Disagree to increase the timeout to 120 mins

0 voters

📆 Deadline Extension to 20th July && ⏳ Increased Timeout of 120 mins

Almost 2 years ago

Hi,

Thanks for providing a good challenge for us. But this news is really bad for me, as our team has been working on a submission that meets all the stated rules, including the deadline and submission time limits. Limits of 30 minutes, 90 minutes, and 120 minutes could yield significantly different solutions, and the change of deadline will cause some schedule conflicts. I feel tortured every time the rules change. The arbitrary changes show no respect to the people who are working hard in this challenge.

Personally, I wish to keep the original schedule and time limits, but I also respect the other teams' and the host's standpoints and opinions.

If possible, could we make a poll to let all active teams vote?

I created an unofficial vote on the extension of the deadline and timeouts. Please share your opinion there. I hope the organizers can hear something from the poll.

Best
Fanyou Wu

🚀 Code Submission Round Launched 🚀

Almost 2 years ago

Could you also check my submission? I believe there is some unusual behavior in the hosting services: the code passed the public test set and soon after failed on the private set. I also observed other participants' submissions around the same time, and all of them failed.
submission_hash: 540adaa2989b1c62dffc48659400db2cc0a13989

@shivam

[Updated] Customize Dockerfile for both phase

Almost 2 years ago

There are multiple ways to add the repo path to the environment. I created utils as a package with __init__.py and run.py:

utils:
__init__.py
other.py
run.py

Then, when I copy the whole utils directory into the starter kit, it keeps the same structure. Also, note that the Docker COPY command I used here does not preserve the directory structure, so if you have a hierarchical file layout, it will not be kept. You also need to use relative imports in this case.
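
For example, inside run.py a relative import only assumes the utils package itself was copied intact, not where it ends up (`helper` is a made-up name for illustration):

```python
# utils/run.py
# An absolute import like `from utils.other import helper` assumes the
# parent of utils/ is on sys.path; the relative form below only assumes
# the package directory itself survived the copy.
from .other import helper  # `helper` is hypothetical

def main():
    return helper()
```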

[Updated] Customize Dockerfile for both phase

Almost 2 years ago

No, I used Docker to build the images. But I pip-installed many packages that I guessed might fit AIcrowd's requirements; I am not sure which ones are actually necessary.

[Updated] Customize Dockerfile for both phase

Almost 2 years ago

Now, I guess it is because some packages are necessary, but I do not know exactly which ones. I suspect some Jupyter-notebook-related libraries are needed here, since the original Dockerfile is based on repo2docker.
