
This page corresponds to "Track 3: User Behavior Alignment" in the Amazon KDD Cup 2024 Challenge.

🌟 Introduction

Online shopping has become an indispensable service in modern life. To provide better shopping experiences to users, machine learning has been extensively used to understand various entities in online shopping, such as queries, browse sessions, etc., to infer the user's search and shopping intentions. However, few studies explore online shopping tasks under the multi-task, few-shot learning scenario. In practice, online shopping creates a massive multi-task learning problem that often involves a joint understanding of various shopping entities, such as products, attributes, queries, purchases, etc. Moreover, new shopping entities and tasks constantly emerge over time as a result of business expansion or new product lines, creating few-shot learning problems on these emerging tasks. 

Large language models (LLMs) have emerged as promising solutions to the multi-task, few-shot learning problem in online shopping. Many studies have underscored the ability of a single LLM to perform various text-related tasks with state-of-the-art quality, and to generalize to unseen tasks given only a few samples or task descriptions. Therefore, by training a single LLM for all shopping-related machine learning tasks, we mitigate the costs of task-specific engineering efforts, and of data labeling and re-training for new tasks. Furthermore, LLMs can improve customers' shopping experiences by providing interactive, real-time shopping recommendations. 

This track, User Behavior Alignment, aims to evaluate the model's ability to understand and align with implicit and heterogeneous user behaviors on online shopping websites. User behavior modeling is of paramount importance in online shopping. However, user behaviors are highly heterogeneous, including browsing, purchasing, query-then-clicking, etc. Moreover, most of them are implicit and not expressed in text, and even when encoded as text, it remains challenging for LLMs to understand their underlying implications, as such behaviors rarely appear in general-domain pre-training and fine-tuning data. A real-world example is shown in Figure 1. 

Figure 1: Example of how heterogeneous user behaviors in online shopping pose challenges to LLMs. 

πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’» Tasks

This track focuses on understanding heterogeneous and implicit user behaviors in online shopping, an example of which is shown in Figure 1. For a more fine-grained evaluation, this track is further divided into the following sub-skills. 

  • Queries: Most online shopping starts with a query with a keyword (e.g. "Nike running shoes"). After that, a user may either initiate another query (e.g. "Nike Pegasus 39") or browse products and find what they want. Understanding queries is of paramount importance as it is the gateway for users to start their online shopping. 
  • Sessions: A session represents a sequence of user behaviors, including clicking, purchasing, querying, and adding to carts. A session contains crucial information about how a user's shopping intention begins, changes, and converts to a purchase. 
  • Purchases: A purchase is the final behavior of an online shopping experience and the result of multiple queries and browses. It would significantly boost a user's shopping experience if the model could help them skip the process of searching for a product and directly make a purchase. 
  • Reviews and QAs: Customers interact with each other through both explicit and implicit feedback. Explicit feedback includes question-answering, where an actual user answers a question from a potential customer. Implicit feedback includes users' votes for reviews that they consider helpful in making shopping decisions. As both types of feedback require waiting, it would significantly improve the shopping experience if models could automatically provide such feedback to potential customers. 

πŸ—ƒ Datasets

The ShopBench Dataset is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench for this track are given in Table 1. 

Table 1: Dataset statistics for Track 3: User Behavior Alignment.

| # Tasks | # Questions | # Products | # Product Categories | # Attributes | # Reviews | # Queries |
|---------|-------------|------------|----------------------|--------------|-----------|-----------|
| 15      | 3,973       | ~4,800     | /                    | /            | 1,600     | ~3,600    |

The few-shot development datasets (shared across all tracks) will be given in JSON format with the following fields. 

  • 'input_field': This field contains the instructions and the question that should be answered by the model. 
  • 'output_field': This field contains the ground truth answer to the question. 
  • 'task_type': This field contains the type of the task (Details in the next Section, "Tasks"). 
  • 'metric': This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics"). 

However, the test dataset (which will be hidden from participants) will have a different format with only two fields: 

  • 'input_field', which is the same as above. 
  • 'is_multiple_choice': This field contains a 'True' or 'False' that indicates whether the question is a multiple choice or not. The detailed 'task_type' will not be given to participants.      
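The field descriptions above can be sketched as follows. This is a minimal illustration assuming a list-of-records JSON layout; the helper names and the example record are hypothetical, not part of the official data loader.

```python
def validate_dev_record(record):
    """Check that a development record carries the four documented fields."""
    required = {"input_field", "output_field", "task_type", "metric"}
    return required.issubset(record)

def to_test_format(record, is_multiple_choice):
    """Reduce a record to the two-field layout of the hidden test set."""
    return {
        "input_field": record["input_field"],
        "is_multiple_choice": is_multiple_choice,
    }

# Hypothetical development record, for illustration only.
dev_record = {
    "input_field": "Instruction: ... Question: ...",
    "output_field": "B",
    "task_type": "multiple_choice",
    "metric": "accuracy",
}
assert validate_dev_record(dev_record)
test_record = to_test_format(dev_record, is_multiple_choice=True)
assert set(test_record) == {"input_field", "is_multiple_choice"}
```

Note that because 'task_type' and 'metric' are absent from the test set, a submitted model must infer the expected answer format from 'input_field' and 'is_multiple_choice' alone.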

πŸ’― Evaluation Metrics

Please see the detailed evaluation metrics here.

πŸš€ Baselines

To assist you in making your first submission effortlessly, we have granted access to the ShopBench baseline. This setup utilizes existing LLMs to generate answers in a zero-shot manner for a variety of questions. This resource features open-source LLMs, such as Vicuna-7B. We also include results of proprietary LLMs, Claude 2 and Amazon Titan. We report results of Vicuna-7B, Claude 2, and Amazon Titan in Table 2. 

Table 2: Baseline results of Vicuna-7B and Claude 2 on ShopBench Track 3: User Behavior Alignment.

| Models         | Track 3: User Behavior Alignment |
|----------------|----------------------------------|
| Vicuna-7B-v1.5 | 0.4103                           |
| Claude 2       | 0.6322                           |
| Amazon Titan   | 0.5063                           |

With these results, we show that the challenge is manageable, in that open-source LLMs, without specific prompting techniques, can already achieve non-trivial performance. In addition, we observe a significant gap between open-source models (Vicuna-7B) and proprietary models (Claude 2), showing the potential room for improvement. We encourage participants to develop effective solutions to close or even eliminate the gap. 
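The zero-shot setup described above can be sketched as a simple prompt builder. The template wording and function name here are assumptions for illustration; the official baseline defines its own prompts and model-calling code.

```python
def build_zero_shot_prompt(input_field: str, is_multiple_choice: bool) -> str:
    """Wrap a ShopBench question in a simple zero-shot instruction.

    No few-shot examples are included, matching the zero-shot setup
    of the baseline.
    """
    if is_multiple_choice:
        suffix = "Answer with only the letter or number of the correct option."
    else:
        suffix = "Answer concisely in plain text."
    return f"{input_field}\n{suffix}"

prompt = build_zero_shot_prompt(
    'Which query is a likely follow-up to "Nike running shoes"?',
    is_multiple_choice=True,
)
```

The resulting prompt string would then be passed to the chosen LLM (e.g., Vicuna-7B through its chat interface) and the raw completion used as the answer.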

πŸ† Prizes

We have prepared a prize pool totaling $6,250 for this track. 

Cash Prizes

For this track, we assign the following awards. 

  • πŸ₯‡ First Place: $2,000
  • πŸ₯ˆ Second Place: $1,000
  • πŸ₯‰ Third Place: $500
  • Student Award: The best student team (i.e. all team members are students) will be awarded $750. 

In addition to cash prizes, the winning teams will also have the opportunity to present their work at the KDD Cup workshop.

πŸ’³ AWS Credits

We will award $500 worth of AWS credits to each of the 4th- through 7th-place teams in this track. 

πŸ“¨ Submission

Please see details of submission here.

The time limit for submissions to Track 3 in Phase 1 is 60 minutes.

πŸ“… Timeline

Please see the timeline here.

πŸ“± Contact 

Please use kddcup2024@amazon.com for all communication to reach the Amazon KDD Cup 2024 team. 

Organizers of this competition are: 

  • Yilun Jin
  • Zheng Li
  • Chenwei Zhang
  • Xianfeng Tang
  • Haodong Wang
  • Mao Li
  • Ritesh Sarkhel
  • Qingyu Yin
  • Yifan Gao
  • Xin Liu
  • Zhengyang Wang
  • Tianyu Cao
  • Jingfeng Yang
  • Ming Zeng
  • Qing Ping
  • Wenju Xu
  • Pratik Jayarao
  • Priyanka Nigam
  • Yi Xu
  • Xian Li
  • Hyokun Yun
  • Jianshu Chen
  • Meng Jiang
  • Kai Chen
  • Bing Yin
  • Qiang Yang
  • Trishul Chilimbi

🀝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition with AWS credits for the winning teams.