This page corresponds to "Track 2: Shopping Knowledge Reasoning" in the "Amazon KDD Cup 2024 Challenge".

🌟 Introduction

Online shopping has become an indispensable part of modern life. To provide better shopping experiences, machine learning has been used extensively to understand various entities in online shopping, such as queries and browse sessions, and to infer users' search and shopping intentions. However, few studies explore online shopping tasks under the multi-task, few-shot learning scenario. In practice, online shopping creates a massive multi-task learning problem that often involves a joint understanding of various shopping entities, such as products, attributes, queries, and purchases. Moreover, new shopping entities and tasks constantly emerge as a result of business expansion or new product lines, creating few-shot learning problems on these emerging tasks.

Large language models (LLMs) have emerged as promising solutions to the multi-task, few-shot learning problem in online shopping. Many studies have underscored the ability of a single LLM to perform various text-related tasks at a state-of-the-art level, and to generalize to unseen tasks with only a few samples or task descriptions. Therefore, by training a single LLM for all shopping-related machine learning tasks, we can reduce the cost of task-specific engineering, and of data labeling and re-training for new tasks. Furthermore, LLMs can improve customers' shopping experiences by providing interactive, real-time shopping recommendations.

This track, Shopping Knowledge Reasoning, aims to evaluate the model's ability to understand the complex implicit knowledge in the domain of online shopping, and to apply the knowledge to perform various types of reasoning. Various implicit knowledge exists in different categories of products and plays a crucial role in the browsing and shopping behaviors of customers. For example,

  • if a customer has just browsed Nike running shoes, they may either browse Adidas running shoes because "Adidas" offers products similar to those of "Nike", or browse Nike socks because "socks" are complementary to "running shoes";
  • after a customer purchases an iPhone 14, they may then purchase a Lightning charging cable, which is compatible with the iPhone 14.

The knowledge implied in each product category differs significantly from that in other categories, making it challenging for LLMs not only to understand this knowledge, but also to select the appropriate knowledge for the reasoning at hand. A real-world example is shown in Figure 1.

Figure 1: Example of implicit knowledge in products and their compatibility. 

👨‍💻👩‍💻 Tasks

This track focuses on understanding and applying the complex implicit knowledge to perform shopping-related reasoning, an example of which is shown in Figure 1. For a more fine-grained evaluation, this track is further divided into the following sub-skills. 

  • Numeric Reasoning: Calculation is often involved in shopping, such as calculating the total volume of a product pack. This sub-skill targets the model's ability to extract related numeric information from the contexts and perform numeric reasoning (e.g. calculations). 
  • Commonsense: Daily products occupy a large proportion of online shopping transactions. A strong commonsense reasoning ability is required for a model to recommend daily products according to users' personalized use cases. 
  • Implicit, Multi-hop Reasoning: This sub-skill requires models to understand the implicit, domain-specific knowledge in products and to infer multi-hop relations between shopping entities. For example, to answer the question in Figure 1, the model should perform the following reasoning: 
    • "Headphone Jack Adapter" should be used alongside wired headphones. 
    • "AirPods" are wireless headphones. 
    • "AirPods" are not compatible with "Headphone Jack Adapter". 
    • "Wired Noise Isolating Headphones" are wired and are thus compatible with a "Headphone Jack Adapter". 
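The reasoning chain above can be sketched as a toy lookup over hand-written facts. This is only an illustration: the `CONNECTION` and `REQUIRES` tables and the `is_compatible` helper are hypothetical names that are not part of ShopBench, and an LLM would have to infer these facts from product text rather than read them from a table.

```python
# Toy illustration of the multi-hop chain above. The facts are hand-written
# assumptions for this example, not data from ShopBench.

# Hop 1: product -> connection type (implicit product knowledge).
CONNECTION = {
    "AirPods": "wireless",
    "Wired Noise Isolating Headphones": "wired",
}

# Hop 2: accessory -> connection type it is designed for.
REQUIRES = {
    "Headphone Jack Adapter": "wired",
}


def is_compatible(product: str, accessory: str) -> bool:
    """Chain the two facts: the product's connection type must match
    the connection type the accessory is designed for."""
    return CONNECTION[product] == REQUIRES[accessory]


if __name__ == "__main__":
    for product in CONNECTION:
        verdict = is_compatible(product, "Headphone Jack Adapter")
        print(f"{product}: compatible={verdict}")
```

The point of the sketch is that neither fact alone answers the question; the answer only follows once both hops are combined.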

🗃 Datasets

The ShopBench Dataset is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench for this track are given in Table 1.

Table 1: Dataset statistics for Track 2: Shopping Knowledge Reasoning.

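| # Tasks | # Questions | # Products | # Product Categories | # Attributes | # Reviews | # Queries |
|---------|-------------|------------|----------------------|--------------|-----------|-----------|
| 8       | 3117        | ~1000      | 400                  | ~10          | /         | 552       |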

The few-shot development datasets (shared across all tracks) will be given in JSON format with the following fields.

  • 'input_field': This field contains the instructions and the question that should be answered by the model. 
  • 'output_field': This field contains the ground truth answer to the question. 
  • 'task_type': This field contains the type of the task (details in Section "Tasks").
  • 'metric': This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics"). 
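As a rough sketch of how a development record with these four fields might be read, the snippet below parses one hand-made record. The example values (the question text and the 'task_type' and 'metric' strings) are placeholders invented for illustration, and the assumption that each record is a single JSON object is ours; check the released files for the actual layout.

```python
import json

# A hand-made record with the four documented fields. All values are
# placeholders for illustration, not taken from ShopBench.
SAMPLE_RECORD = json.dumps({
    "input_field": "Which option is compatible with X?\n0. A\n1. B\nAnswer: ",
    "output_field": "1",
    "task_type": "multiple-choice",
    "metric": "accuracy",
})

REQUIRED_FIELDS = ("input_field", "output_field", "task_type", "metric")


def parse_dev_record(text: str) -> dict:
    """Parse one JSON record and check that the documented fields exist."""
    record = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return record


if __name__ == "__main__":
    record = parse_dev_record(SAMPLE_RECORD)
    print(record["task_type"], record["metric"])
```

Validating the fields up front makes a format mismatch fail loudly at load time instead of surfacing later during evaluation.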

However, the test dataset (which will be hidden from participants) will have a different format with only two fields: 

  • 'input_field', which is the same as above. 
  • 'is_multiple_choice': This field contains 'True' or 'False', indicating whether the question is a multiple-choice question. The detailed 'task_type' will not be given to participants.
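Because the hidden test set exposes only 'input_field' and 'is_multiple_choice', a solution has to choose its answering strategy from that flag alone. The sketch below shows one possible way to branch. Here `generate` is a hypothetical stand-in for a model call, the token budgets are arbitrary, and accepting the flag as either a boolean or the string 'True'/'False' is an assumption, since the exact encoding is not specified here.

```python
from typing import Callable


def as_bool(flag) -> bool:
    """Accept either a real boolean or the strings 'True' / 'False'
    (the exact encoding of the flag is an assumption)."""
    if isinstance(flag, bool):
        return flag
    return str(flag).strip().lower() == "true"


def answer(record: dict, generate: Callable[[str, int], str]) -> str:
    """Route a test record by its 'is_multiple_choice' flag.

    `generate(prompt, max_new_tokens)` is a hypothetical model call.
    """
    prompt = record["input_field"]
    if as_bool(record["is_multiple_choice"]):
        # Assumes options are labelled with a single character, so one
        # new token is requested and only its first character is kept.
        return generate(prompt, 1).strip()[:1]
    # Other questions get a longer (arbitrary) generation budget.
    return generate(prompt, 128).strip()


if __name__ == "__main__":
    def fake_model(prompt: str, max_new_tokens: int) -> str:
        # Dummy model so the sketch runs without an LLM.
        return "2" if max_new_tokens == 1 else "a free-form answer"

    print(answer({"input_field": "Q?", "is_multiple_choice": "True"}, fake_model))
    print(answer({"input_field": "Q?", "is_multiple_choice": False}, fake_model))
```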

💯 Evaluation Metrics

Please see the detailed evaluation metrics here.

🚀 Baselines

To help participants make their first submission, we have granted access to the ShopBench baseline. This setup uses existing LLMs to generate answers in a zero-shot manner for a variety of questions. It features open-source LLMs, such as Vicuna-7B. We also include results of a proprietary LLM, Claude 2. Baseline results are reported in Table 2.

Table 2: Baseline results of Vicuna-7B, Claude 2, and Amazon Titan on ShopBench Track 2: Shopping Knowledge Reasoning.

| Models         | Track 2: Shopping Knowledge Reasoning |
|----------------|---------------------------------------|
| Vicuna-7B-v1.5 | 0.4453                                |
| Claude 2       | 0.6382                                |
| Amazon Titan   | 0.4500                                |

These results show that the challenge is manageable, in that open-source LLMs, without specific prompting techniques, can already achieve non-trivial performance. In addition, we observe a significant gap between open-source models (Vicuna-7B) and proprietary models (Claude 2), which indicates room for improvement. We encourage participants to develop effective solutions to close or even eliminate the gap.

🏆 Prizes

The total prize pool for this track is $6,250.

Cash Prizes

For this track, we assign the following awards. 

  • 🥇 First Place: $2,000
  • 🥈 Second Place: $1,000
  • 🥉 Third Place: $500
  • Student Award: The best student team (i.e. all team members are students) will be awarded $750. 

In addition to cash prizes, the winning teams will also have the opportunity to present their work at the KDD Cup workshop.

💳 AWS Credits

We will award $500 worth of AWS credits to each of the teams ranked 4th through 7th in this track.

📨 Submission

Please see details of submission here.

The time limit for submissions to Track 2 in Phase 1 is 40 minutes.

📅 Timeline

Please see the timeline here.

📱 Contact

Please use kddcup2024@amazon.com for all communication with the Amazon KDD Cup 2024 team.

Organizers of this competition are: 

  • Yilun Jin
  • Zheng Li
  • Chenwei Zhang
  • Xianfeng Tang
  • Haodong Wang
  • Mao Li
  • Ritesh Sarkhel
  • Qingyu Yin
  • Yifan Gao
  • Xin Liu
  • Zhengyang Wang
  • Tianyu Cao
  • Jingfeng Yang
  • Ming Zeng
  • Qing Ping
  • Wenju Xu
  • Pratik Jayarao
  • Priyanka Nigam
  • Yi Xu
  • Xian Li
  • Hyokun Yun
  • Jianshu Chen
  • Meng Jiang
  • Kai Chen
  • Bing Yin
  • Qiang Yang
  • Trishul Chilimbi

🀝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition and providing the AWS credits for winning teams.