AIcrowd | Task 3: Next Product Title Generation

Phase 1: Completed

Phase 2: Completed Weight: 1.0

AIcrowd & Amazon Search

Phase 2 is now live, test sets for Phase 2 can be found on the Resources Tab

🚨 Task 3 Test Sessions were updated on March 22nd - Make sure to use the latest test data for Task 3. 🚨

Make your first submission - Getting Started Notebook 🚀

✨ Introduction

Modeling customer shopping intentions is a crucial task for e-commerce stores, as it directly impacts user experience and engagement. Accurately understanding what a customer is searching for, for example whether a customer who searches for “apple” is looking for electronics or groceries, is essential for providing personalized recommendations. Session-based recommendation, which uses customer session data to predict the next purchase, has become increasingly popular with the development of data mining and machine learning techniques. However, few studies have explored session-based recommendation under real-world multilingual and imbalanced scenarios.

To address this gap, we present the "Multilingual Shopping Session Dataset," a dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. The dataset is imbalanced, with fewer products in French, Italian, and Spanish than in English, German, and Japanese. With this data, we introduce three different tasks:

predicting the next engaged product for sessions from English, German, and Japanese
predicting the next engaged product for sessions from French, Italian, and Spanish, where transfer learning techniques are encouraged
predicting the title for the next engaged product

We hope this dataset and competition will encourage the development of multilingual recommendation systems, which can enhance personalization and understanding of global trends and preferences. By promoting diversity and innovation in data science, this competition aims to provide practical solutions that benefit customers worldwide. The dataset will be publicly available to the research community, and standard evaluation metrics will be used to assess model performance.

🗃️ Dataset

The dataset released is anonymized and not representative of the production characteristics.

The Multilingual Shopping Session Dataset is a collection of anonymized customer sessions containing products from six different locales, namely English, German, Japanese, French, Italian, and Spanish. It consists of two main components: user sessions and product attributes. User sessions are a list of products that a user has engaged with in chronological order, while product attributes include various details like product title, price in local currency, brand, color, and description.

The dataset has been divided into three splits: train, phase-1 test, and phase-2 test. For Task 1 and Task 2, the proportions of these splits for each language are roughly 10:1:1. For Task 3, the number of samples in each of the phase-1 and phase-2 test sets is fixed at 10,000. All three tasks share the same train set, while their test sets have been constructed according to their specific objectives. Task 1 uses data from English, German, and Japanese, while Task 2 uses data from French, Italian, and Spanish. Participants in Task 2 are encouraged to use transfer learning to improve their system's performance on the test set. For Task 3, the test set includes products that do not appear in the training set, and participants are asked to generate the title of the next product based on the user session.

Table 1 summarizes the dataset statistics, including the number of sessions, interactions, products, and average session length. As part of the KDD Cup competition, the dataset will be made publicly available, with each product identified by a unique Amazon Standard Identification Number (ASIN), making it easy to extract more information from the web. Participants are free to use external sources of information beyond the provided dataset to train their systems, such as public datasets and pre-trained language models, but must declare them when describing their systems.

Language (Locale)	# Sessions	# Products (ASINs)
German (DE)	1111416	513811
Japanese (JP)	979119	389888
English (UK)	1182181	494409
Spanish (ES)	89047	41341
French (FR)	117561	43033
Italian (IT)	126925	48788

Table 1: Dataset statistics

In addition, we list the column names and their meanings for product attribute data:

locale: the locale code of the product (e.g., DE)
id: a unique identifier for the product, also known as the Amazon Standard Identification Number (ASIN) (e.g., B07WSY3MG8)
title: title of the item (e.g., “Japanese Aesthetic Sakura Flowers Vaporwave Soft Grunge Gift T-Shirt”)
price: price of the item in local currency (e.g., 24.99)
brand: item brand name (e.g., “Japanese Aesthetic Flowers & Vaporwave Clothing”)
color: color of the item (e.g., “Black”)
size: size of the item (e.g., “xxl”)
model: model of the item (e.g., “iphone 13”)
material: material of the item (e.g., “cotton”)
author: author of the item (e.g., “J. K. Rowling”)
desc: description of an item’s key features and benefits, called out via bullet points (e.g., “Solid colors: 100% Cotton; Heather Grey: 90% Cotton, 10% Polyester; All Other Heathers …”)
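
As an illustration, one row of the product attribute data could be modeled as a typed record. This is only a sketch: the `Product` class, the `from_row` helper, the field types, and the treatment of empty values as missing are assumptions made here, not part of the released data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    """One row of the product attribute data (hypothetical container)."""

    locale: str
    id: str
    title: str
    price: Optional[float] = None
    brand: Optional[str] = None
    color: Optional[str] = None
    size: Optional[str] = None
    model: Optional[str] = None
    material: Optional[str] = None
    author: Optional[str] = None
    desc: Optional[str] = None

    @classmethod
    def from_row(cls, row: dict) -> "Product":
        """Build a Product from a column-name -> value mapping.

        Assumption: missing attributes arrive as None or empty strings.
        """

        def clean(key):
            value = row.get(key)
            return None if value in (None, "") else value

        price = clean("price")
        optional_text = ("brand", "color", "size", "model", "material", "author", "desc")
        return cls(
            locale=row["locale"],
            id=row["id"],
            title=row.get("title") or "",
            price=float(price) if price is not None else None,
            **{key: clean(key) for key in optional_text},
        )


# Toy row built from the example values listed above.
example = Product.from_row(
    {"locale": "DE", "id": "B07WSY3MG8", "price": "24.99", "color": "Black", "size": ""}
)
```

Keying products by the pair `(locale, id)` rather than by `id` alone is a cautious choice here, since the text above does not say whether an ASIN is unique across locales.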

🕵️‍♀️ Tasks

The main objective of this competition is to build advanced session-based algorithms/models that directly predict the next engaged product or generate its title text. The three tasks we propose are:

Next Product Recommendation
Next Product Recommendation for Underrepresented Languages/Locales
Next Product Title Generation

Note that the three tasks share the same training set. However, the objectives of the three tasks are different. Details of this task are as follows:

Task 3

Task 3 requires participants to predict the title of the next product that a customer will engage with, based on their session data. Unlike Tasks 1 and 2, which focus on recommending existing products, predicting new or "cold-start" products presents a unique challenge. The generated titles have the potential to improve various downstream tasks, including cold-start recommendation and navigation. The test set for Task 3 includes data from all six locales, and participants should submit a single parquet file containing the generated titles for each row/session in the input file. The title should be saved in string format.

Input example:

locale	example_session
UK	[product_1, product_2, product_3]
DE	[product_4, product_5]

Output example:

next_item_title
"toilet paper tube"
"bottle of ink"
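
A naive baseline for producing such an output is to repeat the title of the last product in each session. The sketch below is illustrative only: the `titles` lookup keyed by `(locale, id)`, the function names, the toy data, and the output path are assumptions, and writing parquet assumes pandas is installed together with a parquet engine such as pyarrow.

```python
from typing import Dict, List, Tuple


def last_item_titles(
    sessions: List[Tuple[str, List[str]]],
    titles: Dict[Tuple[str, str], str],
) -> List[str]:
    """Predict each next title as the title of the session's last product."""
    predictions = []
    for locale, product_ids in sessions:
        last_id = product_ids[-1] if product_ids else None
        # Fall back to an empty string when the title is unknown.
        predictions.append(titles.get((locale, last_id), ""))
    return predictions


def write_submission(predictions: List[str], path: str) -> None:
    """Save one generated title per input row to a parquet file."""
    # Imported lazily so the baseline itself runs without pandas installed.
    import pandas as pd

    pd.DataFrame({"next_item_title": predictions}).to_parquet(path, index=False)


# Toy data mirroring the input example above (titles are made up).
sessions = [
    ("UK", ["product_1", "product_2", "product_3"]),
    ("DE", ["product_4", "product_5"]),
]
titles = {
    ("UK", "product_3"): "toilet paper tube",
    ("DE", "product_5"): "bottle of ink",
}
predictions = last_item_titles(sessions, titles)
# write_submission(predictions, "submission.parquet")  # hypothetical file name
```

The order of `predictions` follows the order of the input rows, which matters because the output file contains one title per row/session and no session identifier.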

The evaluation metric for this task is bilingual evaluation understudy (BLEU). BLEU is a metric used to evaluate the quality of natural language generation by comparing a generated candidate to one or more references. BLEU is computed from several modified n-gram precisions. Specifically,

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

where BP is the brevity penalty, N is the maximum n-gram length used for calculating precision scores, w_n is the weight assigned to the n-gram precision score of order n, exp is the exponential function, and p_n is the modified precision score for n-grams of order n. The precision score p_n is the ratio of the number of n-grams in the candidate that appear in any of the references, with each n-gram's count clipped to its maximum count in a single reference, to the total number of n-grams in the candidate. Mathematically, p_n is calculated as follows:

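$$p_n = \frac{\sum_{s \in C} \min\left(\mathrm{count}_n^{C}(s),\ \max_{r \in R} \mathrm{count}_n^{r}(s)\right)}{\sum_{s \in C} \mathrm{count}_n^{C}(s)}$$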

where count_n^C(s) is the count of n-gram s in the candidate C, and count_n^r(s) is the count of n-gram s in the reference r ∈ R, where R is the set of references. The brevity penalty (BP) is a correction factor that penalizes a generated candidate that is too short compared to the reference. The brevity penalty is calculated as follows:

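$$\mathrm{BP} = \begin{cases} 1, & \text{if } L_c > L_r \\ \exp\left(1 - \frac{L_r}{L_c}\right), & \text{if } L_c \le L_r \end{cases}$$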

where L_c is the length of the generation and L_r is the length of the shortest reference. In general, the BLEU score ranges from 0 to 1, with higher scores indicating better generation.

We set N = 4 (i.e., BLEU-4) with w_n = 1/N for this task.
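
The definitions above can be sketched in a few lines of Python. This is a minimal sentence-level illustration, not the official evaluation script: whitespace tokenization, the absence of smoothing, the choice to return 0 when any p_n is 0, and all function names are assumptions made here, and the competition's evaluator may tokenize or aggregate differently.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision p_n of a candidate against its references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0 or not references:
        return 0.0
    ref_counts = [ngrams(ref, n) for ref in references]
    # Clip each candidate count by the largest count in any single reference.
    clipped = sum(
        min(count, max(rc[gram] for rc in ref_counts))
        for gram, count in cand_counts.items()
    )
    return clipped / total


def brevity_penalty(cand_len, ref_len):
    """BP = 1 if L_c > L_r, else exp(1 - L_r / L_c)."""
    if cand_len == 0:
        return 0.0
    if cand_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)


def bleu(candidate, references, max_n=4):
    """BLEU with uniform weights w_n = 1 / max_n (BLEU-4 by default)."""
    cand = candidate.split()  # assumption: whitespace tokenization
    refs = [ref.split() for ref in references]
    if not cand or not refs:
        return 0.0
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    log_mean = sum(math.log(p) for p in precisions) / max_n
    shortest_ref = min(len(ref) for ref in refs)  # L_r as defined above
    return brevity_penalty(len(cand), shortest_ref) * math.exp(log_mean)
```

With these definitions, a candidate identical to its reference scores 1, while a candidate that matches a prefix of a longer reference keeps perfect precisions but is scaled down by the brevity penalty. Product titles are often short, so a title with fewer than four tokens, or with no matching 4-gram, scores 0 in this unsmoothed form.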

Leaderboard & Evaluations

Each task will have its own leaderboard, which will be maintained throughout the competition for models evaluated on the public test set. At the end of the competition, a private leaderboard will be maintained for models evaluated on the private test set. This latter leaderboard will be used to decide the winners for each task in the competition. The leaderboard on the public test set is meant to guide participants on their model performance and let them compare it with that of other participants.

📅 Timeline

Start Date: 15th March, 2023
End Date: 9th June, 2023 00:00 UTC
Winner Announcement: 14th June, 2023

🏆 Prizes

There are prizes for all three tasks. For each task, the top three positions on the leaderboard win the following cash prizes.

🥇 First place : $4,000
🥈 Second place : $2,000
🥉 Third place : $1,000

🪙 AWS Credits

For each of the three tasks, the teams/participants that finish between the 4th and 10th position on the leaderboard will receive AWS credit worth $500.

🏛️ KDD Cup Workshop

KDD Cup is an annual data mining and knowledge discovery competition organised by the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD). The competition aims to promote research and development in data mining and knowledge discovery by providing a platform for researchers and practitioners to share their innovative solutions to challenging problems in various domains.

📱 Contact

If you have queries or feedback, or are looking for teammates, drop a message on [AIcrowd Community](https://discourse.aicrowd.com/c/kdd-cup-2023/2696). Please use kddcup2023@amazon.com for all communication to reach the Amazon KDD Cup 2023 team.

Organizers of this competition are (in alphabetical order):

Bing Yin
Chen Luo
Haitao Mao
Hanqing Lu
Haoming Jiang
Jiliang Tang
Karthik Subbian
Monica Cheng
Suhang Wang
Wei Jin
Xianfeng Tang
Yizhou Sun
Zhen Li
Zheng Li
Zhengyang Wang

🤝 Acknowledgements

We thank our partner at AWS, Paxton Hall, for supporting the competition and providing the AWS credits for winning teams.