Modeling customer shopping intentions is a crucial task for e-commerce stores, as it directly impacts user experience and engagement. Accurately understanding what a customer is searching for, such as whether a customer who types the query “apple” wants electronics or groceries, is essential for providing personalized recommendations. Session-based recommendation, which uses customer session data to predict the next purchase, has become increasingly popular with the development of data mining and machine learning techniques. However, few studies have explored session-based recommendation under real-world multilingual and imbalanced scenarios.
To address this gap, we present the "Multilingual Shopping Session Dataset," a dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. The dataset is imbalanced, with fewer products in French, Italian, and Spanish than in English, German, and Japanese. With this data, we introduce three different tasks:
- predicting the next engaged product for sessions from English, German, and Japanese
- predicting the next engaged product for sessions from French, Italian, and Spanish, where transfer learning techniques are encouraged
- predicting the title for the next engaged product
We hope this dataset and competition will encourage the development of multilingual recommendation systems, which can enhance personalization and understanding of global trends and preferences. By promoting diversity and innovation in data science, this competition aims to provide practical solutions that benefit customers worldwide. The dataset will be publicly available to the research community, and standard evaluation metrics will be used to assess model performance.
The released dataset is anonymized and is not representative of production characteristics.
The Multilingual Shopping Session Dataset is a collection of anonymized customer sessions containing products from six different locales, namely English, German, Japanese, French, Italian, and Spanish. It consists of two main components: user sessions and product attributes. User sessions are a list of products that a user has engaged with in chronological order, while product attributes include various details like product title, price in local currency, brand, color, and description.
The dataset has been divided into three splits: train, phase-1 test, and phase-2 test. For Task 1 and Task 2, the proportions for each language are roughly 10:1:1. For Task 3, the number of samples in the phase-1 and phase-2 test sets is fixed at 10,000 each. All three tasks share the same train set, while their test sets have been constructed according to their specific objectives. Task 1 uses data from English, German, and Japanese, while Task 2 uses data from French, Italian, and Spanish. Participants in Task 2 are encouraged to use transfer learning to improve their system's performance on the test set. For Task 3, the test set includes products that do not appear in the training set, and participants are asked to generate the title of the next product based on the user session.
Table 1 summarizes the dataset statistics, including the number of sessions, interactions, products, and average session length. As part of the KDD Cup competition, the dataset will be made publicly available, with each product identified by a unique Amazon Standard Identification Number (ASIN), making it easy to extract more information from the web. Participants are free to use external sources of information beyond the provided dataset, such as public datasets and pre-trained language models, but must declare them when describing their systems.
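As a quick illustration of how the per-task splits above can be derived from the shared train set, the sketch below filters sessions by locale. The DataFrame is made-up stand-in data, and the column names (`locale`, `prev_items`) are assumptions; the actual files and schema come from the dataset release.

```python
import pandas as pd

# Stand-in session data; the real release defines the actual files and columns.
sessions = pd.DataFrame({
    "locale": ["UK", "DE", "FR", "JP", "IT"],
    "prev_items": [["p1", "p2"], ["p3"], ["p4", "p5"], ["p6"], ["p7"]],
})

task1_locales = {"UK", "DE", "JP"}  # English, German, Japanese
task2_locales = {"FR", "IT", "ES"}  # French, Italian, Spanish

# Split the shared train data by task-specific locales.
task1_sessions = sessions[sessions["locale"].isin(task1_locales)]
task2_sessions = sessions[sessions["locale"].isin(task2_locales)]
```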
| Language (Locale) | # Sessions | # Products (ASINs) |
| --- | --- | --- |
Table 1: Dataset statistics
In addition, we list the column names and their meanings for product attribute data:
- locale: the locale code of the product (e.g., DE)
- id: a unique identifier for the product, also known as the Amazon Standard Identification Number (ASIN) (e.g., B07WSY3MG8)
- title: title of the item (e.g., “Japanese Aesthetic Sakura Flowers Vaporwave Soft Grunge Gift T-Shirt”)
- price: price of the item in local currency (e.g., 24.99)
- brand: item brand name (e.g., “Japanese Aesthetic Flowers & Vaporwave Clothing”)
- color: color of the item (e.g., “Black”)
- size: size of the item (e.g., “xxl”)
- model: model of the item (e.g., “iphone 13”)
- material: material of the item (e.g., “cotton”)
- author: author of the item (e.g., “J. K. Rowling”)
- desc: description of an item’s key features and benefits, called out via bullet points (e.g., “Solid colors: 100% Cotton; Heather Grey: 90% Cotton, 10% Polyester; All Other Heathers …”)
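The attribute fields above are joined to sessions via the product id. A toy illustration of that lookup, using values taken from the field examples rather than the actual dataset:

```python
# Toy product-attribute record keyed by ASIN; values come from the field
# examples above, not from the actual dataset.
products = {
    "B07WSY3MG8": {
        "locale": "DE",
        "title": "Japanese Aesthetic Sakura Flowers Vaporwave Soft Grunge Gift T-Shirt",
        "price": 24.99,
        "brand": "Japanese Aesthetic Flowers & Vaporwave Clothing",
        "color": "Black",
    },
}

# A session is a chronological list of engaged ASINs; attributes are joined by id.
session = ["B07WSY3MG8"]
titles = [products[asin]["title"] for asin in session]
```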
The main objective of this competition is to build advanced session-based algorithms/models that directly predict the next engaged product or generate its title text. The three tasks we propose are:
- Next Product Recommendation
- Next Product Recommendation for Underrepresented Languages/Locales
- Next Product Title Generation
Note that the three tasks share the same training set, but their objectives differ. Details of each task are as follows:
Task 1 aims to predict the next product that a customer is likely to engage with, given their session data and the attributes of each product. The test set for Task 1 comprises data from English, German, and Japanese locales. Participants are required to create a program that can predict the next product for each session in the test set.
To submit their predictions, participants should provide a single parquet file in which each row corresponds to a session in the test set. For each session, the participant should predict the 100 product IDs (ASINs) that are most likely to be engaged with, based on the historical engagements in the session. The product IDs should be stored in a list in decreasing order of confidence, with the most confident prediction at index 0 and the least confident prediction at index 99.
For example, if product_25 is the most confident prediction for a session, product_100 is the second most confident prediction, and product_199 is the least confident prediction for the same session, the participant's submission should list product_25 first, product_100 next, the remaining predictions in between, and product_199 last.
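A minimal sketch of assembling such a submission with pandas, assuming one row per test session and a single list-valued column; the column name `next_item_prediction` is an assumption here, so consult the official starter kit for the exact schema:

```python
import pandas as pd

# One ranked list of ASINs per test session, most confident first.
# Lists are truncated to 3 entries for illustration; the real file needs 100.
predictions = [
    ["product_25", "product_100", "product_199"],
    ["product_333", "product_123", "product_231"],
]
submission = pd.DataFrame({"next_item_prediction": predictions})

# Writing the parquet file requires pyarrow or fastparquet:
# submission.to_parquet("submission.parquet")
```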
| Locale | Session (previous engagements) | Predictions (ranked) |
| --- | --- | --- |
| UK | [product_1, product_2, product_3] | [product_25, product_100, …, product_199] |
| … | … | [product_333, product_123, …, product_231] |
The evaluation metric for Task 1 is Mean Reciprocal Rank (MRR).
Mean Reciprocal Rank (MRR) is a metric used in information retrieval and recommendation systems to measure the effectiveness of a model in providing relevant results. MRR is computed in two steps: (1) calculate the reciprocal rank for each session. The reciprocal rank is the inverse of the position at which the first relevant item appears in the list of recommendations; if no relevant item is found in the list, the reciprocal rank is 0. (2) average the reciprocal ranks across all sessions.
$$\mathrm{MRR} = \frac{1}{|T|} \sum_{t \in T} \frac{1}{\mathrm{Rank}(t)}$$

where $T$ is the set of test sessions and Rank(t) is the rank of the ground truth in the top-K ranked result list for test session t; if the ground truth does not appear in the top-K list, we set 1/Rank(t) = 0. MRR values range from 0 to 1, with higher values indicating better performance. A perfect MRR score of 1 means that the model always places the first relevant item at the top of the recommendation list. An MRR score of 0 implies that no relevant items were found in the list of recommendations for any of the queries or users.
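The two-step computation above can be sketched in a few lines of Python; the function name and argument layout are illustrative, not the official evaluation code:

```python
def mean_reciprocal_rank(ranked_lists, ground_truths, k=100):
    """MRR over sessions: average of 1/rank of the ground-truth item,
    or 0 when it is absent from the top-k predictions."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, ground_truths):
        reciprocal = 0.0
        for rank, item in enumerate(ranked[:k], start=1):
            if item == truth:
                reciprocal = 1.0 / rank  # first (and only) relevant item
                break
        total += reciprocal
    return total / len(ground_truths)

# Ground truth "b" sits at rank 2 in the first list and is missing from the
# second, so MRR = (1/2 + 0) / 2 = 0.25.
score = mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]], ["b", "w"])
```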
Leaderboard & Evaluations
Each task will have its own leaderboard, which will be maintained throughout the competition for models evaluated on the public test set. At the end of the competition, a private leaderboard for models evaluated on the private test set will be used to decide the winners for each task. The leaderboard on the public test set is meant to guide participants on their model performance and let them compare it with other participants.
- Start Date: 15th March, 2023
- End Date: 9th June, 2023 00:00 UTC
- Winner Announcement: 14th June, 2023
There are prizes for all three tasks. For each task, the top three positions on the leaderboard win the following cash prizes:
- 🥇 First place : $4,000
- 🥈 Second place : $2,000
- 🥉 Third place : $1,000
🪙 AWS Credits
For each of the three tasks, the teams/participants that finish between the 4th and 10th position on the leaderboard will receive AWS credits worth $500.
🏛️ KDD Cup Workshop
KDD Cup is an annual data mining and knowledge discovery competition organised by the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD). The competition aims to promote research and development in data mining and knowledge discovery by providing a platform for researchers and practitioners to share their innovative solutions to challenging problems in various domains.
If you have queries or feedback, or are looking for teammates, drop a message on the [AIcrowd Community](https://discourse.aicrowd.com/c/kdd-cup-2023/2696). Please use firstname.lastname@example.org for all communication to reach the Amazon KDD Cup 2023 team.
Organizers of this competition are (in alphabetical order):
- Bing Yin
- Chen Luo
- Haitao Mao
- Hanqing Lu
- Haoming Jiang
- Jiliang Tang
- Karthik Subbian
- Monica Cheng
- Suhang Wang
- Wei Jin
- Xianfeng Tang
- Yizhou Sun
- Zhen Li
- Zheng Li
- Zhengyang Wang
We thank our partner at AWS, Paxton Hall, for supporting the competition and the AWS credits for the winning teams.