Representation Learning for Predicting Customer Orders

Tongwen Wu¹, Yu Yang¹, Yanzhi Li¹, Huiqiang Mao³∗, Liming Li², Xiaoqing Wang², Yuming Deng²
¹City University of Hong Kong, Hong Kong, China
²Alibaba Group, Hangzhou, China
³Tencent, Shenzhen, China
tongwenwu2-c@my.cityu.edu.hk, {yuyang,yanzhili}@cityu.edu.hk, huiqiangmao@gmail.com, {liming.l,robin.wxq,yuming.dym}@alibaba-inc.com

ABSTRACT
The ability to predict future customer orders is of significant value to retailers in making many crucial operational decisions. Unlike next-basket prediction or temporal set prediction, which focus on predicting a subset of items for a single user, this paper targets the distributional information of future orders, i.e., the possible subsets of items and their frequencies (probabilities), which is required for decisions such as assortment selection for front-end warehouses and capacity evaluation for fulfillment centers. Based on key statistics of a real order dataset from Tmall supermarket, we show the challenges of order prediction. Motivated by our analysis that biased models of the order distribution can still help improve the quality of order prediction, we design a generative model to capture the order distribution for customer order prediction. Our model uses representation learning to embed items into a Euclidean space, and we design a highly efficient SGD algorithm to learn the item embeddings. Future order prediction is done by calibrating orders obtained by random walks over the embedding graph. The experiments show that our model outperforms all the existing methods. The benefit of our model is also illustrated with an application to assortment selection for front-end warehouses.

CCS CONCEPTS
• Applied computing → Electronic commerce.

KEYWORDS
Choice Model; Representation Learning; Random Walk; E-commerce

ACM Reference Format:
Tongwen Wu, Yu Yang, Yanzhi Li, Huiqiang Mao, Liming Li, Xiaoqing Wang, Yuming Deng. 2021.
Representation Learning for Predicting Customer Orders. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3447548.3467170

∗This work was done while Huiqiang Mao was at Alibaba Group.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD '21, August 14–18, 2021, Virtual Event, Singapore
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8332-5/21/08...$15.00
https://doi.org/10.1145/3447548.3467170

1 INTRODUCTION
In mining e-commerce orders, one of the most important problems is to learn the distribution of all possible order types from data, where an order type is a specific combination (set) of items. Many business applications require such distributional information. For example, e-commerce firms often conduct simulation studies when preparing or evaluating the handling capacity of fulfillment centers [12]. A critical input to such a simulation study is the order composition during a period of time, say, the next day or the next couple of days. The current practice is for firms to sample the historical order data directly as the input. Historical data is indicative of the future but may not be sufficiently representative because, simply put, many future orders have never appeared before.
Moreover, to guarantee speedy delivery, e-commerce firms often set up front-end warehouses in close proximity to customers. Optimizing the assortments of such warehouses (i.e., the products carried by each warehouse) to maximize the number of orders that can be directly satisfied also requires distributional information of orders [24]; the goal is to avoid order splits, which mean higher cost, a lower service level, or even lost orders. Note that knowing the demand for individual items is insufficient here, since an order cannot be directly satisfied from the front-end warehouse unless all of its items are available. Other examples include designing product bundle promotions, cross-selling, and pattern mining [18]. All such business applications require the distributional information of orders.

It is worth noting that learning the distributional information of orders is quite different from next-basket prediction [9, 23, 25], temporal set prediction [3, 16, 21, 26], or frequent set mining [7, 14]. We aim to characterize the full picture of the aggregated behavior of the market over a specific time period. In contrast, next-basket prediction and temporal set prediction focus on the behavior of a specific customer on the next shopping occasion, irrespective of the time of purchase. Applied to our problem, these methods would perform poorly because they were not designed for this purpose; likewise, our method does not apply to their problems. Frequent set mining returns only the high-frequency sets, without exact probabilistic information about the mined sets, so it does not address the needs of our business applications.

Due to the combinatorial explosion of possible order types, learning the order distribution from data faces a number of major challenges. First, the order data for learning the distribution is usually sparse: the number of observed orders is much smaller than the number of possible order types.
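To make the sparsity concrete, the following is a minimal sketch of a direct-counting estimate on a toy dataset. It is an illustration only, not the model proposed in this paper; the function names (empirical_order_distribution, sparsity_summary) and the five toy orders are hypothetical. Each order type is a set of items, and its probability is estimated by its relative frequency.

```python
from collections import Counter


def empirical_order_distribution(orders):
    """Estimate each order type's probability by its relative frequency."""
    counts = Counter(frozenset(order) for order in orders)
    total = sum(counts.values())
    return {order_type: n / total for order_type, n in counts.items()}


def sparsity_summary(orders):
    """Compare the observed order types with all non-empty item subsets."""
    counts = Counter(frozenset(order) for order in orders)
    items = set().union(*counts)  # every item seen in at least one order
    return {
        "possible_types": 2 ** len(items) - 1,  # non-empty subsets of the items
        "observed_types": len(counts),
        "seen_once": sum(1 for n in counts.values() if n == 1),
    }


# Hypothetical toy data: five orders over five items.
orders = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"eggs", "tea"},
    {"bread", "eggs", "rice"},
]

dist = empirical_order_distribution(orders)
summary = sparsity_summary(orders)

print(summary)
# An order type that never occurred receives probability zero under counting.
print(dist.get(frozenset({"milk", "eggs"}), 0.0))
```

Even with five items there are 31 possible order types, of which the toy data covers four, and the counting estimate cannot distinguish a plausible unseen order from an impossible one.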
Many possible order types do not appear in the dataset or appear just once. Thus, directly counting the order dataset to estimate each order type's probability does

ADS Track Paper KDD '21, August 14–18, 2021, Virtual Event, Singapore 3735