Machine Learning Methods for Revenue Prediction in Google Merchandise Store

Vahid Azizi and Guiping Hu

V. Azizi · G. Hu (corresponding author)
Industrial and Manufacturing Systems Engineering Department, Iowa State University, Ames, IA 50010, USA
e-mail: gphu@iastate.edu

© Springer Nature Switzerland AG 2020
H. Yang et al. (eds.), Smart Service Systems, Operations Management, and Analytics, Springer Proceedings in Business and Economics, https://doi.org/10.1007/978-3-030-30967-1_7

Abstract Machine learning has gained increasing interest across application domains for its ability to learn from data and make predictions. In this paper, we apply machine learning techniques to predict revenue per customer for the Google Merchandise Store. Exploratory Data Analysis (EDA) was conducted on the customer dataset, and feature engineering was applied to find the best subset of features. Four machine learning methods were then applied to predict revenue per customer: Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). Results show that LightGBM outperforms the other methods in terms of both RMSE and running time.

Keywords Feature engineering · GBM · XGBoost · CatBoost · LightGBM

1 Introduction

Google Merchandise Store (also known as GStore) is an online store that sells clothing, bags, drinkware, office supplies, and other accessories. As at traditional retailers, its marketing teams are constantly challenged to design promotional strategies customized for individual customers. Online stores have an advantage here, because their data are often tracked at the individual customer level. GStore is interested in analyzing its customer dataset to predict revenue per customer, since the 80/20 rule is well known to apply in retail business. The 80/20 rule refers to the phenomenon that 20% of the customers generate 80% of the revenue. Accurate predictions of customer spending are therefore important for both operational strategy and marketing investment. This paper aims to predict revenue per customer for GStore. Four decision tree-based machine learning algorithms, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient