Machine Learning Methods for Revenue Prediction in Google Merchandise Store
Vahid Azizi and Guiping Hu
Abstract Machine learning has gained increasing interest across application domains for its ability to learn from data and make predictions. In this paper, we apply machine learning techniques to predict revenue per customer for the Google Merchandise Store. Exploratory Data Analysis (EDA) was conducted on the customer dataset, and feature engineering was applied to find the best subset of features. Four machine learning methods, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM), were applied to predict revenue per customer. Results show that LightGBM outperforms the other methods in terms of RMSE (root mean squared error) and running time.
Keywords Feature engineering · GBM · XGBoost · CatBoost · LightGBM
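RMSE, the accuracy criterion used to compare the four methods, is the square root of the mean squared difference between observed and predicted revenue, so a lower value means a better fit. The minimal Python sketch below computes it and selects the model with the lowest error. The model names and revenue values are hypothetical placeholders, not the paper's data or predictions.

```python
import math
from typing import Dict, List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE = sqrt((1/n) * sum_i (y_i - yhat_i)^2)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared residuals, then take the square root.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical revenue-per-customer values (illustrative only, not the paper's data).
observed = [0.0, 0.0, 25.0, 0.0, 120.0]
predictions: Dict[str, List[float]] = {
    "model_a": [0.0, 5.0, 20.0, 0.0, 110.0],
    "model_b": [2.0, 0.0, 30.0, 1.0, 100.0],
}

scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE is better
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
print("best:", best)
```

In an actual comparison, the prediction lists would come from the fitted GBM, XGBoost, CatBoost, and LightGBM models on the same held-out customers; running time would be measured separately.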
1 Introduction
Google Merchandise Store (also known as GStore) is an online store that sells clothing, bags, drinkware, office supplies, and other accessories. As with traditional retailers, its marketing team is constantly challenged to design promotional strategies customized for individual customers; online stores have an advantage here because data are often tracked at the individual customer level. GStore is interested in analyzing its customer dataset to predict revenue per customer, since it is well known that the 80/20 rule applies in the retail business. The 80/20 rule refers to the phenomenon that roughly 20% of customers generate 80% of revenue. Accurate predictions of customer spending are therefore important for operational strategy and marketing investment.
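The 80/20 rule can be checked directly on a revenue list by ranking customers by revenue and measuring the share of total revenue contributed by the top fifth. The Python sketch below does this; the function name `top_share` and the ten revenue values are hypothetical, chosen only to illustrate the calculation.

```python
from typing import Sequence


def top_share(revenues: Sequence[float], top_fraction: float = 0.2) -> float:
    """Fraction of total revenue generated by the top `top_fraction` of customers."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(revenues)
    if total == 0:
        return 0.0
    # Rank customers from highest to lowest revenue.
    ranked = sorted(revenues, reverse=True)
    # Size of the top group, rounded, with at least one customer.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical revenue for ten customers (illustrative values only).
revenue = [500.0, 300.0, 60.0, 40.0, 30.0, 25.0, 20.0, 15.0, 10.0, 0.0]
print(f"Top 20% of customers: {top_share(revenue):.0%} of revenue")
```

With these made-up values, the two highest-spending customers account for 800 of the 1000 total, which is exactly the 80/20 pattern; real customer data will only approximate it.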
This paper aims to predict revenue per customer for GStore. Four decision tree-based machine learning algorithms, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient
V. Azizi · G. Hu (✉)
Industrial and Manufacturing Systems Engineering Department, Iowa State University, Ames,
IA 50010, USA
e-mail: gphu@iastate.edu
© Springer Nature Switzerland AG 2020
H. Yang et al. (eds.), Smart Service Systems, Operations Management,
and Analytics, Springer Proceedings in Business and Economics,
https://doi.org/10.1007/978-3-030-30967-1_7