Customer Segmentation using Centroid Based and Density Based Clustering Algorithms

A. S. M. Shahadat Hossain
Department of Computer Science & Engineering
Rajshahi University of Engineering & Technology
Rajshahi - 6204, Bangladesh
shahadat.ruet.cse@gmail.com

2017 3rd International Conference on Electrical Information and Communication Technology (EICT), 7-9 December 2017, Khulna, Bangladesh
978-1-5386-2307-7/17/$31.00 © 2017 IEEE

Abstract—In recent years, customer segmentation has become one of the most significant and useful tools for e-commerce. It plays a vital role in online product recommendation systems and also helps in understanding local and global wholesale or retail markets. Customer segmentation refers to grouping customers into different categories based on shared characteristics such as age, location, spending habits and so on. Similarly, clustering means grouping items in such a way that similar items remain in the same group. Because the two notions are so closely related, clustering algorithms can be applied to obtain satisfactory and automatic customer segmentation. Among the different types of clustering algorithms, centroid based and density based are the most popular. This paper illustrates the idea of applying density based algorithms to customer segmentation alongside centroid based algorithms such as k-means. Applying the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm as one of the density based algorithms results in a meaningful customer segmentation.

Keywords—Customer segmentation, Market segmentation, Clustering algorithms, Centroid based clustering, Density based clustering.

I. INTRODUCTION

The concept of ‘Customer Segmentation’, alternatively known as ‘Market Segmentation’, was introduced by Smith in 1956.
He described it as follows: “Market segmentation involves viewing a heterogeneous market as a number of smaller homogeneous markets in response to differing preferences, attributable to the desires of consumers for more precise satisfaction of their varying wants” [1]. Customer segmentation helps to study and understand the behavior of customers, with the aim of ensuring customer satisfaction and optimal profit. It is carried out by analyzing different types of information about customers, and it can focus on different aspects such as demographic, geographic, behavioral and so on. Among them, this paper concentrates mostly on the behavioral perspective, as it is the most effective and practical one. Spending habits, which vary from one customer to another, can be considered one instance of customer behavior.

Clustering means partitioning a set of data into groups of similar data. Clustering algorithms are machine learning algorithms that operate on unlabeled data. Among the various types of clustering algorithms, centroid based and density based algorithms are the two most popular. In ‘Centroid Based’ clustering, a set of centroids is initialized randomly and each point is assigned to the centroid at minimum distance from it. In ‘Density Based’ clustering, on the other hand, points are clustered based on their density in a particular region.

Although a few works related to cluster based customer segmentation exist, none of them has considered applying a density based clustering algorithm. Therefore, this paper implements the DBSCAN algorithm as one of the density based algorithms, while applying k-means with different distance metrics as a centroid based clustering algorithm.

The rest of the paper is organized as follows: Section II discusses related work, Section III gives a brief overview of the theoretical terms used in this paper and Section IV presents the implementation.
Section V discusses the experimental results. Finally, Section VI concludes with a focus on future work.

II. RELATED WORK

Namvar et al. [2] introduced a new customer segmentation method consisting of two-phase clustering. They showed that combining demographic data with two-phase clustering results in a relatively better clustering. Hruschka and Natter [3] compared the performance of a feedforward Neural Network and the k-means algorithm for cluster based market segmentation and found that the cluster analysis done by the Neural Network is better than that of k-means clustering. Besides, Wu and Lin [4] studied a cluster based customer segmentation model, and Lee and Park [5] surveyed customer satisfaction for satisfactory customer segmentation. Teichert et al. [6] studied a specific case of customer segmentation in the airline industry, while customer segmentation in the context of banking selection was presented by Anderson et al. [7]. Dolnicar [8] showed how data driven customer segmentation is done in the case of tourism.

III. BACKGROUND STUDY

The theoretical terms behind this work, such as customer segmentation, clustering, centroid based algorithms and density based algorithms, are briefly discussed below.
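
Before the individual terms are treated, the contrast between the two algorithm families can be illustrated informally. The following is a minimal sketch only, not the implementation used in this paper: the data points are hypothetical (two features per customer, for example scaled spending in two product categories), and the function names `kmeans` and `dbscan` as well as the parameter values `k`, `eps` and `min_pts` are assumptions chosen for illustration. It shows that a centroid based method assigns every point to some cluster, whereas a density based method can mark an isolated point as noise.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Centroid based: random initial centroids, then assign and update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def dbscan(points, eps, min_pts):
    """Density based: grow clusters from core points; label -1 marks noise."""
    def neighbours(i):
        # All points within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already processed
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated customer.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 0.5)]

km_labels = kmeans(points, k=2)
db_labels = dbscan(points, eps=0.5, min_pts=3)
print("k-means:", km_labels)  # every point gets label 0 or 1; depends on the seed
print("DBSCAN: ", db_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch the isolated point is always placed in one of the two k-means clusters, and which one depends on the random initialization, while DBSCAN labels it as noise (-1) because fewer than `min_pts` points lie within distance `eps` of it.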