A New Customer Segmentation Framework Based on Biclustering Analysis

Xiaohui Hu 1
1 Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
Email: xiaohui_huhu@sina.com

Haolan Zhang 2, Xiaosheng Wu 1, Jianlin Chen 1, Yu Xiao 1, Yun Xue 1, Tiechen Li 1, Hongya Zhao 3
2 NIT, ZheJiang University, Hanzhou, P.R. China
3 Industrial Center, Shenzhen Polytechnic, Shenzhen, Guangdong, China
Email: haolan.zhang@gmail.com, super-fly@foxmail.com

Abstract—The paper presents a novel approach to customer segmentation, which is the basic issue for effective CRM (Customer Relationship Management). First, chi-square statistical analysis is applied to choose a set of attributes, and the K-means algorithm is employed to quantize the values of each attribute. Then the density-based DBSCAN algorithm is introduced to classify the customers into three groups (the first, second and third class). Finally, biclustering based on an improved Apriori algorithm is applied within the three groups to obtain more detailed information. Experimental results on the dataset of an airline company show that biclustering can segment the customers more accurately and in finer detail. Compared with the traditional customer segmentation method, the described framework is more efficient on the dataset.

Index Terms—Customer segmentation, biclustering, K-means, chi-square statistics, DBSCAN, Apriori

I. INTRODUCTION

In today’s highly competitive business environment, customer relationship management (CRM) is a critical success factor for the survival and growth of businesses. It is widely used in many industries and areas, including tourism, catering, retail trade, network marketing, network services and other e-commerce.
Customer segmentation is the basic issue for effective CRM because it helps organizations understand and serve existing customers better and enables the acquisition of profitable customers. Nowadays, data mining technology plays an increasingly important role in analyzing and utilizing the large-scale information gathered from customers.

Many studies in the literature have examined the application of data mining technology to customer segmentation and achieved sound results. Alex Berson used decision trees and clustering technology for customer segmentation [1]. Guillem Lefait presented a data mining architecture based on clustering techniques to help experts segment customers based on their purchase behaviors [2]. Jaesoo Kim used neural networks for customer classification in the tourism industry [3]. Meng Xiaolian and Yang Yu proposed a customer identification model based on customer value in commercial banks [4]. The studies in [6][7] select the K-means clustering algorithm to recognize groups of customers who share the same or similar needs [5]. The K-means clustering algorithm has been widely used because of its simplicity and efficiency.

Segmentation is done not only to identify groups of entities that have common characteristics but also to better understand consumer behavior. However, using clustering to segment customers raises several problems: which data to select, how many clusters to produce and how to evaluate the clustering results [20]. Thus, customer segmentation has two main challenges. The first is to formalize implicit data, and the second is to select a relevant subset of the available features on which to perform the clustering. Hence clustering algorithms such as K-means or hierarchical clustering have limitations. Firstly, common clustering methods usually seek a disjoint cover of the set of elements, requiring that no object belong to more than one cluster.
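The disjointness described here can be made concrete with a minimal sketch. It assumes a plain Lloyd-style K-means written from scratch; the five customers, their two attributes and all values are hypothetical and are not drawn from the airline dataset used in the paper.

```python
# Toy illustration only: customers, attributes and values are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd-style K-means; returns exactly one cluster label per point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: every point receives a single label (nearest centroid).
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

# Rows are customers; columns are (flights per year, hotel bookings per year).
customers = {
    "A": (12.0, 0.0),   # flies often, never books hotels
    "B": (11.0, 1.0),
    "C": (0.0, 10.0),   # books hotels, rarely flies
    "D": (1.0, 12.0),
    "E": (12.0, 10.0),  # active in both activities
}
points = list(customers.values())
labels = kmeans(points, initial_centroids=[points[0], points[2]])
assignment = dict(zip(customers, labels))
print(assignment)
```

In this sketch, customer E is active in both activities yet still ends up with a single label, because the assignment step picks only the nearest centroid. No membership in a second cluster can be expressed.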
In fact, customers can participate in more than one activity and should therefore be included in several clusters. Moreover, clustering algorithms obtain clusters over either rows or columns, so the clusters produced reflect the global patterns of the data. Clustering therefore fails to detect local patterns in the data.

Biclustering, or subspace clustering, was proposed to overcome the above problems of traditional clustering. Biclustering clusters the rows and columns of the data matrix simultaneously, and is thus able to find local patterns in the form of subgroups of rows and columns.

The paper proposes a new framework for customer segmentation based on biclustering analysis, which not only formalizes the implicit data but also acquires a subset of the available attributes to provide more explicit information about customers. First, chi-square statistical analysis is applied to choose a set of attributes, and the K-means algorithm is employed to quantize the values of each attribute. Then DBSCAN algorithm based on

JOURNAL OF SOFTWARE, VOL. 9, NO. 6, JUNE 2014 1359 © 2014 ACADEMY PUBLISHER doi:10.4304/jsw.9.6.1359-1366