A New Customer Segmentation Framework Based on Biclustering Analysis
Xiaohui Hu¹
¹ Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, China
Email: xiaohui_huhu@sina.com
Haolan Zhang², Xiaosheng Wu¹, Jianlin Chen¹, Yu Xiao¹, Yun Xue¹, Tiechen Li¹, Hongya Zhao³
² NIT, Zhejiang University, Hangzhou, P.R. China
³ Industrial Center, Shenzhen Polytechnic, Shenzhen, Guangdong, China
Email: haolan.zhang@gmail.com, super-fly@foxmail.com
Abstract—This paper presents a novel approach to customer segmentation, a fundamental issue for effective CRM (Customer Relationship Management). First, chi-square statistical analysis is applied to select a set of attributes, and the K-means algorithm is employed to quantize the value of each attribute. Then the density-based DBSCAN algorithm is used to classify customers into three groups (the first, second and third class). Finally, biclustering based on an improved Apriori algorithm is applied within the three groups to obtain more detailed information. Experimental results on a dataset from an airline company show that biclustering segments customers more accurately and in finer detail. Compared with the traditional customer segmentation method, the proposed framework is more efficient on this dataset.
Index Terms—Customer segmentation, biclustering, K-means, chi-square statistics, DBSCAN, Apriori
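The density-based grouping step named in the abstract can be illustrated with a textbook DBSCAN. The following is a minimal pure-Python sketch, not the authors' implementation: it assumes customers are numeric feature vectors compared by Euclidean distance, and `eps`, `min_pts` and the toy points are illustrative values invented here. This section does not state the actual features, distance measure or parameters, nor how DBSCAN's output is mapped onto the first, second and third customer classes.

```python
import math
from typing import List, Sequence

UNVISITED = -2  # point not examined yet
NOISE = -1      # point that belongs to no dense region

def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of point i (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) or NOISE for every point."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: density-reachable, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Hypothetical two-dimensional customer features.
customers = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(customers, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Points that are not density-reachable from any core point keep the noise label, and unlike K-means the number of clusters is not fixed in advance.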
I. INTRODUCTION
In today’s highly competitive business environment, customer relationship management (CRM) is a critical success factor for the survival and growth of businesses, and it has been widely adopted in industries such as tourism, catering, retail, online marketing, network services and other areas of e-commerce. Customer segmentation is a fundamental issue for effective CRM because it helps organizations understand and serve existing customers better and enables them to acquire profitable customers.
Data mining technology now plays an increasingly important role in analyzing and exploiting the large-scale information gathered from customers. Many studies have applied data mining technology to customer segmentation and achieved good results. Alex Berson used decision trees and clustering techniques for customer segmentation [1]. Guillem Lefait presented a data mining architecture based on clustering techniques to help experts segment customers according to their purchase behavior [2]. Jaesoo Kim used neural networks to classify customers in the tourism industry [3], and Meng Xiaolian and Yang Yu proposed a customer identification model based on customer value for commercial banks [4].
References [6][7] use the K-means clustering algorithm to identify groups of customers who share the same or similar needs [5]. The K-means algorithm has been widely used because of its simplicity and efficiency. Segmentation is performed not only to identify groups of entities that share common characteristics but also to better understand consumer behavior. However, using clustering to segment customers raises several problems: which data to select, how many clusters to produce, and how to evaluate the clustering results [20]. Customer segmentation therefore faces two main challenges: the first is to formalize implicit data, and the second is to select a relevant subset of the available features on which to perform the clustering.
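The two challenges above correspond to the first step of the proposed framework, in which attributes are selected with chi-square statistics and quantized with K-means. The sketch below is an illustrative assumption rather than the authors' procedure: it quantizes one numeric attribute into discrete levels with a one-dimensional K-means (Lloyd's algorithm) and then scores the quantized attribute with Pearson's chi-square statistic against a categorical label. The attribute `flights_per_year`, the label `is_frequent_flyer`, the initialization and `k` are all hypothetical. Because chi-square requires categorical data, this sketch quantizes before scoring; the section does not say which reference variable the authors test attributes against.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

def kmeans_1d(values: Sequence[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's K-means on one numeric attribute.
    Returns the sorted centroids and, for each value, the index of its nearest centroid."""
    ordered = sorted(values)
    # Deterministic initialization (an assumption): k values spread evenly over the sorted data.
    centroids = [ordered[(len(ordered) - 1) * c // max(k - 1, 1)] for c in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:  # assignment step: nearest centroid
            groups[min(range(k), key=lambda c: abs(v - centroids[c]))].append(v)
        # update step: mean of each group; an empty group keeps its old centroid
        updated = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    centroids.sort()
    levels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
    return centroids, levels

def chi_square(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic for the contingency table of two categorical variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    stat = 0.0
    for a, na in x_counts.items():
        for b, nb in y_counts.items():
            expected = na * nb / n  # expected cell count under independence
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat

# Hypothetical data: flights per year for ten customers, plus a hypothetical label.
flights_per_year = [1, 2, 2, 3, 20, 22, 25, 40, 41, 45]
is_frequent_flyer = ["no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

centroids, levels = kmeans_1d(flights_per_year, k=3)
score = chi_square(levels, is_frequent_flyer)
print(levels)           # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
print(round(score, 3))  # 7.333
```

A larger statistic indicates a stronger association between the quantized attribute and the label. When attributes are quantized into different numbers of levels, the degrees of freedom differ, so comparing p-values (for example from `scipy.stats.chi2_contingency`) is fairer than comparing raw statistics.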
Moreover, clustering algorithms such as K-means and hierarchical clustering have inherent limitations. First, common clustering methods usually seek a disjoint cover of the set of elements, requiring that no object belong to more than one cluster. In practice, however, customers can participate in more than one activity and should therefore be included in several clusters. Second, clustering algorithms group either the rows or the columns of the data matrix, so the resulting clusters reflect global patterns in the data, and clustering fails to detect local patterns. Biclustering, also called subspace clustering, was proposed to overcome these problems of traditional clustering. Biclustering clusters the rows and columns of the data matrix simultaneously and is therefore able to find local patterns in the form of subgroups of rows and columns.
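The connection between biclustering and Apriori can be seen on binary data: if each customer is a row and each quantized attribute value is a column, a frequent itemset together with the customers that contain it forms an all-ones submatrix, that is, a bicluster. The sketch below uses a plain level-wise Apriori search, not the authors' improved Apriori, and keeps only closed itemsets so that each reported bicluster cannot be enlarged by adding a column. The customer IDs, item names and the thresholds `min_rows` and `min_cols` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Bicluster = Tuple[FrozenSet[str], FrozenSet[str]]  # (customer rows, attribute columns)

def apriori_biclusters(rows: Dict[str, Set[str]], min_rows: int, min_cols: int = 2) -> List[Bicluster]:
    """Find all-ones biclusters in a binary customer x attribute matrix via plain Apriori."""

    def support(itemset: FrozenSet[str]) -> FrozenSet[str]:
        # rows (customers) whose attribute set contains every item of the itemset
        return frozenset(r for r, items in rows.items() if itemset <= items)

    items = sorted({item for attrs in rows.values() for item in attrs})
    level = [frozenset([i]) for i in items if len(support(frozenset([i]))) >= min_rows]
    frequent = list(level)
    size = 1
    while level:
        size += 1
        previous = set(level)
        # join step: merge two frequent itemsets of the previous size
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # prune step (Apriori property): every subset one item smaller must be frequent
        candidates = {c for c in candidates if all(c - {x} in previous for x in c)}
        level = [c for c in candidates if len(support(c)) >= min_rows]
        frequent.extend(level)

    biclusters: List[Bicluster] = []
    for c in frequent:
        members = support(c)
        # keep closed itemsets only: no frequent superset is supported by the same rows
        closed = not any(c < d and support(d) == members for d in frequent)
        if len(c) >= min_cols and closed:
            biclusters.append((members, c))
    return biclusters

# Hypothetical customers described by quantized attribute values (items).
customers = {
    "c1": {"long_haul", "business", "frequent"},
    "c2": {"long_haul", "business", "frequent"},
    "c3": {"long_haul", "business"},
    "c4": {"short_haul", "economy", "frequent"},
    "c5": {"short_haul", "economy", "frequent"},
}
found = sorted((sorted(r), sorted(c)) for r, c in apriori_biclusters(customers, min_rows=2))
for members, cols in found:
    print(members, cols)
# ['c1', 'c2'] ['business', 'frequent', 'long_haul']
# ['c1', 'c2', 'c3'] ['business', 'long_haul']
# ['c4', 'c5'] ['economy', 'frequent', 'short_haul']
```

Customers c1 and c2 appear in two biclusters, which is the kind of overlap a disjoint clustering cannot express, and each bicluster involves only a subset of the attributes.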
This paper proposes a new framework for customer segmentation based on biclustering analysis, which not only formalizes the implicit data but also acquires a subset of the available attributes to provide more explicit information about customers. First, chi-square statistical analysis is applied to select a set of attributes, and the K-means algorithm is employed to quantize the value of each attribute. Then the DBSCAN algorithm based on
JOURNAL OF SOFTWARE, VOL. 9, NO. 6, JUNE 2014 1359
© 2014 ACADEMY PUBLISHER
doi:10.4304/jsw.9.6.1359-1366