Chapter LX
Data Clustering
Yanchang Zhao
University of Technology, Sydney, Australia
Longbing Cao
University of Technology, Sydney, Australia
Huaifeng Zhang
University of Technology, Sydney, Australia
Chengqi Zhang
University of Technology, Sydney, Australia
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Introduction
Clustering is one of the most important techniques in data mining. This chapter presents a survey of popular approaches to data clustering. It covers well-known techniques, such as partitioning, hierarchical, density-based and grid-based clustering, as well as recent advances, such as subspace clustering, text clustering and data stream clustering. The major challenges and future trends of data clustering are also discussed.
The remainder of this chapter is organized as follows. Section 2 introduces the background of data clustering, including the definition of clustering, categories of clustering techniques, features of good clustering algorithms, and the validation of clustering. Section 3 presents the main approaches to clustering, ranging from classic partitioning and hierarchical clustering to recent approaches such as bi-clustering and semi-supervised clustering. Challenges and future trends are discussed in Section 4, followed by the conclusions in the last section.
Background
Data clustering has its roots in pattern recognition (Theodoridis & Koutroumbas, 2006), machine learning (Alpaydin, 2004), statistics (Hill & Lewicki, 2007) and database technology (Date, 2003). Data clustering partitions data into groups such that data in the same group are similar to one another and data from different groups are dissimilar (Han & Kamber, 2000).
More specifically, it is to segment data into clusters