J.F. Martínez-Trinidad et al. (Eds.): CIARP 2006, LNCS 4225, pp. 491 – 501, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Conceptual K-Means Algorithm Based on Complex
Features
I.O. Ayaquica-Martínez, J.Fco. Martínez-Trinidad, and J. Ariel Carrasco-Ochoa
National Institute of Astrophysics, Optics and Electronics
Computer Science Department
Luis Enrique Erro # 1, Santa María Tonantzintla, Puebla, Mexico, C.P. 72840
{ayaquica, fmartine, ariel}@inaoep.mx
Abstract. The k-means algorithm is the most studied and used tool for solving
the clustering problem when the number of clusters is known a priori.
Currently, there is only one conceptual version of this algorithm, the
conceptual k-means algorithm. One characteristic of this algorithm is the use of
generalization lattices, which define relationships among the feature values.
However, for many applications, it is difficult to determine the best
generalization lattices; moreover, there are no automatic methods for building the
lattices, so this task must be done by a specialist in the problem domain.
In addition, this algorithm does not work with
missing data. For these reasons, this paper proposes a new conceptual k-means
algorithm that does not use generalization lattices to build the concepts and
can work with missing data. We use complex features for
generating the concepts. Complex features are subsets of features with
associated values that characterize the objects of a cluster and, at the same time, do not
characterize objects from other clusters. Some experimental results obtained by
our algorithm are shown and compared against the results obtained by
the conceptual k-means algorithm.
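As an informal illustration of the idea of a complex feature stated above, the following Python sketch enumerates small sets of (feature, value) pairs that occur in some object of one cluster and in no object of the other clusters. It is only a simplified reading of the definition given in the abstract, not the procedure used by the proposed algorithm. The names matches, discriminating_patterns and max_size, the dictionary representation of objects, and the use of None for a missing value are all assumptions made for this example.

```python
from itertools import combinations


def matches(obj, pattern):
    """Return True if obj carries every (feature, value) pair in pattern.

    Assumption: a missing value is stored as None and never matches.
    """
    return all(obj.get(f) is not None and obj.get(f) == v for f, v in pattern)


def discriminating_patterns(cluster, others, max_size=2):
    """Collect feature-value subsets seen in `cluster` that match no object in `others`.

    Hypothetical helper: brute-force enumeration up to `max_size` pairs,
    intended only for very small examples.
    """
    found = set()
    for obj in cluster:
        # Use only the features whose value is known for this object.
        known = sorted((f, v) for f, v in obj.items() if v is not None)
        for size in range(1, max_size + 1):
            for pattern in combinations(known, size):
                if not any(matches(o, pattern) for o in others):
                    found.add(pattern)
    return found


# Toy data: the second object of the cluster has a missing "shape" value.
cluster = [{"color": "red", "shape": "round"}, {"color": "red", "shape": None}]
others = [{"color": "red", "shape": "square"}, {"color": "blue", "shape": "round"}]
result = discriminating_patterns(cluster, others)
```

In this toy data set neither color = red nor shape = round alone separates the cluster from the remaining objects, but the two values taken together do, so result contains that single pair of values. The object with the missing value contributes no pattern of its own, which is one possible way of treating missing data in such a sketch.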
1 Introduction
The conceptual clustering problem was first addressed in the 1980s by Michalski [1]. It
consists of finding, from a data set, not only the clusters but also a conceptual
interpretation of them. Building on Michalski’s work, several algorithms have
been developed to solve the conceptual clustering problem. Some of them can be
found in [2-8].
The k-means algorithm is the most studied and used tool for solving the clustering
problem when the number of clusters is known a priori. The conceptual k-means
algorithm proposed by Ralambondrainy [8] is the only conceptual version of this
algorithm. We therefore focus on this algorithm.
The conceptual k-means algorithm [8] was developed to solve problems where the
number of clusters is known a priori. This algorithm consists of two phases: an
aggregation phase, in which the clusters are built, and a characterization phase, in
which the concepts are generated. In the aggregation phase, the k-means algorithm
was extended to work with mixed data. In order to solve the mixed data problem, a