Research Article
Distributed Data Clustering via Opinion Dynamics
Gabriele Oliva,1 Damiano La Manna,2 Adriano Fagiolini,2 and Roberto Setola1
1 University Campus Bio-Medico of Rome, Via A. del Portillo 21, 00128 Rome, Italy
2 Dipartimento di Energia, Ingegneria dell’Informazione e Modelli Matematici (DEIM), University of Palermo,
Viale delle Scienze, Edificio 10, 90128 Palermo, Italy
Correspondence should be addressed to Gabriele Oliva; g.oliva@unicampus.it
Received 27 November 2014; Accepted 5 February 2015
Academic Editor: Jianshe Wu
Copyright © 2015 Gabriele Oliva et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We provide a distributed method to partition a large set of data into clusters characterized by small in-group and large out-group
distances. We assume a wireless sensor network in which each sensor is given a large set of data, and the objective is
to group the sensors into homogeneous clusters by information type. In previous literature, the desired number of clusters must be
specified a priori by the user. In our approach, the clusters are constrained to have centroids separated by at least a prescribed
minimum distance, and the number of desired clusters is not specified. Although traditional algorithms fail to solve the problem with
this constraint, it can help obtain a better clustering. In this paper, a solution based on the Hegselmann-Krause opinion dynamics
model is proposed to find an admissible, although suboptimal, solution. The Hegselmann-Krause model is a centralized algorithm; here
we provide a distributed implementation, based on a combination of distributed consensus algorithms. A comparison with the k-means
algorithm concludes the paper.
1. Introduction
The problem of grouping large amounts of data into a small
number of subsets with some common features among the
elements (often referred to as the data clustering problem) has
attracted the work of several researchers in different fields,
ranging from statistics to image analysis and bioinformatics
[1–3].
Data clustering techniques are developed to partition an
initial set of observation data into collections with small in-
group distances and big out-group distances.
Among the existing techniques, one of the most used is
the k-means algorithm or its successive extensions (e.g., fuzzy
c-means [4], mixture of Gaussians algorithms [5]). Given a set
of initial observation data and a number of desired clusters,
the k-means algorithm computes a suboptimal placement
of cluster centroids and assigns the observations to such
centroids, alternating between an assignment phase, where
each observation point is assigned to its nearest centroid, and
a refinement phase, where each centroid position is updated
as the center of mass of all observations belonging to that
centroid.
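As an illustration only, the alternation between the assignment and refinement phases can be sketched in Python as follows. This is a minimal sketch of the standard k-means iteration, not the method proposed in this paper; the function name `kmeans`, the random initialization from the observations, the iteration cap, and the handling of empty clusters are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k observations drawn at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment phase: each observation goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Refinement phase: each centroid moves to the center of mass
        # of the observations currently assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

Note that the number of clusters `k` is an input of the procedure, and nothing in the iteration enforces a minimum separation between the resulting centroids.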
A well-known limitation of data clustering algorithms,
such as the k-means algorithm, is that the number of
clusters has to be specified beforehand, based, for example,
on subjective evaluations or a priori analysis. Since this
assumption is typically not feasible in practice, a common
workaround consists of running the algorithm several times
with a different number of clusters and then selecting the
best solution obtained based on an a posteriori evaluation [6].
Another issue of traditional algorithms is that there is no
guarantee that the clusters are sufficiently far from each
other. In this respect, distance-constrained data clustering
approaches have been devised in the literature: in [7, 8] the
considered constraints are the so-called must-link (i.e., an
observation must belong to a given cluster) and cannot-link (i.e.,
an observation cannot belong to a given cluster) constraints; in [9] the
feasibility of a constrained problem involving the so-called
δ-constraints (i.e., any two observations must have a distance
greater than δ) and the ε-constraints (i.e., for any observation
in a cluster there must be at least one other observation h in
the same cluster such that the distance between the two is less than ε)
is analyzed. To the best of our knowledge, there is currently no
methodology to specify a constraint on the distance between
Hindawi Publishing Corporation
International Journal of Distributed Sensor Networks
Volume 2015, Article ID 753102, 13 pages
http://dx.doi.org/10.1155/2015/753102