Research Article

Distributed Data Clustering via Opinion Dynamics

Gabriele Oliva¹, Damiano La Manna², Adriano Fagiolini², and Roberto Setola¹

¹ University Campus Bio-Medico of Rome, Via A. del Portillo 21, 00128 Rome, Italy
² Dipartimento di Energia, Ingegneria dell'Informazione e Modelli Matematici (DEIM), University of Palermo, Viale delle Scienze, Edificio 10, 90128 Palermo, Italy

Correspondence should be addressed to Gabriele Oliva; g.oliva@unicampus.it

Received 27 November 2014; Accepted 5 February 2015

Academic Editor: Jianshe Wu

Copyright © 2015 Gabriele Oliva et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We provide a distributed method to partition a large set of data into clusters characterized by small in-group and large out-group distances. We assume a wireless sensor network in which each sensor is given a large set of data, and the objective is to group the sensors into homogeneous clusters by information type. In previous literature, the desired number of clusters must be specified a priori by the user. In our approach, the number of clusters is not specified; instead, the cluster centroids are constrained to be at least a prescribed distance apart. Although traditional algorithms fail to solve the problem under this constraint, the constraint can help obtain a better clustering. In this paper, a solution based on the Hegselmann-Krause opinion dynamics model is proposed to find an admissible, although suboptimal, solution. The Hegselmann-Krause model is a centralized algorithm; here we provide a distributed implementation based on a combination of distributed consensus algorithms. A comparison with the k-means algorithm concludes the paper.

1. Introduction

The problem of grouping large amounts of data into a small number of subsets whose elements share some common features (often referred to as the data clustering problem) has attracted the work of several researchers in different fields, ranging from statistics to image analysis and bioinformatics [1–3]. Data clustering techniques partition an initial set of observations into collections with small in-group distances and large out-group distances. Among the existing techniques, one of the most widely used is the k-means algorithm and its later extensions (e.g., fuzzy k-means [4] and mixture-of-Gaussians algorithms [5]). Given a set of initial observations and a desired number of clusters, the k-means algorithm computes a suboptimal placement of the cluster centroids and assigns the observations to them by alternating between an assignment phase, in which each observation is assigned to its nearest centroid, and a refinement phase, in which each centroid is moved to the center of mass of all observations assigned to it.

A well-known limitation of data clustering algorithms such as k-means is that the number of clusters must be specified beforehand, based, for example, on subjective evaluations or a priori analysis. Since this is typically not feasible in practice, a common workaround is to run the algorithm several times with a different number of clusters and then select the best solution based on an a posteriori evaluation [6]. Another issue with traditional algorithms is that there is no guarantee that the resulting clusters are sufficiently far from each other.
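To make the two alternating phases of k-means concrete, here is a minimal sketch of a plain, centralized Lloyd-style iteration. It is only an illustration of the baseline discussed above, not the authors' method; the function names `kmeans` and `sq_dist`, the random initialization from the data, and the stopping rule are assumptions chosen for brevity.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: List[Point], k: int, max_iters: int = 100, seed: int = 0):
    """Lloyd-style k-means (illustrative sketch). Returns (centroids, labels).

    Assumption: centroids are initialized with k distinct observations
    drawn at random; other initializations are also common.
    """
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(data, k)]
    labels: List[int] = [-1] * len(data)  # -1: nothing assigned yet
    for _ in range(max_iters):
        # Assignment phase: each observation goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data
        ]
        # Refinement phase: each centroid moves to the center of mass
        # of the observations currently assigned to it.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        converged = new_labels == labels
        labels, centroids = new_labels, new_centroids
        if converged:
            break  # assignments stable, so centroids will not move further
    return centroids, labels
```

For instance, calling `kmeans` on the points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) with `k=2` separates the two visible groups. Note that `k` must be supplied up front, which is exactly the limitation discussed above.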
In this respect, distance-constrained data clustering approaches have been devised in the literature. In [7, 8], the constraints considered are the so-called must-link constraints (i.e., a given observation must belong to a given cluster) and cannot-link constraints (i.e., a given observation cannot belong to a given cluster). In [9], the feasibility of a constrained problem involving two distance-based constraints is analyzed: the first requires any two observations to be farther apart than a given threshold, and the second requires that, for any observation in a cluster, there be at least one other observation in the same cluster whose distance from it is less than a second threshold.

Hindawi Publishing Corporation, International Journal of Distributed Sensor Networks, Volume 2015, Article ID 753102, 13 pages, http://dx.doi.org/10.1155/2015/753102

To the best of our knowledge, there is currently no methodology to specify a constraint on the distance between
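The distance-based constraints from [9] summarized above can be checked on a given clustering, together with the centroid-separation constraint stated in the abstract, as in the sketch below. The function names and the thresholds `delta`, `eps`, and `min_dist` are hypothetical. Following the usual minimum-separation formulation, we assume that the first constraint applies to pairs of observations placed in different clusters.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def separation_ok(data: Sequence[Point], labels: Sequence[int], delta: float) -> bool:
    """Assumed reading: observations in different clusters are more than delta apart."""
    return all(
        math.dist(data[i], data[j]) > delta
        for i, j in combinations(range(len(data)), 2)
        if labels[i] != labels[j]
    )


def neighbor_ok(data: Sequence[Point], labels: Sequence[int], eps: float) -> bool:
    """Every observation has another observation in its own cluster closer than eps."""
    return all(
        any(
            j != i and labels[j] == labels[i] and math.dist(data[i], data[j]) < eps
            for j in range(len(data))
        )
        for i in range(len(data))
    )


def centroids_separated(centroids: Sequence[Point], min_dist: float) -> bool:
    """Centroid constraint from the abstract: every pair is at least min_dist apart."""
    return all(math.dist(a, b) >= min_dist for a, b in combinations(centroids, 2))
```

Unlike k-means, `centroids_separated` takes no number of clusters as input. It only tests a candidate set of centroids, however many there are, against the minimum distance.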