Traffic-based Network Clustering

Luigi Laura
Dip. di Informatica e Sistemistica, “Sapienza” Università di Roma
Via Ariosto, 25, 00198 Roma, Italy
laura@dis.uniroma1.it

Maurizio Naldi
Dip. di Informatica, Sistemi e Produzione, Univ. di Roma “Tor Vergata”
via del Politecnico 1, 00133 Roma, Italy
naldi@disp.uniroma2.it

Giuseppe F. Italiano
Dip. di Informatica, Sistemi e Produzione, Univ. di Roma “Tor Vergata”
via del Politecnico 1, 00133 Roma, Italy
italiano@disp.uniroma2.it

ABSTRACT
Network clustering is traditionally approached by relying solely on the topology of the network, neglecting the information on the traffic intensity between the nodes. In this paper we propose traffic-aware clustering, whereby networks are clustered on the basis of their traffic matrices. We redefine two clustering metrics for the context of traffic matrices, and perform an exploratory analysis by comparing four well-known algorithms on two real-world datasets, each made of 1000 traffic matrices, from the Abilene and Géant networks respectively. The Spectral Filtering algorithm appears to be the best performer. However, on the Géant dataset the two metrics rank the algorithms under examination differently, and Newman’s algorithm can perform marginally better under one of the two metrics.

Categories and Subject Descriptors
C.2 [Computer-Communication Networks]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - complexity measures, performance measures

General Terms
Algorithms, Measurement, Theory

Keywords
traffic matrices, clustering, graph algorithms

1. INTRODUCTION
Several algorithms, models, and indices for clustering problems have appeared for different application domains, such as data mining, computer graphics and VLSI design; see, e.g., the overview in Jain et al. [8].
Even if we restrict our attention to the problem of graph clustering, i.e., grouping together similar nodes in a network, there is an ample literature, surveyed, e.g., in Gaertler [5]. So far, network clustering has been accomplished by relying on topology matrices, which describe the physical connectivity between communication nodes (see, e.g., the work of Gkantsidis, Mihail, and Zegura [6]). However, such an approach does not take into account the actual intensity of the relationship between any two nodes. Such information is conveyed by traffic matrices. They present several peculiarities that distinguish them from general graph matrices and from topology matrices: traffic matrices are very dense, usually complete, as opposed to the usual sparse structure of topology matrices of communication networks; furthermore, they are weighted, with weights varying considerably even over short timeframes, and they are asymmetric, which prevents us from applying in a straightforward way the traditional approaches designed for weighted networks (see, e.g., the work of Newman [11]).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IWCMC’10 June 28 - July 2, 2010, Caen, France. Copyright 2010 ACM 978-1-4503-0062-9/10/06 ...$5.00.

In this paper we propose a new approach to network clustering by advocating the use of traffic matrices. Such an approach could help in:
• network planning, by augmenting intra-cluster links;
• more accurate modeling of network flows, by defining intra-cluster and extra-cluster flows;
• focusing on intra-cluster reliability rather than network-wide reliability.
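
The peculiarities of traffic matrices and the notion of intra-cluster and extra-cluster flows mentioned above can be illustrated with a minimal Python sketch. The 4-node matrix, its values, the example partition, and the helper names (density, is_symmetric, split_flows) are all hypothetical and chosen only for illustration; they are not taken from the Abilene or Géant datasets, nor from the metrics defined later in the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical 4-node traffic matrix: T[i][j] is the traffic volume sent
# from node i to node j (arbitrary units). Values are invented for illustration.
T: List[List[float]] = [
    [0.0, 9.0, 1.0, 2.0],
    [7.0, 0.0, 2.0, 1.0],
    [1.0, 3.0, 0.0, 8.0],
    [2.0, 1.0, 6.0, 0.0],
]


def density(t: Sequence[Sequence[float]]) -> float:
    """Fraction of ordered node pairs (i != j) that carry non-zero traffic."""
    n = len(t)
    nonzero = sum(
        1 for i in range(n) for j in range(n) if i != j and t[i][j] > 0
    )
    return nonzero / (n * (n - 1))


def is_symmetric(t: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """True if traffic from i to j equals traffic from j to i for all pairs."""
    n = len(t)
    return all(
        abs(t[i][j] - t[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def split_flows(
    t: Sequence[Sequence[float]], cluster_of: Sequence[int]
) -> Tuple[float, float]:
    """Return (intra-cluster, extra-cluster) traffic for a given partition.

    cluster_of[i] is the (hypothetical) cluster label assigned to node i.
    """
    intra = 0.0
    extra = 0.0
    n = len(t)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-traffic on the diagonal
            if cluster_of[i] == cluster_of[j]:
                intra += t[i][j]
            else:
                extra += t[i][j]
    return intra, extra


# Example partition: nodes 0 and 1 in one cluster, nodes 2 and 3 in another.
clusters = [0, 0, 1, 1]
intra, extra = split_flows(T, clusters)

print("density:", density(T))          # complete matrix: every pair exchanges traffic
print("symmetric:", is_symmetric(T))   # directed volumes differ, e.g. 9.0 vs 7.0
print("intra-cluster traffic:", intra)
print("extra-cluster traffic:", extra)
```

In this toy example the matrix is complete and asymmetric, and the chosen partition keeps most of the traffic inside the clusters. A traffic-aware metric would be expected to reward such a partition, though the exact scoring depends on the definitions given in Section 3.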
For the purpose of traffic-aware clustering we redefine two clustering metrics, namely the Traffic-aware Scaled Coverage Measure (TS), derived from the well-known Scaled Coverage Measure (SCM) [2], and the Modularity measure, originally defined for weighted graphs [11] (see Section 3). We perform an exploratory analysis by examining how four established clustering algorithms (see Section 4) perform under those metrics. We employ two real-world datasets (detailed in Section 5). The results of this experimental comparison are presented in Section 6.

Though the algorithms considered in this paper were not designed to be used with traffic matrices, our redefinition of the two metrics allows us to evaluate the performance of any clustering algorithm from a traffic-based viewpoint, and to drive the future development of traffic-oriented clustering approaches. Our exploratory analysis shows that on our two extensive datasets the Spectral Filtering algorithm generally outperforms the other algorithms under both metrics, though Newman’s algorithm can perform marginally better in some cases under the TS metric.

2. TRAFFIC MATRICES