Traffic-based Network Clustering

Luigi Laura
Dip. di Informatica e Sistemistica
"Sapienza" Università di Roma
Via Ariosto, 25
00198 Roma, Italy
laura@dis.uniroma1.it

Maurizio Naldi
Dip. di Informatica, Sistemi e Produzione
Univ. di Roma "Tor Vergata"
via del Politecnico 1
00133 Roma, Italy
naldi@disp.uniroma2.it

Giuseppe F. Italiano
Dip. di Informatica, Sistemi e Produzione
Univ. di Roma "Tor Vergata"
via del Politecnico 1
00133 Roma, Italy
italiano@disp.uniroma2.it

ABSTRACT

Network clustering is traditionally approached by relying solely on the topology of the network, neglecting the information on the traffic intensity between the nodes. In this paper we propose traffic-aware clustering, whereby networks are clustered on the basis of their traffic matrices. We redefine two clustering metrics for the context of traffic matrices, and perform an exploratory analysis by comparing four well-known algorithms on two real-world datasets, each made of 1000 traffic matrices, from the Abilene and Géant networks respectively. The Spectral Filtering algorithm appears as the best performer. However, in the Géant network dataset the two metrics provide different rankings for the algorithms under examination, and Newman's algorithm can perform marginally better under one of the two metrics.

Categories and Subject Descriptors

C.2 [Computer-Communication Networks]: Miscellaneous; D.2.8 [Software Engineering]: Metrics: complexity measures, performance measures

General Terms

Algorithms, Measurement, Theory

Keywords

traffic matrices, clustering, graph algorithms

1. INTRODUCTION

Several algorithms, models, and indices for clustering problems have appeared for different application domains, such as data mining, computer graphics and VLSI design; see, e.g., the overview in Jain et al. [8]. Even if we restrict our attention to the problem of graph clustering, i.e., grouping together similar nodes in a network, there is an ample literature, surveyed, e.g., in Gaertler [5].
So far, network clustering has been accomplished by relying on topology matrices, which describe the physical connectivity between communication nodes (see, e.g., the work of Gkantsidis, Mihail, and Zegura [6]). However, such an approach does not take into account the actual intensity of the relationship between any two nodes.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IWCMC'10 June 28 - July 2, 2010, Caen, France. Copyright 2010 ACM 978-1-4503-0062-9/10/06 ...$5.00.

Such information is conveyed by traffic matrices. They present several peculiarities that distinguish them from general graph matrices and from topology matrices: traffic matrices are very dense, usually complete, as opposed to the usual sparse structure of topology matrices of communication networks; furthermore, they are weighted, with weights varying considerably even over small timeframes, and are asymmetric, which prevents us from applying in a straightforward way the traditional approaches designed for weighted networks (see, e.g., the work of Newman [11]).

In this paper we propose a new approach to network clustering by advocating the use of traffic matrices. Such an approach could help in:
• network planning, by augmenting intra-cluster links;
• more accurate modeling of network flows, by defining intra-cluster and extra-cluster flows;
• focusing on intra-cluster reliability rather than network-wide reliability.
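As a rough illustration of the properties listed above and of the intra-cluster/extra-cluster split of flows, the following Python sketch uses a toy 4-node traffic matrix. The matrix values, the cluster assignment, and the helper names (density, is_symmetric, split_flows) are all hypothetical: they are not taken from the Abilene or Géant datasets, and the sketch does not implement the metrics defined later in the paper.

```python
# Toy 4-node traffic matrix (hypothetical values, not from the paper's datasets).
# T[i][j] is the traffic volume sent from node i to node j.
T = [
    [0, 10, 1, 2],
    [8, 0, 2, 1],
    [1, 3, 0, 12],
    [2, 1, 9, 0],
]


def density(T):
    """Fraction of ordered node pairs (i != j) that carry nonzero traffic."""
    n = len(T)
    nonzero = sum(
        1 for i in range(n) for j in range(n) if i != j and T[i][j] > 0
    )
    return nonzero / (n * (n - 1))


def is_symmetric(T):
    """True if traffic from i to j equals traffic from j to i for all pairs."""
    n = len(T)
    return all(T[i][j] == T[j][i] for i in range(n) for j in range(i + 1, n))


def split_flows(T, cluster_of):
    """Return (intra, extra) traffic totals for a node-to-cluster assignment.

    cluster_of[i] is the (assumed) cluster label of node i.
    """
    intra = 0
    extra = 0
    n = len(T)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if cluster_of[i] == cluster_of[j]:
                intra += T[i][j]
            else:
                extra += T[i][j]
    return intra, extra


# Assumed partition: nodes 0 and 1 in one cluster, nodes 2 and 3 in another.
clusters = [0, 0, 1, 1]
intra, extra = split_flows(T, clusters)
print("density:", density(T))
print("symmetric:", is_symmetric(T))
print("intra-cluster traffic:", intra, "extra-cluster traffic:", extra)
```

In this toy case the matrix is complete and asymmetric, and the chosen partition keeps most of the traffic inside clusters; a different assignment of nodes to clusters would shift traffic from the intra-cluster total to the extra-cluster one.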
For the purpose of traffic-aware clustering we redefine two clustering metrics, namely the Traffic-aware Scaled Coverage Measure (TS), derived from the well-known Scaled Coverage Measure (SCM) [2], and the Modularity measure, originally defined for weighted graphs [11] (see Section 3). We perform an exploratory analysis by examining how four established clustering algorithms (see Section 4) perform under those metrics. We employ two real-world datasets (detailed in Section 5). The results of this experimental comparison are presented in Section 6.

Though the algorithms considered in this paper were not designed to be used with traffic matrices, our redefinition of the two metrics allows us to evaluate the performance of any clustering algorithm from a traffic-based viewpoint, and to drive the future development of traffic-oriented clustering approaches. Our exploratory analysis shows that on our two extensive datasets the Spectral Filtering algorithm outperforms the other algorithms under both metrics, though Newman's algorithm can perform marginally better in some cases under the TS metric.

2. TRAFFIC MATRICES