Sampling of Networks with Traceroute-Like Probes

Alain Barrat (a), Ignacio Alvarez-Hamelin (a), Luca Dall’Asta (a), Alexei Vázquez (b), Alessandro Vespignani (a, c)

(a) Laboratoire de Physique Théorique (UMR 8627 du CNRS), Université de Paris-Sud, Orsay, France
(b) Nieuwland Science Hall, University of Notre Dame, Notre Dame, Ind., USA
(c) School of Informatics and Department of Physics, Indiana University, Bloomington, Ind., USA

Correspondence: A. Barrat, LPT, Bâtiment 210, Université de Paris-Sud, FR–91405 Orsay Cedex (France). Tel. +33 1 69 15 82 22, Fax +33 1 69 15 82 87, E-Mail Alain.Barrat@th.u-psud.fr

Complexus 2006;3:83–96. © 2006 S. Karger AG, Basel

INFORMATION TECHNOLOGY MODELLING

Key Words
Network sampling · Traceroute · Internet exploration · Topology inference

Abstract
A large part of the recent development of the interest in complex networks has been triggered by the observation of particular characteristics of real world networks, such as the small-world properties or the heavy-tailed distributions of degrees. Many datasets are, however, the result of an incomplete sampling of the underlying real networks, and it has been argued that sampling procedures might introduce uncontrolled biases in the statistical properties of the sampled graph. In this paper, we explore this issue in the case of the Internet, which is generally mapped from a limited set of sources by using traceroute-like probes.
The origin of the biases introduced by such a sampling process is investigated

Published online: August 25, 2006. DOI: 10.1159/000094191

Simplexus

Despite the pervasive influence of today’s Internet, for business and science, government, and just about every other area of human activity, it is not at all easy to get a good snapshot of what the network looks like; that is, of the pattern by which its computers are ‘wired’ together by telephone lines, satellite links and so on. The Internet has grown up organically, and without any central authority making it conform to some rational design. Hence, its structure can only be determined through careful measurement.

The usual technique is a kind of Internet X-ray, carried out by sending information packets from a handful of Internet sources, targeted towards many destinations. Along the way, such packets record the pathways defined by the links on which they travel, and so return narrow ‘slices’ of the overall network. By putting many such slices together, researchers have assembled crude maps of the Internet, and discovered what seem to be a number of interesting topological features.

For example, the Internet appears to be a ‘small world’ network in that any pair of node computers (or ‘routers’) can be linked by short paths having only a handful of links. The Internet also appears to have a highly ‘heterogeneous’ character, in that a few ‘hub’ nodes have hundreds or thousands of links, while most have only one or a few. The latter feature appears in the particular mathematical form of the distribution of nodes according to the number of links they have (known as the ‘degree distribution’). For the Internet, as for many other complex networks in physics or biology, this seems to follow a power law form, $P(k) \sim k^{-\gamma}$, where $\gamma$ is between 2.0 and 2.5.
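The measurement procedure described above (a few sources, many destinations, the recorded paths merged into one map) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the procedure used by any real mapping project: the "true" network is a toy random graph, each probe is assumed to follow a single shortest path found by breadth-first search, and all function names and parameter values (`random_graph`, `traceroute_sample`, 200 nodes, 3 sources, 50 targets) are hypothetical choices made for illustration.

```python
import random
from collections import Counter, deque


def random_graph(n, p, rng):
    """Toy undirected graph as an adjacency dict; each pair is linked with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_parents(adj, source):
    """Breadth-first search; parent[v] is the previous hop on one shortest path to v."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only to make tie-breaking reproducible
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def traceroute_sample(adj, sources, targets):
    """Merge one shortest path from every source to every reachable target."""
    nodes, edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        nodes.add(s)
        for t in targets:
            if t not in parent:
                continue  # target not reachable from this source
            v = t
            while parent[v] is not None:  # walk back towards the source
                edges.add(frozenset((v, parent[v])))
                nodes.add(v)
                v = parent[v]
    return nodes, edges


def degree_histogram(edges):
    """Map degree k to the number of nodes that have k links in the edge set."""
    degree = Counter()
    for edge in edges:
        for v in edge:
            degree[v] += 1
    return Counter(degree.values())


rng = random.Random(0)  # fixed seed so that repeated runs agree
graph = random_graph(200, 0.05, rng)
sources = rng.sample(range(200), 3)    # assumed: a handful of probing sources
targets = rng.sample(range(200), 50)   # assumed: many destinations
nodes, edges = traceroute_sample(graph, sources, targets)
true_edges = {frozenset((u, v)) for u in graph for v in graph[u]}
print(f"edges seen: {len(edges)} of {len(true_edges)}")
print("sampled degree histogram:", sorted(degree_histogram(edges).items()))
```

Comparing the histogram of the sampled map with that of the full toy graph gives a rough feel for how much of the network such slices miss, and how the observed degree distribution can differ from the true one.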
The small world and heterogeneous character of the Internet appear to be key to its capacity for stable and resilient operation even in the presence of frequent computer failures, and understanding its topology in detail is crucial for the intelligent design of