ITMgen - A First-principles Approach to Generating Synthetic Interdomain Traffic Matrices

Jakub Mikians, UPC BarcelonaTech, jmikians@ac.upc.edu
Nikolaos Laoutaris, Telefonica Research, nikos@tid.es
Amogh Dhamdhere, CAIDA, amogh@caida.org
Pere Barlet-Ros, UPC BarcelonaTech, pbarlet@ac.upc.edu

Abstract: We present the design and evaluation of ITMgen, a tool for generating synthetic but representative Interdomain Traffic Matrices (ITMs). ITMgen is motivated by the observation that gravity-based models do not reflect application-level or regional characteristics of Internet traffic. ITMgen works at the level of connections, taking into account the relative sizes of ASes, their popularity with respect to various applications, and the relation between forward and reverse traffic for different application types. The parameters needed to integrate application types and the distribution of content popularity can be realistically estimated by combining public sources such as Alexa, which capture traffic trends at a macro level, with local traffic sampling (NetFlow, DPI), which provides an additional enhancement layer at the micro level. Following this philosophy, we demonstrate that we can synthesize ITMs that match real-world measurements more closely than the current state of the art. In addition, the modular design of ITMgen makes it easy to integrate additional enhancement layers that improve the accuracy of our existing implementation.

I. INTRODUCTION

Knowledge of interdomain traffic characteristics is important for a number of reasons, particularly those related to economics and policy, as the flow of money on the Internet depends on the flow of traffic. A comprehensive understanding of interdomain traffic characteristics has consistently remained elusive, primarily due to the difficulty of obtaining representative traffic data, which is often viewed as sensitive information.
However, we need realistic interdomain traffic matrices in order to model and simulate new interdomain interconnection policies, pricing schemes, or routing protocols. Moreover, simulations of the interdomain Internet often need to be at different scales than the real Internet (which consists of more than 40,000 networks), either to "shrink" the actual traffic matrix for scalable modeling and simulation, or to investigate "what-if" scenarios in the evolution of the Internet. Researchers have mostly had to rely on synthetic interdomain traffic matrices generated using ad-hoc methods, which reproduce some high-level characteristics of interdomain traffic matrices, such as heavy-tailed traffic volume distributions or the presence of large traffic sources and sinks [10], [12], [14]. However, the research community lacks a configurable tool for producing synthetic traffic matrices of arbitrary size that match basic real interdomain traffic characteristics in more detail.

To fill this gap, we present in this paper the design and evaluation of ITMgen, a new tool to generate representative synthetic interdomain traffic matrices. ITMgen is based on first principles, and incorporates several features that result in more representative traffic matrices than the current state of the art [9]. First, we model interdomain traffic at the level of connections, taking into account the relative sizes of ASes measured by the number of users they serve. Second, we model multiple content (or application) types, and their effect on interdomain traffic in terms of the ratio of forward to reverse traffic that each application type produces. Third, ITMgen captures the fact that the popularity of content objects shows regional effects: certain websites, for instance, may be more popular in specific countries or geographical regions.
Finally, ITMgen is designed to be parameterized with high-level input data that is publicly available, and we provide a canonical parameterization that represents present-day interdomain traffic characteristics. ITMgen is designed to be highly configurable and extensible; when new content types emerge and data about them becomes available, ITMgen can be easily extended to incorporate the new data. We are making the ITMgen tool and the data required to parameterize it available to the research community [1].

The remainder of this paper is organized as follows. Sec. II discusses related work. Sec. III describes the design of ITMgen. Sec. IV describes the datasets used. Sec. V demonstrates how ITMgen can be parameterized and how to synthesize a matrix. The validation is presented in Sec. VI. Sec. VII concludes the work.

II. RELATED WORK

Most prior work on traffic matrix estimation and generation focused on intradomain traffic (see [7], [10], [17], [18], [20], [21] and references therein). Although those solutions give useful hints about synthesizing interdomain traffic matrices, they cannot be applied directly to the interdomain context. One prior work on intradomain traffic that inspired ours is that of Erramilli et al. [11], who modeled intradomain traffic at the level of individual connections.

Several studies have measured interdomain traffic characteristics. An early study by Fang et al. [12], confirmed by [10], [17], showed that interdomain traffic distributions are highly non-uniform. Labovitz et al. [14] reported that interdomain traffic has been consolidating. Maier et al. [15] characterized residential broadband traffic. Bharti et al. [7] report on the sparseness of the ITM, and propose methods to infer the invisible elements of the ITM. Mikians et al. [16]