netEmbed: A Service for Embedding Distributed Applications † [Extended Abstract]

Jorge Londoño ‡
CS Department, Boston University, Boston, MA
jmlon@cs.bu.edu

Azer Bestavros
CS Department, Boston University, Boston, MA
best@cs.bu.edu

† This work is supported in part by a number of NSF awards, including CISE/CSR Award #0720604, ENG/EFRI Award #0735974, CISE/CNS Award #0524477, CNS/NeTS Award #0520166, CNS/ITR Award #0205294, and CISE/EIA RI Award #0202067.

1. INTRODUCTION

An increasing number of applications, such as computational grids, testbeds, peer-to-peer networks, and sensor networks (among many others), rely on finding a set of resources that meets certain criteria for their operation. In many of these cases, the requirements may be described as a labeled graph in which nodes represent computational resources and links represent connectivity/communication requirements. Similarly, the infrastructure where the service will be deployed is also described by a labeled graph, where the attributes of nodes and links represent their capabilities. The problem of finding a feasible set of links and nodes on which to deploy the service is what we call the embedding problem.

As an illustrative example, consider the problem of mapping a sensor network application in which nodes represent either sensing or computation operations. The application needs to find a set of resources subject to several constraints. Sensing nodes are constrained by, e.g., the sensed variable and the location of the sensor. Computation nodes need to be within some delay of the sensors to process real-time data, and must have at least a certain bandwidth to meet the sensor data transfer rate. Queries need to be processed at the nodes, which imposes a constraint on their computational power. Finally, the results have to be delivered to the clients, which adds further communication requirements. This simple scenario illustrates the sort of constraints and requirements that an embedding service must support.
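To make the embedding problem of Section 1 concrete, the following is a minimal sketch in Python. It is not the netEmbed algorithm: it is a plain exhaustive backtracking search, and every name in it (`find_embedding`, `node_ok`, `link_ok`, the attributes `cpu`, `bw`, `delay`, and the toy sensor scenario) is a hypothetical choice made for illustration. Links are treated as undirected, and each application node is assumed to map to a distinct infrastructure node.

```python
from typing import Callable, Dict, Optional, Tuple

Attrs = Dict[str, float]
Links = Dict[Tuple[str, str], Attrs]


def get_link(links: Links, a: str, b: str) -> Optional[Attrs]:
    """Look up an undirected link; return None if the link does not exist."""
    if (a, b) in links:
        return links[(a, b)]
    return links.get((b, a))


def find_embedding(
    q_nodes: Dict[str, Attrs], q_links: Links,   # application requirements
    h_nodes: Dict[str, Attrs], h_links: Links,   # infrastructure capabilities
    node_ok: Callable[[Attrs, Attrs], bool],
    link_ok: Callable[[Attrs, Attrs], bool],
) -> Optional[Dict[str, str]]:
    """Return one feasible mapping of application nodes to infrastructure
    nodes, or None if no feasible mapping exists (exhaustive search)."""
    order = list(q_nodes)
    mapping: Dict[str, str] = {}

    def consistent(q: str, h: str) -> bool:
        if not node_ok(q_nodes[q], h_nodes[h]):
            return False
        # Check every required link between q and an already-placed node.
        for (a, b), req in q_links.items():
            if q == a and b in mapping:
                other = mapping[b]
            elif q == b and a in mapping:
                other = mapping[a]
            else:
                continue
            cap = get_link(h_links, h, other)
            if cap is None or not link_ok(req, cap):
                return False
        return True

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        q = order[i]
        for h in h_nodes:
            if h in mapping.values() or not consistent(q, h):
                continue
            mapping[q] = h
            if extend(i + 1):
                return True
            del mapping[q]  # backtrack and try the next candidate
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical constraint expressions: capability must meet the requirement.
def node_ok(req: Attrs, cap: Attrs) -> bool:
    return cap["cpu"] >= req["cpu"]


def link_ok(req: Attrs, cap: Attrs) -> bool:
    return cap["bw"] >= req["bw"] and cap["delay"] <= req["delay"]


# Toy sensor scenario: one sensing node feeding one computation node.
q_nodes = {"sensor": {"cpu": 1}, "compute": {"cpu": 4}}
q_links = {("sensor", "compute"): {"bw": 10, "delay": 20}}
h_nodes = {"h1": {"cpu": 2}, "h2": {"cpu": 8}, "h3": {"cpu": 8}}
h_links = {
    ("h1", "h2"): {"bw": 5, "delay": 10},
    ("h1", "h3"): {"bw": 100, "delay": 15},
    ("h2", "h3"): {"bw": 100, "delay": 5},
}

embedding = find_embedding(q_nodes, q_links, h_nodes, h_links, node_ok, link_ok)
print(embedding)
```

Because the search only discards a candidate when it violates a constraint, it reports "no embedding" only when none exists. The price is worst-case exponential running time, which is why practical systems add heuristics or constraint propagation on top of such a search.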
‡ Supported in part by the Universidad Pontificia Bolivariana and COLCIENCIAS–Instituto Colombiano para el Desarrollo de la Ciencia y la Tecnología “Francisco José de Caldas”.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MC’07, November 26-30, 2007, Newport Beach, CA
Copyright 2007 ACM ISBN 978-1-59593-935-7/07/11...$5.00

The problem of finding the set of resources that match the requirements is clearly a combinatorial search/optimization problem. We distinguish the search problem, finding feasible embeddings, from the optimization problem, finding the best embedding with respect to an optimality metric. Typical solutions in the literature have relied on various heuristics. This is the case for Emulab/NetBed [1, 8], where both simulated annealing and genetic algorithms have been used. SWORD [6] likewise applies pruning heuristics, tailored to the particular case of PlanetLab, in order to significantly reduce the search space. Both approaches sacrifice soundness, though: they may return false negatives, i.e., a no-solution answer when in fact a solution exists.

On the other hand, work in other areas has proposed the use of constraint satisfaction techniques. For example, gangmatching for Condor [7] and Redline [4] use constraint satisfaction techniques to match the jobs’ requirements with the computational capabilities of the nodes, the software available at the nodes, etc. However, none of these approaches takes topology into consideration.
This issue becomes particularly important as distributed applications are being deployed in WAN environments, where link capacities and delays have a significant influence on the overall performance of the application.

In the context of overlay networks, the work in [3] showed how constraint satisfaction techniques could be efficiently applied to the solution of embedding problems. Our work follows the same lines by providing techniques that improve performance while preserving soundness (i.e., only returning true positives) by never pruning feasible parts of the search space.

2. NETEMBED FRAMEWORK

Our framework is designed to take three inputs: 1) a description of the infrastructure where the overlays are going to be deployed, given in the form of a labeled graph whose labels describe the capabilities of links and nodes; 2) a description of the overlay, also as a labeled graph, whose labels represent the overlay’s requirements; and 3) a constraint expression that establishes the conditions for matching the overlay’s requirements with the infrastructure’s capabilities. We use the GraphML [2] standard to represent both the infrastructure and the overlay, and