Network Flow Based Resource Brokering and Optimization Techniques for Distributed Data Streaming Over Optical Networks

Cornelius Toole, Jr.
Louisiana State University
Center for Computation & Technology
216 Johnston Hall
Baton Rouge, LA 70803
corntoole@cct.lsu.edu

Andrei Hutanu
Louisiana State University
Center for Computation & Technology
216 Johnston Hall
Baton Rouge, LA 70803
ahutanu@cct.lsu.edu

ABSTRACT

This article analyzes the problem of optimizing access to and transport of large remote data through intelligent resource selection and configuration. With the availability of high-speed optical networks, the main problem for remote data access is shifting from having enough network bandwidth to having enough data ready to saturate the network when the application requests it. Network bandwidth now exceeds disk bandwidth, which makes it possible to use multiple distributed resources to saturate the network links. We consider two scenarios: one where only disks are used as data sources, and a more advanced one where compute resources in the network are used as caches to increase instantaneous throughput. The problem we face is choosing and configuring the resources for these scenarios. This is a non-trivial problem; however, because we use application-driven network resource allocation (which gives us predictability and determinism in terms of network performance), the problem becomes tractable. We discuss optimization algorithms that are applicable to this problem, and present an algorithm that divides the problem into two sub-problems that can be solved using existing network flow algorithms.
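To illustrate why network flow algorithms fit this problem, the sketch below models the disk-only scenario as a maximum-flow instance: a hypothetical super-source "S" feeds each disk server with capacity equal to its disk read rate, each server reaches a router over a network link, and the router connects to the client. The achievable streaming throughput is then the maximum S-to-client flow. This is an assumption-laden illustration using the textbook Edmonds-Karp algorithm, not the authors' two-sub-problem decomposition, which this section does not describe; all node names and bandwidth figures (in Gbit/s) are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow: augment along shortest residual paths found by BFS."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual[u][v] += c
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # BFS for the shortest augmenting path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left: flow is maximal

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push the bottleneck flow and update forward/reverse residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical topology (Gbit/s): S -> disk servers use disk read rates,
# server -> router edges are network links, router -> client is the client's link.
network = {
    "S": {"disk1": 2, "disk2": 3, "disk3": 4},
    "disk1": {"router": 10},
    "disk2": {"router": 10},
    "disk3": {"router": 1},  # slow link caps disk3 at 1
    "router": {"client": 10},
}

print(max_flow(network, "S", "client"))  # 6
```

Here the 10 Gbit/s client link is not saturated: the disks together supply at most 9 Gbit/s, and the slow link to disk3 further limits the total to 6. This is the kind of gap that selecting different sources, or adding in-network caches as in the second scenario, is meant to close.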
Categories and Subject Descriptors

C.2.4 [COMPUTER-COMMUNICATION NETWORKS]: Distributed Systems—Distributed applications; B.4.3 [INPUT/OUTPUT AND DATA COMMUNICATIONS]: Interconnections (Subsystems)—Parallel I/O; H.4 [Information Systems Applications]: Miscellaneous

General Terms

remote data access, optical grids, maximum network flow, resource brokering

1. INTRODUCTION

High-end computing and storage resources and high-speed networks are being deployed across the world. We are exploring ways in which these resources can be combined to support data-intensive applications such as (distributed) visualization of large data.

Network bandwidths have increased to the point where they are no longer the bottleneck in many high-performance computational workflows. For instance, in many data-intensive applications, data can be retrieved over high-speed network links from remote machines at higher rates than from local disks. Parallel and high-performance storage systems can deliver data at high rates for computational and analysis tasks; however, these resources are rare and usually not available locally. High-bandwidth network links make such remote storage systems accessible, narrowing the gap between local and remote storage.

An alternative is to create distributed data storage systems on the fly, based on application requirements. The Enlightened project¹ has produced HARC [11], software that enables co-allocation of dynamic network circuits and compute resources based on application needs. This, combined with a parallel data service, enables the creation of such storage systems, as shown during iGrid 2005 and Supercomputing 2006 [7].
In cooperation with the G-Lambda project in Japan, it was demonstrated (GLIF, Supercomputing 2006) that optical grids (defined as the combination of dynamically allocated network circuits, compute and storage resources) such as Enlightened can interoperate, thus extending the range of resources that a single distributed application can use.

Now that these resources can be pulled together to form ad hoc metacomputing systems, the problem becomes selecting and configuring them to obtain optimal performance. We explore possible answers to this problem for a particular use case. In our use case we can select among multiple resources, and we seek

¹ http://www.enlightenedcomputing.org/