Approximation Modeling for the Online Performance Management of Distributed Computing Systems

Dara Kusic, Nagarajan Kandasamy and Guofei Jiang
Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104
Robust and Secure System Group, NEC Laboratories America, Princeton, NJ 08540
kusic@drexel.edu, kandasamy@ece.drexel.edu, gfj@nec-labs.com

Abstract: This paper develops a hierarchical control framework to solve performance management problems in distributed computing systems. To reduce the control overhead, we use concepts from approximation theory in two places. The first is the construction of the dynamical models that predict system behavior. The second is the solution of the associated control equations. We use a dynamic resource provisioning problem as a case study. A computing system managed by the proposed framework with approximation models realizes profit gains that are, in the best case, within 1% of those achieved by a controller using an exact parametric model of the system.

I. INTRODUCTION

This short paper describes an optimization framework that solves a class of performance management problems in distributed computing systems. We refer the interested reader to [1] for more details. The performance optimization problem is decomposed into a set of simpler sub-problems. Multiple controllers, arranged in a decentralized hierarchical structure, solve these sub-problems cooperatively. Concepts from approximation theory are applied in two places: in the construction of the dynamical models that track and predict system behavior over a finite prediction horizon, and in the solution of the associated control equations.

[Figure: the incoming workload $\lambda(k)$ passes through a dispatcher that splits it into $\lambda_1(k)$, $\lambda_2(k)$, and $\lambda_3(k)$. Per-cluster dispatchers then feed the Gold, Silver, and Bronze clusters. The labels $r_{ij}(k)$ and $n_{ij}(k)$, for $i \in \{1, 2, 3\}$ and $j = 1, \dots, m$, annotate each cluster. A Sleep cluster is also shown.]
Fig. 1.
The system model, comprising the Gold, Silver, and Bronze service clusters and a Sleep cluster that holds machines in a powered-off state.

[Figure: "1998 World Cup HTTP Requests", plotting arrival rate per 30-second interval against time instance for the Gold, Silver, and Bronze workloads.]
Fig. 2. An example workload representing client requests for the three online services hosted by the computing system.

We ran simulations using workload traces from the 1998 World Cup Soccer web site (WC’98). In the best case, a computing system managed by a control framework using approximation models realizes profit gains within 1% of those of a controller using a parametric model based on first principles. The approximation-based framework also incurs low control overhead.

II. SYSTEM MODEL

We assume a distributed computing environment (DCE) hosting three independent online services, labeled "Gold", "Silver", and "Bronze" and indexed by $i \in \{1, 2, 3\}$, as shown in Fig. 1. Requests for the Gold, Silver, and Bronze services arrive with time-varying rates $\lambda_1(k)$, $\lambda_2(k)$, and $\lambda_3(k)$, respectively. Each request is routed to the computer cluster dedicated to hosting its service. Fig. 2 shows an example workload arrival pattern. Each cluster comprises heterogeneous computers with different processing capacities, which work independently to service incoming requests. During periods of slow workload arrival, computers that contribute excess capacity are powered down and placed in the Sleep cluster to reduce system power consumption.

The Gold, Silver, and Bronze services generate revenue according to a pricing structure. This structure translates the response time of each completed request into a dollar amount collected from the client. When the response time violates the service-level agreement (SLA), the service provider pays a penalty to the client.

Fourth International Conference on Autonomic Computing (ICAC'07) 0-7695-2779-5/07 $20.00 © 2007
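The pricing structure described in Section II maps each completed request's response time to a dollar amount and charges a penalty when the SLA is violated. It can be illustrated as a step function of response time. The Python below is a minimal sketch under stated assumptions. The service names come from the text. The response-time bounds, dollar amounts, and penalties are hypothetical, as are the names `PricingTier`, `request_value`, and `interval_profit`. The paper does not give its actual pricing tables here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingTier:
    """One step of a hypothetical step-wise SLA pricing function."""
    max_response_time: float  # seconds; upper bound of this step (inclusive)
    amount: float             # dollars collected from the client for this step


# Hypothetical pricing tables, ordered by increasing response-time bound.
# The source only says response time maps to a dollar amount and that SLA
# violations incur a penalty; every number below is illustrative.
PRICING = {
    "Gold":   [PricingTier(0.5, 0.010), PricingTier(1.0, 0.005)],
    "Silver": [PricingTier(1.0, 0.006), PricingTier(2.0, 0.003)],
    "Bronze": [PricingTier(2.0, 0.003), PricingTier(4.0, 0.001)],
}

# Hypothetical penalty (negative = paid to the client) applied when the
# response time exceeds the largest bound, i.e., the SLA is violated.
PENALTY = {"Gold": -0.010, "Silver": -0.004, "Bronze": -0.001}


def request_value(service: str, response_time: float) -> float:
    """Dollar amount earned (or, if negative, paid out) for one request."""
    for tier in PRICING[service]:
        if response_time <= tier.max_response_time:
            return tier.amount
    return PENALTY[service]  # no tier matched: SLA violated


def interval_profit(service: str, response_times: list[float]) -> float:
    """Revenue minus penalties over the requests completed in one interval."""
    return sum(request_value(service, t) for t in response_times)
```

For example, `interval_profit("Gold", [0.3, 0.8, 1.5])` adds two paid requests and one SLA penalty. The result is roughly $0.005. A controller can then compare this kind of per-interval profit across candidate provisioning decisions.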
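In the system model, each service's arrival rate $\lambda_i(k)$ reaches a dispatcher, which routes requests to the heterogeneous computers of that service's cluster. Sleeping computers receive no load. The text does not state the dispatch policy. The sketch below assumes one: capacity-proportional routing. It also reads the $r_{ij}(k)$ labels in Fig. 1 as per-computer routing rates, which is an interpretation rather than something the source states. The function names `active_capacities` and `split_arrivals` are hypothetical.

```python
def active_capacities(capacities: list[float], powered_on: list[bool]) -> list[float]:
    """Keep only the capacities of computers that are powered on.

    Computers in the Sleep cluster (powered_on is False) receive no requests.
    """
    if len(capacities) != len(powered_on):
        raise ValueError("capacities and powered_on must have the same length")
    return [c for c, on in zip(capacities, powered_on) if on]


def split_arrivals(arrival_rate: float, capacities: list[float]) -> list[float]:
    """Route a cluster's arrival rate to its active computers in proportion
    to each computer's processing capacity (hypothetical dispatch policy)."""
    if arrival_rate < 0:
        raise ValueError("arrival rate must be non-negative")
    total = sum(capacities)
    if total <= 0:
        raise ValueError("need at least one active computer with positive capacity")
    return [arrival_rate * c / total for c in capacities]


# Example: a cluster of three heterogeneous computers, the second one asleep.
caps = active_capacities([1.0, 2.0, 3.0], [True, False, True])
print(split_arrivals(400.0, caps))  # prints [100.0, 300.0]
```

Under this assumed policy, each active computer receives load in proportion to its capacity, so all active computers run at the same utilization. Moving a computer to or from the Sleep cluster changes only the `powered_on` flags.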