Design and Analysis of a Load Balancing Strategy in Data Grids

Xiao Qin
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro, NM 87801, Email: xqin@cs.nmt.edu

⋆ Article published in Future Generation Computer Systems: The Int’l Journal of Grid Computing, vol. 23, no. 1, pp. 132-137, Jan. 2007.

Abstract

Developing Data Grids has increasingly become a major concern in making Grids attractive for a wide range of data-intensive applications. Storage subsystems are most likely to be a performance bottleneck in Data Grids; therefore, the focus of this paper is to design and evaluate a data-aware load-balancing strategy that improves the global usage of storage resources in Data Grids. We build a model to estimate the response time of a job running at a local or remote site. In light of this model, we can calculate the slowdowns imposed on jobs in a Data Grid environment. Next, we propose a load-balancing strategy that balances the load of a Data Grid in such a judicious way that the computation and storage resources in each site are simultaneously well utilized. We conduct experiments using a simulated Data Grid to analyze the performance of the proposed strategy. Experimental results confirm that our load-balancing strategy can achieve high performance for data-intensive jobs in Data Grid environments.

Key words: load balancing, data-intensive, Data Grids, performance evaluation

1 Introduction

In the last decade, Data Grids have become increasingly popular for a wide range of scientific and commercial applications [1]. A Data Grid can be envisioned as a collection of geographically dispersed computing and storage resources that provide a diversity of services [2][3] to fit the needs of data-intensive