Information Processing Letters 111 (2011) 178–183

Offline file assignments for online load balancing

Paul Dütting a, Monika Henzinger b, Ingmar Weber c,*

a Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 14, CH-1015 Lausanne, Switzerland
b University of Vienna, Faculty of Computer Science, Universitaetsstrasse 10, A-1090 Wien, Austria
c Yahoo! Research Barcelona, Av. Diagonal 177, E-08003 Barcelona, Spain

Article info

Article history: Received 20 August 2009. Received in revised form 19 July 2010. Accepted 25 November 2010. Available online 30 November 2010.
Communicated by Wen-Lian Hsu
Keywords: Load balancing; On-line algorithms; Information retrieval; File distribution

Abstract

We study a novel load balancing problem that arises in web search engines. The problem is a combination of an offline assignment problem, where files need to be (copied and) assigned to machines, and an online load balancing problem, where requests ask for specific files and need to be assigned to a corresponding machine, whose load is increased by this. We present simple deterministic algorithms for this problem and exhibit an interesting trade-off between the available space to make file copies and the obtainable makespan. We also give non-trivial lower bounds for a large class of deterministic algorithms and present a randomized algorithm that beats these bounds with high probability.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

In the online load balancing with restricted assignment problem, a set of $m$ machines has to execute a sequence of requests, arriving one by one. Every request has a load and must be placed on exactly one of a subset of machines. The assignment of a request to a machine increases the load on this machine by the load of the request. The goal is to minimize the makespan, i.e., the maximum load placed on any machine.
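To make the restricted assignment setting concrete, the following minimal sketch places each arriving request on the currently least-loaded machine among those it is allowed to use. This is the standard greedy rule for this setting, not an algorithm taken from this paper; the function names `greedy_restricted_assignment` and `makespan` and the request encoding `(load, allowed_machines)` are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def greedy_restricted_assignment(
    m: int,
    requests: Iterable[Tuple[float, Sequence[int]]],
) -> Tuple[List[int], List[float]]:
    """Place requests one by one on the least-loaded allowed machine.

    Each request is a pair (load, allowed), where `allowed` lists the
    indices (0..m-1) of the machines that may serve it. This encoding
    is hypothetical and chosen only for this sketch.
    """
    loads = [0.0] * m          # current load of every machine
    placement: List[int] = []  # machine chosen for each request, in order
    for load, allowed in requests:
        if not allowed:
            raise ValueError("every request needs at least one allowed machine")
        # Online decision: only the loads accumulated so far are known.
        target = min(allowed, key=lambda i: loads[i])
        loads[target] += load
        placement.append(target)
    return placement, loads


def makespan(loads: Sequence[float]) -> float:
    """The makespan is the maximum load placed on any machine."""
    return max(loads)
```

For example, with two machines and the requests `[(1.0, [0, 1]), (1.0, [0, 1]), (2.0, [1])]`, the sketch places the requests on machines 0, 1 and 1, which gives a makespan of 3.0.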
When neither the load of each request nor the set of machines a request can be assigned to is under the control of the algorithm, no online load balancing algorithm can achieve a competitive ratio better than $\log_2 m$; see [1].

In this paper we study the following variant of this problem, which consists of two phases. In the offline phase, $n$ files need to be assigned to $m$ identical machines with the possibility, given space, to copy some or all of the files. In the online phase, a sequence of requests arrives. Each request $t$ asks for one file $f_j$ and has to be placed on one of the machines $m_i$ to which (a copy of) this file was assigned. That machine's load $ML_i$ is then increased by $l(t)$. $FL_j$ denotes the sum of the loads of all the requests for file $f_j$. The goal is still to minimize the makespan, i.e., the maximum machine load $ML = \max_i ML_i$.

In this model the position of the algorithm is strengthened: by placing the files "intelligently" it can influence the set of machines a request can be assigned to. However, it still controls neither the load of each request nor the file the request asks for. Also note that the makespan will generally depend on the number of file copies made by the algorithm. In our model, each machine has $s$ "slots" which can be used to store (copies of) files. We require $s \ge n/m$ because if $s < n/m$ it is impossible to store all of the $n$ files on the $m$ machines.

We call an algorithm for both the offline and the online phase a dual-phase algorithm. To analyze the quality of a dual-phase algorithm there are two sensible points of reference. First, one could compare $ML$ to the makespan $OPT_s$ of the optimal offline¹ algorithm for the same parameter $s$.

* Corresponding author. E-mail address: ingmar@yahoo-inc.com (I. Weber).
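The two-phase model described above can be sketched as a small simulation. The placement rule below (an equal number of copies per file, spread cyclically over the machines) and the least-loaded serving rule are assumptions chosen for illustration only; they are not the algorithms analyzed in this paper, and the names `place_files` and `serve_requests` are hypothetical.

```python
from typing import List, Sequence, Tuple


def place_files(n: int, m: int, s: int) -> List[List[int]]:
    """Offline phase: return, for each file j, the machines holding a copy.

    Illustrative rule (an assumption): every file gets the same number of
    copies, laid out cyclically so that no machine uses more than s slots.
    """
    if n < 1 or m < 1:
        raise ValueError("need at least one file and one machine")
    if s * m < n:
        raise ValueError("need s >= n/m to store all n files")
    copies = min(m, (s * m) // n)  # copies per file that fit into m*s slots
    # Copy k of file j goes to machine (j*copies + k) mod m, so the copies
    # of one file land on distinct machines (copies <= m).
    return [[(j * copies + k) % m for k in range(copies)] for j in range(n)]


def serve_requests(
    holders: Sequence[Sequence[int]],
    m: int,
    requests: Sequence[Tuple[int, float]],
) -> Tuple[List[float], List[float]]:
    """Online phase: each request (j, load) asks for file j.

    Illustrative rule (an assumption): use the least-loaded machine that
    holds a copy of file j. Returns the machine loads ML_i and the file
    loads FL_j.
    """
    machine_load = [0.0] * m
    file_load = [0.0] * len(holders)
    for j, load in requests:
        i = min(holders[j], key=lambda x: machine_load[x])
        machine_load[i] += load
        file_load[j] += load
    return machine_load, file_load
```

With $n = 4$ files, $m = 2$ machines and the requests `[(0, 1.0), (0, 1.0), (2, 1.0), (2, 1.0)]`, the sketch gives a makespan of 4.0 for $s = 2$ (one copy per file, and both requested files sit on machine 0) and a makespan of 2.0 for $s = 4$ (every file on both machines). This is a toy instance of the dependence of the makespan on the number of file copies.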
¹ By an "offline" algorithm we mean an algorithm which is given the sequence of requests $t$ before assigning files to machines.

0020-0190/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ipl.2010.11.022

Such an analysis emphasizes the optimality gap while caring less about how good the optimal solution