Information Processing Letters 86 (2003) 197–202
www.elsevier.com/locate/ipl

Data replication in static tree structures ✩,✩✩

Susanne E. Hambrusch, Chuan-Ming Liu

Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA

Received 3 June 2002
Communicated by F. Dehne

Keywords: Algorithms; Blocknumber; Data replication; Secondary storage; Static tree structures

✩ Work supported in part by the NSF under grants 9988339-CCR and 0010044-CCR.
✩✩ A preliminary version of this paper appeared in the Proceedings 9th CIKM, 2000.
* Corresponding author.
E-mail addresses: seh@cs.purdue.edu (S.E. Hambrusch), liucm@cs.purdue.edu (C.-M. Liu).

1. Introduction

Replication of data can lead to better performance due to increased availability of data. This paper explores the use of data replication for large, static data sets organized in a tree structure. In static environments, data replication does not have to deal with problems arising from maintaining consistency among copies [1,10,12]. Static data sets arise, for example, when updates on the data happen at fixed times and in bulk, and queries are not present at the time of updating. We show that in static tree structures data replication is effective in reducing the amount of I/O needed measured in terms of the blocknumber. Our techniques determine what data to replicate, how to control the amount of replication, and how to ensure good block utilization. Our results improve previously known bounds.

Let T be an N-node rooted tree with r as the root. When the data associated with the nodes of T is too large to store in main memory, nodes of T are mapped to blocks of size at most B. Blocks are stored externally and accessing one block is considered one I/O operation. In this paper we assume that one block can hold the data of B nodes. Hence, at least ⌈N/B⌉ blocks are stored externally.
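
As an illustration of the storage model, the following Python sketch computes the lower bound ⌈N/B⌉ on the number of blocks and packs the nodes of a small tree into blocks of at most B nodes in level order. The function names, the dictionary representation of T, and the level-order packing are assumptions made for this example only; the packing uses no replication and is not the mapping developed in this paper.

```python
from collections import deque


def min_blocks(n_nodes: int, block_size: int) -> int:
    """Lower bound ceil(N/B) on the number of blocks, using integer arithmetic."""
    return (n_nodes + block_size - 1) // block_size


def pack_level_order(children: dict, root, block_size: int) -> list:
    """Hypothetical packing: visit nodes in level (BFS) order and fill
    each block with up to `block_size` nodes before opening the next one."""
    blocks = [[]]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if len(blocks[-1]) == block_size:
            blocks.append([])  # current block is full, open a new one
        blocks[-1].append(u)
        queue.extend(children.get(u, []))  # leaves have no entry in `children`
    return blocks


# A 7-node tree rooted at "r", stored as node -> list of children.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
blocks = pack_level_order(tree, "r", block_size=3)
print(min_blocks(7, 3), blocks)
```

With N = 7 and B = 3 the bound gives three blocks, and the level-order packing meets it, leaving only the last block partially filled. Such a packing pays no attention to how many block boundaries a root-to-leaf search crosses.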

Two metrics used to measure the quality of the generated blocks are (i) the number of nodes assigned to blocks and (ii) the blocknumber. When at most one block is assigned fewer than B nodes, we refer to the mapping as a complete mapping. Blocks containing fewer than B nodes are undesirable since they underutilize resources. The blocknumber measures external access during a search. Consider a path P from root r to a leaf l in T. The blocknumber is the maximum, over all such paths P, of the number of edges (u,v) on P for which u and v are in different blocks.

0020-0190/03/$ – see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0020-0190(02)00503-3

In this paper we describe complete mappings from the nodes of T to blocks when nodes can be replicated, the amount of replication is controlled, and the blocknumber is optimized. A node of T can be mapped to more than one block and is thus available in different blocks. A mapping of tree T has a total replication factor τ if the number of nodes in all blocks is bounded by τN, τ ≥ 1. We show that any tree T of