Experimentally Evaluating In-Place Delta Reconstruction

Randal Burns
Dept. of Computer Science, Johns Hopkins Univ.
randal@cs.jhu.edu

Larry Stockmeyer
Dept. of Computer Science, IBM Almaden Research Center
stock@almaden.ibm.com

Darrell D. E. Long
Dept. of Computer Science, Univ. of California, Santa Cruz
darrell@cs.ucsc.edu

Abstract

In-place reconstruction of delta compressed data allows information on devices with limited storage capability to be updated efficiently over low-bandwidth channels. Delta compression encodes a version of data compactly as a small set of changes from a previous version. Transmitting updates to data as delta versions saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory space the current version occupies – no additional scratch space is needed. By combining these technologies, we support large-scale, highly-mobile applications on inexpensive hardware.

We present an experimental study of in-place reconstruction algorithms. We take a data-driven approach to determine important performance features, classifying files distributed on the Internet based on their in-place properties, and exploring the scaling relationship between files and data structures used by in-place algorithms. We conclude that in-place algorithms are I/O bound and that the performance of algorithms is most sensitive to the size of inputs and outputs, rather than asymptotic bounds.

1 Introduction

We develop algorithms for data distribution and version management to be used for highly-mobile and resource-limited computers over low-bandwidth networks. The software infrastructure for Internet-scale file sharing is not suitable for this class of applications, because it makes demands for network bandwidth and storage/memory space that many small computers and devices cannot meet.
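To make the two notions from the abstract concrete, the following Python sketch contrasts rebuilding a version in scratch space with rebuilding it in the space the old version occupies. The command encoding (`"copy"` and `"add"` tuples) and the function names `apply_with_scratch` and `apply_in_place` are assumptions made for illustration; they are not taken from the text above. The sketch only shows why the order of commands matters when no scratch space is available, and does not attempt to order commands automatically.

```python
# Hypothetical delta encoding (an assumption for illustration):
#   ("copy", src, dst, n)  copy n bytes from offset src of the old version
#                          to offset dst of the new version
#   ("add", dst, data)     write the literal bytes `data` at offset dst

def apply_with_scratch(old: bytes, delta: list, new_len: int) -> bytes:
    """Rebuild the new version in a separate buffer (needs extra space)."""
    out = bytearray(new_len)
    for cmd in delta:
        if cmd[0] == "copy":
            _, src, dst, n = cmd
            out[dst:dst + n] = old[src:src + n]   # always reads the intact old version
        else:
            _, dst, data = cmd
            out[dst:dst + len(data)] = data
    return bytes(out)

def apply_in_place(buf: bytearray, delta: list, new_len: int) -> None:
    """Apply the commands directly to the buffer that holds the old version.

    The result is correct only if no command reads a region that an
    earlier command has already overwritten.
    """
    if new_len > len(buf):
        buf.extend(bytes(new_len - len(buf)))     # grow if the new version is longer
    for cmd in delta:
        if cmd[0] == "copy":
            _, src, dst, n = cmd
            # Snapshot the source bytes so a copy that overlaps itself is safe.
            buf[dst:dst + n] = bytes(buf[src:src + n])
        else:
            _, dst, data = cmd
            buf[dst:dst + len(data)] = data
    del buf[new_len:]                             # shrink if the new version is shorter

old = b"ABCDEFGH"
# Commands in write order: the first copy overwrites bytes the last copy reads.
write_order = [("copy", 2, 0, 2), ("add", 2, b"xx"), ("copy", 0, 4, 2)]
# Same commands, reordered so every read happens before the conflicting write.
safe_order = [("copy", 0, 4, 2), ("copy", 2, 0, 2), ("add", 2, b"xx")]

print(apply_with_scratch(old, write_order, 6))

buf = bytearray(old)
apply_in_place(buf, write_order, 6)
print(bytes(buf))

buf = bytearray(old)
apply_in_place(buf, safe_order, 6)
print(bytes(buf))
```

With scratch space, either ordering yields the intended version. In place, the write-ordered commands corrupt the final two bytes, while the reordered commands reproduce the intended version without any extra buffer.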
While file sharing is proving to be the new prominent application for the Internet, it is limited in that data are not writable, nor are versions managed. The many recent commercial and freely available systems underscore this point; examples include Freenet [1] and GnuTella [2]. Writable replicas greatly increase the complexity of file sharing: problems include update propagation and version control.

Delta compression has proved a valuable tool for managing versions and propagating updates in distributed systems and should provide the same benefits for Internet file sharing. Delta compression has been used to reduce latency and network bandwidth for Web serving [4, 20] and backup and restore [6].

Our in-place reconstruction technology addresses one of delta compression's major shortcomings. Delta compression makes memory and storage demands that are not reasonable for low-cost,