Using Simulation to Design Scalable and Cost-Efficient Archival Storage Systems

James Byron¹, Darrell D. E. Long¹, Ethan L. Miller¹,²
¹Center for Research in Storage Systems, University of California, Santa Cruz
²Pure Storage
{jbyron,darrell,elm}@ucsc.edu

Abstract—The need for reliable and cost-effective data storage grows as digital information becomes increasingly ubiquitous. Archival systems must store valuable data for years while adapting to changing user needs, capacity, and performance requirements. Storage devices differ in performance, capacity, reliability, acquisition cost, power consumption, and the rates at which their features change over time. As a result, choosing the best storage technology for an archive has become increasingly challenging as new technologies proliferate alongside existing ones. We have designed a simulator that models the capacity, performance, acquisition cost, and power cost of an archival system using the characteristics of the drives and media that comprise it. We simulate and compare four storage technologies that exhibit different cost and performance characteristics: tape, optical disc, hard disk, and NAND flash SSD. We evaluate the total cost of ownership for each storage technology within an archival system, and we explore the effect that prospective technological advancements and growth rates over time may have on the relative cost and viability of each storage technology for archival systems. We show that the lifecycle and upgrade costs of drives are significant cost factors for removable media archives. We observe that increasing performance requires adding more drives to an archival system, and the cost of each drive dominates the cost to increase performance. We compare trends in storage technologies to suggest developments that could minimize the long-term total cost of ownership for archival systems.
We show that hard disks and flash could become cost-competitive with tape-based archives by adopting new designs that minimize infrastructure and electricity costs.

Index Terms—archival storage, simulation, total cost of ownership, performance, power consumption

I. INTRODUCTION

Storage technologies vary widely in their performance, cost, power consumption, reliability, and pace of development over time. With a growing need for low-cost data storage in archival systems, selecting a good archival technology can yield significant long-term cost savings and better performance. We present an archival storage simulator that uses trends in storage technology development to predict the long-term cost of archiving data using different storage technologies. We also analyze the best- and worst-case scenarios for each technology to understand how potential breakthrough improvements might affect archival system design over time.

This research was supported in part by the National Science Foundation under award IIP-1266400 and industrial members of the Center for Research in Storage Systems.

The information age has given rise to large digital storage systems that record vast amounts of valuable information. Demand predictions suggest a 30% compound annual growth rate (CAGR) through 2025 in the amount of digital information that must be stored, much of which may have significant financial or personal value [1]. To minimize the cost of reliably storing large amounts of data for long periods of time, archival systems must use cost-effective and power-efficient storage devices.

We compare the features and cost of tape, optical disc, hard disk drives (HDDs), and solid-state drives (SSDs). We show that tape and optical disc become more expensive than hard disk as the performance requirements of an archival system increase. Tape and optical disc archives are most cost-effective in archives with minimal performance requirements.
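The 30% CAGR figure cited in the introduction compounds quickly. As a minimal sketch, assuming the growth rate stays constant over the whole horizon, the arithmetic below shows how an annual growth rate translates into a total growth factor and a doubling time. The helper names `growth_multiplier` and `doubling_time` are hypothetical and are not part of the authors' simulator.

```python
import math


def growth_multiplier(cagr: float, years: float) -> float:
    """Factor by which stored data grows after `years` at a constant CAGR (hypothetical helper)."""
    return (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Years needed for stored data to double at a constant CAGR (hypothetical helper)."""
    return math.log(2.0) / math.log(1.0 + cagr)


# 30% per year, the demand prediction cited in the text [1].
# Assumption: the rate is held constant for every year considered.
CAGR = 0.30

for years in (1, 5, 10):
    print(f"{years:>2} yr: x{growth_multiplier(CAGR, years):.2f}")
print(f"doubling time: {doubling_time(CAGR):.2f} years")
```

Under that constant-rate assumption, the stored volume grows roughly 3.7-fold in five years and doubles about every 2.6 years.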
HDDs and SSDs offer high performance at a somewhat higher total cost than tape or optical disc. We show that a simple, low-cost network-attached storage adapter for HDDs and SSDs can reduce their cost in an archive. We demonstrate that increasing the longevity of HDDs slightly reduces their cost in long-term archival storage. Optical disc archives may be cost-effective for data that requires infrequent access and few changes. We describe the relationship between slow performance and high power consumption. Finally, we predict the future cost of archival storage as the rate of development slows for each storage technology.

II. APPROACH

The economics of long-term storage, including both the value of data and the cost to store it, is an important factor in long-term data preservation [2], [3]. Archival systems must achieve high capacity, performance, and reliability with a low total cost of ownership (TCO). The economic value of an archival system may be expressed as a function:

$$V_{\text{Archive}} = V_d - \mathrm{TCO}(d, \mathit{time}) \tag{1}$$

$V_d$ is the economic value of the data $d$ for the remainder of all time. We assume $V_d$ remains constant across different archival systems; however, different levels of performance can influence $V_d$. For example, in some situations, $V_d$ may increase if the archive can access data within a certain number of seconds. We leave a study of this to future work. $\mathrm{TCO}(d, \mathit{time})$ is a function that calculates the total cost to store and maintain the data over the period specified by $\mathit{time}$. A positive value for