Position: Short Object Lifetimes Require a Delete-Optimized Storage System

Fred Douglis†, John Palmer‡, Elizabeth S. Richards*, David Tao*, William H. Tetzlaff‡, John M. Tracey†, Jian Yin†

†IBM T.J. Watson Research Center  ‡IBM Almaden Research Center  *U.S. Department of Defense

Abstract

Early file systems were designed with the expectation that data would typically be read from disk many times before being deleted; on-disk structures were therefore optimized for reading. As main memory sizes increased, more read requests could be satisfied from data cached in memory, motivating file system designs that optimize write performance. Here, we describe how one might build a storage system that optimizes not only reading and writing, but creation and deletion as well. Efficiency is achieved, in part, by automating deletion based on relative retention values rather than requiring data to be deleted explicitly by an application. This approach is well suited to an emerging class of applications that process data at consistently high rates of ingest. This paper explores trade-offs in clustering data by retention value and age, and examines the effects of allowing retention values to change under application control.

1 Introduction

We are researching the storage system for a highly scalable distributed stream processing system, similar to TelegraphCQ [2] or Medusa [16]. Unlike conventional systems, which are typically engineered to have sufficient capacity, this system must be designed assuming its capacity is chronically insufficient. This assumption is appropriate for certain data mining applications in which the product of the available data and the set of potential mining algorithms dwarfs any conceivable set of processing resources. In such an environment, the system, or at least its bottleneck resource, is always fully utilized. Disks are typically nearly full, and they service an unrelenting stream of requests.
Individual data objects¹ can be of arbitrary size; many will be just a few bytes. While some data will be discarded immediately and never make it to secondary storage, a substantial amount of data will be written to disk, read once or a small number of times, and then quickly deleted. Depending on system load and priorities, some data may be deleted before ever being read. A relatively small fraction of the input data will be retained for a long time and read repeatedly. In this environment we observe that, as file lifetimes become short and all other things are equal, Little's Law requires that a fixed-size storage system will sustain increasing create/delete rates. Since creates and deletes involve random disk I/O, and disk technology is progressing faster in density than in access rate, this will become increasingly important in the future.

Three key notions in the design of our new storage system are immutability, relative valuation, and pipelining. First, data objects are immutable once created.² Thus the only operations that involve an object's data are to write it initially, read it, or delete it. Second, there are additional operations that affect the metadata of an object, particularly its retention value (RV). When an object is created, it is given a current retention value (CRV) that indicates the relative importance of keeping the object, along with a function defining how the CRV decays over time; objects therefore naturally age out of the system. Third, applications are designed to take objects along a pipeline, often in an arbitrary order. Rather than an application requesting a specific object and suffering the latency of retrieving that object, most applications will be designed to receive a stream of objects, the order of which is dictated by a resource manager. For example, a web crawler that processes retrieved pages may not care which pages it processes first, only that it processes all recently crawled pages in some order.
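The Little's Law argument and the decaying retention value can be sketched as follows. This is an illustrative model only: the class name, the choice of an exponential decay function, and the parameters are our assumptions, not part of the system described in the paper, which leaves the decay function under application control.

```python
def churn_rate(num_objects, mean_lifetime_s):
    """Little's Law: N = lambda * T. For a fixed resident population N
    with mean object lifetime T, the steady-state create/delete rate
    is lambda = N / T, so shorter lifetimes mean higher churn."""
    return num_objects / mean_lifetime_s

# A store holding 10 million objects: halving the mean lifetime
# doubles the required create/delete rate.
assert churn_rate(10_000_000, 3600) == 2 * churn_rate(10_000_000, 7200)

class StoredObject:
    """Hypothetical immutable object carrying a decaying retention
    value; exponential decay is one possible choice of decay function."""
    def __init__(self, data, initial_crv, half_life_s):
        self.data = bytes(data)          # payload is immutable once created
        self.initial_crv = initial_crv   # RV assigned at creation time
        self.half_life_s = half_life_s   # parameter of the decay function

    def crv(self, age_s):
        """Current retention value after age_s seconds."""
        return self.initial_crv * 0.5 ** (age_s / self.half_life_s)

obj = StoredObject(b"crawled page", initial_crv=100.0, half_life_s=600)
# As the CRV decays, the object naturally becomes a better candidate
# for reclamation than newer, higher-valued objects.
```

A delete-optimized store could then reclaim space by scanning for the objects whose current CRV is lowest, without any application issuing an explicit delete.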
RVs are only hints to the system about how long to retain an object, not absolute guarantees. Thus, unlike traditional file systems, which write a file and then ensure the availability of that file until it is deleted or overwritten, our system writes an object and then makes a good-faith effort to retain it in accordance with its specified RV. As objects are processed, their processing can affect the RVs of various objects (themselves or others), causing them to be retained for longer or shorter periods. However,