Towards Standardized Benchmarks for Dynamic Software Updating Systems

Edward K. Smith, Michael Hicks, Jeffrey S. Foster
University of Maryland, College Park
{tedks, mwh, jfoster}@cs.umd.edu

Abstract—Dynamic Software Updating (DSU) has been an active topic of research for at least the last 30 years. However, despite many recent advances, DSU has yet to see widespread adoption and deployment in practice. In this paper, we review a slice of the history of DSU research to study how DSU for C has evolved over the last two decades. We examine the ways DSU systems are evaluated in the research literature. We identify several shortcomings of the evaluation criteria that have been used, and propose key improvements. We believe that using better evaluation criteria can guide DSU research to produce systems that will be more practical, flexible, and usable.

I. INTRODUCTION

Research on dynamically updating running software has been actively underway for at least the past 30 years. Numerous dynamic software updating (DSU) systems have been developed, targeting applications [5], [13], [6], [18], [17], [4], [14], [22], [10], [9] and operating systems [15], [2], [12], and ranging from new compilers, to runtime frameworks, to libraries. While these systems vary widely in their implementation, they all allow programs to be upgraded at runtime with minimal interruption to execution, while retaining valuable state.

Research systems have progressed significantly since the beginnings of the field, and the pace of research in DSU is accelerating. Recently, the first DSU startup, Ksplice [2], was bought by industry titan Oracle, suggesting the potential of the technology.

However, with the exception of Ksplice’s customers, DSU is almost entirely unused in real-world systems. Also surprisingly, though many DSU systems [17], [4], [2] have been released as free software [20], the free software community has not adopted any of these systems nor derived any more practical variants.
This situation raises the question of why adoption is not more widespread. We believe that to move beyond building systems that only Ph.D. researchers in DSU can use, to building systems that everyday programmers can use, the DSU research community needs to rethink how it decides what properties are desirable for a DSU system to have.

In this paper, we make two main contributions toward this end. First, we survey two decades of research on DSU for user-space C programs to help understand how DSU research has reached its current state, and to recall lessons learned over that time (Section II). Second, we examine the ways that DSU systems have been evaluated in the research literature, and recommend improvements to evaluation strategies to help lead toward more practical DSU systems. Specifically, we propose the creation of a standard benchmark suite, suggest further research into defining the problems of update availability and flexibility, and call for direct usability studies of DSU systems (Section III).

We focus our discussion on DSU systems for user-space C programs due to space constraints, but our recommendations apply to DSU for other languages (e.g., Java) as well.

II. HISTORY OF DSU

While dynamic software updating has existed conceptually since (at least) 1976 [5], DSU has only recently been demonstrated on real-world systems. We begin our survey of DSU for user-space programs in 1991, when the first application of DSU to C was published.

A. The 1990s

PODUS [18], [19], the Procedure Oriented Dynamic Updating System, is the earliest system we have found that supports updating C programs. It is implemented on SunOS, and purports compatibility with other UNIX variants. The authors demonstrate updating an example program written in C. PODUS uses binary rewriting to effect updates. Binary rewriting is widely used in later updating systems, and involves writing to the code segment of a running program to redirect old function calls to new versions.
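
The redirection idea can be sketched in portable C. This is only an illustration under stated assumptions: it does not rewrite the code segment as binary rewriting does, but swaps a function pointer instead, which gives the same visible effect of sending old calls to a new version. All names here (compute, compute_v1, compute_v2, install_update) are hypothetical and do not come from PODUS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical old and new versions of one procedure. */
int compute_v1(int x) { return x + 1; }
int compute_v2(int x) { return x + 2; }

/* Indirection slot: every call goes through this pointer.
 * Assumption: a real binary rewriter would instead patch a jump
 * into the old function's machine code. */
static int (*compute_impl)(int) = compute_v1;

/* Stable entry point that callers use before and after the update. */
int compute(int x) { return compute_impl(x); }

/* Redirect all future calls of compute() to the new version. */
void install_update(int (*new_impl)(int)) {
    assert(new_impl != NULL);
    compute_impl = new_impl;
}
```

With these definitions, compute(10) returns 11 before install_update(compute_v2) is called and 12 afterwards, while the call sites themselves are unchanged.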
To attempt to ensure update safety, PODUS only applies an update once no functions to be updated are live on the call stack. This restriction is sufficient (but not necessary) to ensure updates are type-safe. That is, assuming both the old and new programs are themselves type correct, dynamically updating the first to the second will not introduce any type errors. Limiting updates to inactive functions enforces what we call activeness safety. Activeness safety has been adopted by many DSU systems, for C and other languages.

Possibly the most significant work on DSU in the 1990s is Gupta’s "On-line software version change using state transfer between processes" [6]. This system uses a novel mechanism called state transfer to effect an update by transferring some of the state of the running program to a specially prepared variant of the subsequent version of that program, started as a separate process. However, Gupta’s main contribution is a formal notion of update validity. To summarize, an update is valid if it transforms an existing program’s state into a state that could be constructed by executing the new version of a program from the start. In later work [7], Gupta proves