Reliable Effects Screening: A Distributed Continuous Quality Assurance Process for Monitoring Performance Degradation in Evolving Software Systems

Cemal Yilmaz, Member, IEEE Computer Society, Adam Porter, Senior Member, IEEE, Arvind S. Krishna, Atif M. Memon, Member, IEEE Computer Society, Douglas C. Schmidt, Aniruddha S. Gokhale, and Balachandran Natarajan

Abstract: Developers of highly configurable performance-intensive software systems often use in-house performance-oriented “regression testing” to ensure that their modifications do not adversely affect their software’s performance across its large configuration space. Unfortunately, time and resource constraints can limit in-house testing to a relatively small number of possible configurations, followed by unreliable extrapolation from these results to the entire configuration space. As a result, many performance bottlenecks escape detection until systems are fielded. In our earlier work, we improved the situation outlined above by developing an initial quality assurance process called “main effects screening.” This process 1) executes formally designed experiments to identify an appropriate subset of configurations on which to base the performance-oriented regression testing, 2) executes benchmarks on this subset whenever the software changes, and 3) provides tool support for executing these actions on in-the-field and in-house computing resources. Our initial process had several limitations, however, since it was manually configured (which was tedious and error-prone) and relied on strong and untested assumptions for its accuracy (which made its use unacceptably risky in practice). This paper presents a new quality assurance process called “reliable effects screening” that provides three significant improvements to our earlier work. First, it allows developers to economically verify key assumptions during process execution.
Second, it integrates several model-driven engineering tools to make process configuration and execution much easier and less error-prone. Third, we evaluate this process via several feasibility studies of three large, widely used performance-intensive software frameworks. Our results indicate that reliable effects screening can detect performance degradation in large-scale systems more reliably and with significantly fewer resources than conventional techniques.

Index Terms: Distributed continuous quality assurance, performance-oriented regression testing, design-of-experiments theory.

1 INTRODUCTION

The quality of service (QoS) of many performance-intensive systems, such as scientific computing systems and distributed real-time and embedded (DRE) systems, depends heavily on various environmental factors. Example dependencies include the specific hardware and operating system on which systems run, installed versions of middleware and system library implementations, available language processing tools, specific software features that are enabled/disabled for a given customer, and dynamic workload characteristics. Many of these dependencies are not known until deployment and some change frequently during a system’s lifetime.

To accommodate these dependencies, users often need to tune infrastructure and software applications by (re)adjusting many (i.e., dozens to hundreds) of compile-time and runtime configuration options that record and control variable software parameters. These options are exposed at multiple system layers, including compiler flags; operating system, middleware, and application feature sets; and runtime optimization settings. For example, there are approximately 50 configuration options for SQL Server 7.0, approximately 200 initialization parameters for Oracle 9, and approximately 90 core configuration options for Apache HTTP Server Version 1.3.
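To give a rough sense of scale for option counts like those above, the following minimal sketch computes the size of a configuration space. It assumes, purely for illustration, that every option is an independent binary switch; real options often take more values or constrain one another. The function names and the rate of one benchmark run per second are hypothetical and are not taken from any of the systems mentioned.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def configuration_count(option_arities):
    """Return the number of distinct configurations.

    option_arities holds the number of settings each option can take.
    Options are assumed to vary independently (an illustrative assumption).
    """
    return math.prod(option_arities)


def exhaustive_years(num_configs, seconds_per_run=1.0):
    """Years needed to benchmark every configuration once, run serially."""
    return num_configs * seconds_per_run / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Option counts quoted in the text; treating each option as binary
    # is an assumption made here, not a property of these products.
    systems = {
        "SQL Server 7.0": 50,
        "Oracle 9": 200,
        "Apache HTTP Server 1.3": 90,
    }
    for name, n_options in systems.items():
        total = configuration_count([2] * n_options)
        years = exhaustive_years(total)
        print(f"{name}: {total:.3e} configurations, "
              f"{years:.3e} years at one run per second")
```

Even under this simplified model, the count doubles with every added option, which is why exhaustive benchmarking is impractical and a small, carefully chosen subset of configurations has to stand in for the whole space.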
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 33, NO. 2, FEBRUARY 2007

• C. Yilmaz is with the IBM T.J. Watson Research Center, 19 Skyline Dr., Hawthorne, NY 10532. E-mail: cyilmaz@us.ibm.com.
• A. Porter and A.M. Memon are with the Department of Computer Science, University of Maryland, College Park, MD 20742. E-mail: {aporter, atif}@cs.umd.edu.
• A.S. Krishna, D.C. Schmidt, A.S. Gokhale, and B. Natarajan are with the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37325. E-mail: arvindkr@qualcomm.com, schmidt@dre.vanderbilt.edu, a.gokhale@vanderbilt.edu.
• B. Natarajan is with Symantec, ?MAILING ADDRESS?. E-mail: bala_natrajan@symantec.com.

Manuscript received 14 Dec. 2005; revised 26 July 2006; accepted 13 Nov. 2006; published online 28 Dec. 2006. Recommended for acceptance by B. Littlewood. For information on obtaining reprints of this article, please send e-mail to: tse@computer.org, and reference IEEECS Log Number TSE-0331-1205.

0098-5589/07/$20.00 © 2007 IEEE. Published by the IEEE Computer Society.

Although designing performance-intensive systems to include such configuration options promotes code reuse, enhances portability, and helps end users improve their QoS, it also yields an enormous family of “instantiated” systems, each of which might behave differently and, thus, may need quality assurance (QA). The size of these system families creates serious and often under-appreciated challenges for software developers, who must ensure that their