Automatically Finding Performance Problems with Feedback-Directed Learning Software Testing

Mark Grechanik
Accenture Technology Lab and U. of Illinois, Chicago
Chicago, IL 60601
drmark@uic.edu

Chen Fu, Qing Xie
Accenture Technology Lab
Chicago, IL 60601
{chen.fu, qing.xie}@accenture.com

Abstract—A goal of performance testing is to find situations when applications unexpectedly exhibit worsened characteristics for certain combinations of input values. A fundamental question of performance testing is how to select a manageable subset of the input data so that performance problems in applications are found faster and automatically. We offer a novel solution for finding performance problems in applications automatically using black-box software testing. Our solution is an adaptive, feedback-directed learning testing system that learns rules from execution traces of applications and then uses these rules to select test input data automatically for these applications, finding more performance problems than exploratory random testing. We have implemented our solution and applied it to a medium-size application at a major insurance company and to an open-source application. Performance problems were found automatically and confirmed by experienced testers and developers.

I. INTRODUCTION

A goal of performance testing is to find performance problems, where an application under test (AUT) unexpectedly exhibits worsened characteristics for a specific workload [1], [2]. For example, effective test cases for load testing, a variant of performance testing, find situations where an AUT suffers from unexpectedly high response time or low throughput [3], [4]. Test engineers construct performance test cases, which include actions (e.g., interacting with GUI objects or invoking methods of exposed interfaces) as well as input test data for the parameters of these methods or GUI objects [5]. It is difficult to construct effective performance test cases that find performance problems in a short period of time, since doing so requires test engineers to test many combinations of actions and data for nontrivial applications.

Depending on input values, an application can exhibit different behaviors with respect to resource consumption. Some of these behaviors involve intensive computations that are characteristic of performance problems [6]. Naturally, testers want to summarize the behavior of an AUT concisely in terms of its inputs, so that they can select input data that lead to significantly increased resource consumption, thereby revealing performance problems. Unfortunately, finding proper rules that collectively describe properties of such input data is a highly creative process that requires a deep understanding of input domains [7, page 152].

Descriptive rules for selecting test input data play a significant role in software testing [8], where these rules approximate the functionality of an AUT. For example, a rule for an insurance application is that a customer poses a high insurance risk if that customer has one or more prior insurance fraud convictions and deadbolt locks are not installed on the premises. Computing the insurance premium may consume more resources for a customer with a high-risk insurance record that matches this rule than for a customer with an impeccable record, since processing the high-risk record involves executing multiple computationally expensive transactions against a database.
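To make the flavor of such a descriptive rule concrete, the sketch below encodes the oversimplified insurance rule as a predicate over input attributes. It is a minimal illustration only: the class and field names are invented for this sketch and are not part of the insurance application or of FOREPOST.

    // Hypothetical encoding of the oversimplified insurance rule as a
    // predicate over input attributes; all names here are invented for
    // illustration and do not come from the application under test.
    public final class HighRiskRule {

        // A minimal stand-in for one customer record in the input data.
        public record Customer(int priorFraudConvictions, boolean hasDeadboltLocks) {}

        // Matches customers whose premium computation is expected to be
        // computationally intensive: one or more prior insurance fraud
        // convictions and no deadbolt locks on the premises.
        public static boolean matches(Customer c) {
            return c.priorFraudConvictions() >= 1 && !c.hasDeadboltLocks();
        }

        public static void main(String[] args) {
            System.out.println(matches(new Customer(2, false))); // true  -> select as test input
            System.out.println(matches(new Customer(0, true)));  // false -> likely cheap to process
        }
    }

Input records that match such a rule are the ones worth selecting for performance test cases, since they are the ones expected to drive up resource consumption.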
Of course, we use this oversimplified rule only to illustrate the idea. Even though real-world systems exhibit much more complex behavior, useful descriptive rules often enable testers to build test cases that effectively reveal performance faults.

We offer a novel solution for Feedback-ORiEnted PerfOrmance Software Testing (FOREPOST), which finds performance problems automatically by learning and using rules that describe classes of input data that lead to intensive computations. FOREPOST is an adaptive, feedback-directed learning testing system that learns rules from AUT execution traces and uses these learned rules to select test input data automatically, finding more performance problems than exploratory random performance testing [9], [10]. FOREPOST combines runtime monitoring over a short testing period with machine learning techniques and automated test scripts to reduce the large amount of performance-related information collected during AUT runs to a small number of descriptive rules, which provide insight into the properties of test input data that lead to increased computational loads.

This paper makes the following contributions. FOREPOST collects and utilizes execution traces of the AUT to learn rules that describe the computational intensity of the workload in terms of the properties of the input data. These rules are used by the adaptive automated test script, in a feedback loop, to steer the execution of the AUT by selecting input
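As a rough illustration of such a feedback loop (a sketch in the spirit of the approach, not the paper's implementation), the code below alternates between executing a toy AUT on a batch of inputs, deriving a simple threshold rule that describes the most expensive inputs, and biasing the next batch toward inputs matching that rule. The AUT, its cost measure, and the single-threshold "learner" are all hypothetical stand-ins for FOREPOST's runtime monitoring and machine-learning components.

    import java.util.*;
    import java.util.function.*;

    // A minimal feedback-directed testing loop; every name and the toy
    // rule learner below are assumptions made for this sketch only.
    public class FeedbackLoopSketch {

        record Input(int attribute) {}            // one numeric input attribute
        record Rule(Predicate<Input> matches) {}  // learned description of costly inputs

        // Stand-in for executing the AUT and measuring resource consumption.
        static long runAndMeasure(Input in) {
            return (long) in.attribute() * in.attribute();  // pretend cost grows with the attribute
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            List<Rule> rules = new ArrayList<>();

            for (int iteration = 0; iteration < 5; iteration++) {
                // 1. Select a batch of inputs: random at first, then biased
                //    toward inputs that match previously learned rules,
                //    while keeping some random exploration.
                List<Input> batch = new ArrayList<>();
                while (batch.size() < 20) {
                    Input candidate = new Input(rnd.nextInt(1000));
                    boolean matchesRule =
                            rules.stream().anyMatch(r -> r.matches().test(candidate));
                    if (rules.isEmpty() || matchesRule || rnd.nextDouble() < 0.2) {
                        batch.add(candidate);
                    }
                }

                // 2. Execute the AUT on the batch and record the cost of each run.
                Map<Input, Long> costs = new HashMap<>();
                for (Input in : batch) costs.put(in, runAndMeasure(in));

                // 3. "Learn" a rule: a single threshold separating roughly the
                //    most expensive quartile of inputs from the rest.
                long threshold = costs.values().stream()
                        .sorted(Comparator.reverseOrder())
                        .skip(batch.size() / 4).findFirst().orElse(0L);
                int minCostlyAttr = batch.stream()
                        .filter(in -> costs.get(in) >= threshold)
                        .mapToInt(Input::attribute).min().orElse(0);
                rules.clear();  // replace the old rule with the newly learned one
                rules.add(new Rule(in -> in.attribute() >= minCostlyAttr));

                System.out.printf("iteration %d: costly inputs have attribute >= %d%n",
                        iteration, minCostlyAttr);
            }
        }
    }

Even in this toy form, the loop converges toward the input region with the highest cost, which mirrors the intent described above: learned rules steer input selection, and each new round of execution traces refines the rules.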