Fault Detection Likelihood of Test Sequence Length

Fevzi Belli, Michael Linschulte
University of Paderborn, Germany
e-mail: {belli, linschu}@upb.de

Christof J. Budnik
Siemens Corporation, Corporate Research, Princeton, NJ 08540, USA
e-mail: christof.budnik@siemens.com

Harald A. Stieber
University of Applied Sciences Nuremberg, Germany
e-mail: harald.stieber@ohm-hochschule.de

Abstract: Testing of graphical user interfaces is important due to its potential to reveal faults in the operation and performance of the system under consideration. Most existing test approaches generate test cases as sequences of events of different lengths. The cost of the test process depends on the number and total length of those test sequences. One of the problems encountered is determining the test sequence length. A widely accepted hypothesis is that the longer the test sequences, the higher the chances of detecting faults. However, there is no evidence that an increase in the test sequence length really affects fault detection. This paper introduces a reliability-theoretical approach to analyze the problem in the light of real-life case studies. Based on a reliability growth model, the expected number of additional faults that will be detected when increasing the length of test sequences is predicted.

Keywords: Software Testing, Graphical User Interfaces, Event Sequence Graphs, Software Reliability

I. INTRODUCTION

Graphical user interfaces (GUIs) account for half or more of the source code in software [1]. Testing GUIs is a difficult and challenging task for many reasons. First, the input space possesses a potentially indefinite number of combinations of events. Second, even simple GUIs possess an enormous number of states due to interaction with the inputs. Last but not least, many complex dependencies may hold between different states of the GUI system, and between its states and inputs.
Test inputs of GUIs usually represent sequences of graphical object activities and/or selections that operate interactively with the objects, such as the Interaction Sequences and Event Sequences in [2, 11, 13, 15]. During testing, the crucial decision is when to stop testing (the test termination problem) [1, 3, 13]. When a set of test cases is exercised, the test results can be satisfactory but remain limited to these particular test cases. Thus, judging the quality of a system under consideration (SUC) requires further quantitative arguments, usually realized by well-defined coverage criteria. Most of the existing approaches are based on test sequences to be covered when testing GUIs [4, 15].

The present paper analyzes the dependency of fault detection on the length of test sequences. Thus, the question we attempt to answer is: To what extent does the likelihood of detecting faults increase if the length of the test sequences is increased?

To answer this question, the analysis considers the following aspects:
- Suitable software reliability growth models are selected by their appropriateness for predicting the expected number of additional faults that will be detected when increasing the length of test sequences.
- For our experiments, the length of sequences varied from 2 to 4, defining three groups of test sets, which needed special care when estimating model parameters.

The data used for the reliability analysis performed in this paper are taken from our previous paper [4], which presented event sequence graphs (ESGs) as a testing approach that enables testing with sequences of different lengths. ESGs, similar to the concept of event flow graphs [15, 16], are used for analysis and validation of user interface requirements prior to implementation and testing of the code. The present paper chooses the ESG notation [2, 3] because it makes intensive use of formal, graph-theoretical notions and algorithms which were developed independently from and prior to event flow graphs.
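To make the notion of test sequences of length 2 to 4 more concrete, the following minimal sketch models an ESG as a directed graph whose nodes are events and whose edges mean "may be followed by", and enumerates all event sequences of a given length. The example graph, the event names, and the function names `event_sequences` and `count_by_length` are assumptions made purely for illustration; they are not taken from [4], and the sketch does not reproduce the test generation algorithm used there, which may select and combine sequences differently.

```python
from typing import Dict, List, Tuple

# Hypothetical ESG for illustration only: each key is an event and each
# value lists the events that may directly follow it.
ESG: Dict[str, List[str]] = {
    "open": ["edit", "close"],
    "edit": ["edit", "save", "close"],
    "save": ["edit", "close"],
    "close": [],
}


def event_sequences(esg: Dict[str, List[str]], length: int) -> List[Tuple[str, ...]]:
    """Return all event sequences (walks in the graph) with `length` events."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Start with every single event, then extend each sequence edge by edge.
    sequences: List[Tuple[str, ...]] = [(event,) for event in esg]
    for _ in range(length - 1):
        sequences = [
            seq + (successor,)
            for seq in sequences
            for successor in esg[seq[-1]]
        ]
    return sequences


def count_by_length(esg: Dict[str, List[str]], lengths=(2, 3, 4)) -> Dict[int, int]:
    """Count the event sequences to be covered for each sequence length."""
    return {k: len(event_sequences(esg, k)) for k in lengths}


if __name__ == "__main__":
    print(count_by_length(ESG))  # {2: 7, 3: 11, 4: 18}
```

Even for this small assumed graph, the number of sequences to be covered grows from 7 to 18 when the length is raised from 2 to 4, which illustrates why the cost of longer test sequences has to be weighed against the additional faults they are expected to reveal.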
Related work is summarized in the next section. Section 3 introduces the terminology and notions used in our approach and discusses various aspects of software reliability determination. Section 4 reports on two case studies. The first is performed on a public-domain software system for personal music management. The second is performed on a large commercial online tourist reservation system called ISELTA. Reliability analysis of the results is carried out in Section 5. Section 6 concludes the paper by summarizing the results and outlining our planned research work.

II. RELATED WORK

Methods based on finite-state automata have long been used for modeling and validating complex systems, e.g., for conformance testing [8, 13, 18], as well as for specification and testing of system behavior [1, 2, 4, 13]. Numerous methods for GUI testing, including convincing empirical studies to validate the approaches, have been introduced in [9, 15, 16, 19]. These methods are quite different from the combinatorial ones, e.g., pairwise testing, which requires that for each pair of input parameters of a system, every combination of these parameters' valid values must be covered by at least one test case [21].

2010 Third International Conference on Software Testing, Verification and Validation
978-0-7695-3990-4/10 $26.00 © 2010 IEEE
DOI 10.1109/ICST.2010.51

A different approach for GUI testing has been introduced in [16] which deploys methods of knowledge engi-