Are Fit Tables Really Talking? A Series of Experiments to Understand whether Fit Tables are Useful during Evolution Tasks

Filippo Ricca 1, Massimiliano Di Penta 2, Marco Torchiano 3, Paolo Tonella 4, Mariano Ceccato 4, Corrado Aaron Visaggio 2

1 Unità CINI at DISI, Genova, Italy
2 RCOST – Dept. of Engineering – University of Sannio, Benevento, Italy
3 Politecnico di Torino, Italy
4 Fondazione Bruno Kessler–IRST, Trento, Italy

filippo.ricca@disi.unige.it, dipenta@unisannio.it, torchiano@polito.it, tonella@fbk.eu, ceccato@fbk.eu, visaggio@unisannio.it

ABSTRACT

Test-driven software development tackles the problem of operationally defining the features to be implemented by means of test cases. This approach was recently ported to the early development phase, when requirements are gathered and clarified. Among the existing proposals, Fit (Framework for Integrated Test) supports the precise specification of requirements by means of so-called Fit tables, which express relevant usage scenarios in a tabular format, easily understood also by the customer. Fit tables can be turned into executable test cases through the creation of pieces of glue code, called fixtures.

In this paper, we test the claimed benefits of Fit through a series of three controlled experiments in which Fit tables and the related fixtures are used to clarify a set of change requirements in a software evolution scenario. Results indicate improved correctness achieved with no significant impact on time; however, the benefits of Fit vary substantially depending on the developers' experience. Preliminary results on the usage of Fit in combination with pair programming revealed another relevant source of variation.

Categories and Subject Descriptors
D.2.1 [Requirements/Specifications]: Methodologies, Tools

General Terms
Experimentation, Measurement

Keywords
Empirical studies, Acceptance test, Software Maintenance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICSE'08, May 10–18, 2008, Leipzig, Germany.
Copyright 2008 ACM 978-1-60558-079-1/08/05 ...$5.00.

1. INTRODUCTION

When specifying requirements and change requests in natural language, analysts have to avoid several "sins" [9] that may bring about interpretation problems between analysts and developers. Some of these sins are noise, i.e., information that is not relevant to the problem or is a repetition in the requirements; silence, when important information is missing; and over-specification, when portions of the solution are mentioned in the requirements. A substantial proportion of code defects, as high as 85%, originates in the requirement elicitation phase [18], both for initial requirements and for change requests during software evolution. The root cause of such defects can be associated with ambiguous, incomplete, inconsistent, silent (unexpressed), unusable, over-specific or verbose requirements [9].

Test-driven development advocates a central role for testing and test cases, used to capture the features to be implemented in a form that can be checked automatically through execution. Unit test cases show the development progress for single modules. Similarly, executable acceptance test cases have been proposed to measure and precisely describe the level of progress in the implementation of the initial requirements or change requests. According to the agile methodologies [8], acceptance test cases are deemed more precise and accurate sources of information about the customer's requirements than their description in natural language.
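To make the idea of an executable acceptance test concrete, the sketch below shows the classic "division" example often used to introduce Fit: the analyst writes a table of inputs and expected outputs, and a developer-written fixture class maps each row onto the system under test. Since the real fit library is not bundled here, the fixture does not extend the actual fit.ColumnFixture base class and a minimal stand-in runner is used instead; the class and method names are illustrative, not taken from this paper.

```java
// Illustrative sketch of a Fit-style column fixture (assumption: the real
// fit.ColumnFixture base class and test runner are replaced by a minimal
// stand-in so this example is self-contained and runnable).
//
// The analyst's Fit table (normally an HTML table) would look like:
//
//   | DivisionFixture                      |
//   | numerator | denominator | quotient() |
//   | 10        | 2           | 5          |
//   | 12.6      | 3           | 4.2        |

public class FitSketch {

    // In real Fit this class would extend fit.ColumnFixture: the runner
    // fills the public input fields from table cells, then compares the
    // value returned by quotient() with the expected-output cell.
    static class DivisionFixture {
        public double numerator;
        public double denominator;

        public double quotient() {
            return numerator / denominator;
        }
    }

    // Stand-in for the Fit runner: one call per table row.
    // Returns true when the actual value matches the expected cell
    // (in Fit, a matching cell is colored green, a mismatch red).
    static boolean checkRow(double numerator, double denominator,
                            double expected) {
        DivisionFixture fixture = new DivisionFixture();
        fixture.numerator = numerator;
        fixture.denominator = denominator;
        double actual = fixture.quotient();
        boolean pass = Math.abs(actual - expected) < 1e-9;
        System.out.println((pass ? "right: " : "wrong: ")
                + numerator + " / " + denominator + " = " + actual
                + " (expected " + expected + ")");
        return pass;
    }

    public static void main(String[] args) {
        checkRow(10, 2, 5);     // matches: cell would turn green
        checkRow(12.6, 3, 4.2); // matches: cell would turn green
        checkRow(100, 4, 33);   // mismatch: cell would turn red
    }
}
```

The point of the example is the division of labor the paper describes: the table belongs to the customer and analyst, while only the small fixture class is code, written once by a developer and reused for every row.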
Acceptance test cases are a "talking" representation of the requirements, which can be consulted whenever ambiguities or misinterpretations arise. Among the technologies supporting automated acceptance testing, Fit (Framework for Integrated Test) [10] is one of the most popular and widely used. Fit helps analysts write acceptance tests by means of simple HTML tables (Fit tables), including the input and expected output for each test scenario. Different kinds of tables are used for different testing conditions, e.g., testing the output for a given sequence of values vs. testing the result of a sequence of actions. Developers write glue code (called fixtures) to link the test cases expressed in the Fit tables with the system under development. Once fixtures are available, the test runner can execute them, comparing Fit table data (expected output) with the actual values obtained from the execution.

In this paper, we measure the effects of the adoption of Fit in a series of three controlled experiments, varying in the subjects involved and the working conditions. We evaluated the usage of Fit with subjects having different levels of programming skills and we considered, in one replication of the experiment, subjects working in pairs (pair programming). This study has two objectives: on the one hand, we want to empirically evaluate the effects of Fit on the clarification of requirements, in terms of the correctness of the resulting code. On the other hand, we also want to evaluate the impact of