16 IEEE SOFTWARE Published by the IEEE Computer Society 0740-7459/10/$26.00 © 2010 IEEE

What if someone argued that one of your basic conceptions about how to develop software was misguided? What would it take to change your mind?

That's essentially the dilemma faced by advocates of test-driven development (TDD). The TDD paradigm argues that the basic cycle of developing code and then testing it to make sure it does what it's supposed to do—something drilled into most of us from the time we began learning software development—isn't the most effective approach. TDD replaces the traditional "code then test" cycle. First, you develop test cases for a small increment of functionality; then you write code that makes those tests run correctly. After each increment, you refactor the code to maintain code quality.1

TDD proponents assert that frequent, incremental testing not only improves the delivered code's quality but also generates a cleaner design. If you haven't already tried TDD, what data might convince you to try radically changing your software development approach to get those benefits? Would the experience of a recognized expert help? In this column, we offer both data regarding TDD's effectiveness and the critique of an expert based on applying it in the field.

Compiling the Evidence

Our data comes from a study conducted by five of us—namely, Burak Turhan, Lucas Layman, Madeline Diep, Forrest Shull, and Hakan Erdogmus.2 The study was based on a systematic literature review to aggregate demonstrated evidence about TDD's effectiveness. The review searched the literature from 1999, looking for any study that provided some quantitative assessment of TDD's effectiveness compared to traditional software development. The search results were filtered for quality, which left 22 published articles that described 33 unique studies.
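The test-first cycle described above can be sketched in miniature. In this hypothetical example (the function and its behavior are illustrative only, not taken from any study in the review), the tests for a small increment of functionality are written first, and just enough code is then written to make them pass, after which the code can be refactored while the tests stay green:

```python
import unittest

# Step 1: write the tests first, specifying one small increment of behavior.
class TestLeapYear(unittest.TestCase):
    def test_divisible_by_4_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_divisible_by_400_is_leap(self):
        self.assertTrue(is_leap_year(2000))

# Step 2: write just enough code to make those tests pass.
# Step 3: refactor for clarity; the passing tests guard against regressions.
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

if __name__ == "__main__":
    unittest.main()
```

Running the tests first (before `is_leap_year` exists) fails, which is the point: each failing test defines the next small piece of code to write.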
The review distinguished three types of studies:

- Controlled experiments compared TDD to traditional development under controlled conditions to minimize the effects of confounding factors, such as developer experience or the type of software being developed.
- Pilot studies reported comparisons under somewhat realistic conditions but tended to be of short duration or on small problems.
- Industry studies reported comparisons regarding TDD's effectiveness on real projects being developed for a customer under real commercial pressures.

Reasoning that more rigorous studies might be fewer in number but should be more trustworthy, the reviewers defined a category of "high rigor" studies that met the following conditions:

- The subjects included only graduate students or professionals—that is, people who are more experienced than the general population and who should behave the most like developers in industry or government organizations.
- The study used a TDD process description that matched the textbook definition and

Forrest Shull, Grigori Melnik, Burak Turhan, Lucas Layman, Madeline Diep, and Hakan Erdogmus
What Do We Know about Test-Driven Development?
Voice of Evidence. Editor: Forrest Shull, Fraunhofer Center for Experimental Software Engineering, Maryland, fshull@fc-md.umd.edu