How to Test Program Generators? A Case Study using flex

Prahladavaradan Sampath    A. C. Rajeev    K. C. Shashidhar    S. Ramesh
General Motors India Science Lab, Bangalore
{p.sampath, rajeev.c, shashidhar.kc, ramesh.s}@gm.com

Abstract

We address the problem of rigorous testing of program generators. Program generators are software that take as input a model in a certain modeling language, and produce as output a program that captures the execution semantics of the input-model. In this sense, program generators are also programs and, at first sight, the traditional techniques for testing programs ought to be applicable to program generators as well. However, the rich semantic structure of the inputs and outputs of program generators poses unique challenges that have so far not been addressed sufficiently in the testing literature. We present a novel automatic test-case generation method for testing program generators. It is based on both the syntax and the semantics of the modeling language, and can uncover subtle semantic errors in the program generator. We demonstrate our method on flex, a prototypical lexical analyzer generator.

1. Introduction

Program generators are programs that generate other programs. They take as input a model in a certain modeling language, and produce as output an implementation that captures the execution semantics of the input-model. They play a critical role in addressing the increasing complexity of modern software engineering [23]. Some of the traditional areas where program generators have been applied include syntactic analysis, program compilation and program optimization. Apart from these traditional areas, program generators and, in general, model-processors – tools that process input-models to obtain output-models – are increasingly being used in software engineering practice.
Some of the applications that make essential use of such model-processors include model-based software engineering, aspect-based programming, compiler generation, etc.

Industrial-strength program generators implement complex functionality. Their design and implementation require a thorough understanding of the syntactic and semantic aspects of their input modeling language. In addition to their complexity, they are also subject to evolution to meet the changing demands of software engineering practice. This implies that the correctness of their functionality cannot be taken for granted. Indeed, it is becoming increasingly important to gain confidence in their correctness before using them in an industrial project, in particular when they are used in the development of safety-critical application software.

Software testing is the most viable technique for gaining confidence in the correctness of programs, and automated test-generation (ATG) methods are essential to make it effective. However, testing a program generator is about ensuring that, given a model and its implementation, the semantics of the model is captured faithfully in the implementation. Therefore, an ATG method for such testing will have to deal with the rich syntactic and semantic structures of models, unlike a traditional ATG method that typically deals only with simpler input and output domains.

As an example, suppose that the program generator to be tested is a lexical analyzer generator (LAG). The test-cases required here are the regular expression lists that are used as inputs to the LAG, along with strings that belong to the languages accepted by the regular expressions in the lists. Therefore, the requirement on an ATG method is that it should produce a suite of valid input-models, together with test-inputs for these models.
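To make the shape of such a test-case concrete, the following is a minimal sketch in Python (illustrative, not from the paper): the rule list plays the role of the input-model fed to the LAG, the strings play the role of test-inputs, and Python's `re` module stands in as a reference oracle for the expected token classification. All rule and token names here are invented for illustration.

```python
import re

# A tiny "input-model" for a lexical analyzer generator: a list of
# (token-name, regular-expression) rules, tried in priority order.
rules = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("PLUS",   r"\+"),
]

# Test-inputs for this model: strings that belong to the language of
# some rule, paired with the token we expect the generated lexer to emit.
test_inputs = [("42", "NUMBER"), ("count", "IDENT"), ("+", "PLUS")]

def classify(s):
    """Reference oracle: return the first rule whose regex matches s fully."""
    for name, pattern in rules:
        if re.fullmatch(pattern, s):
            return name
    return None

for s, expected in test_inputs:
    assert classify(s) == expected
```

A generated lexer under test would be run on the same strings, and its token stream compared against this oracle's verdicts.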
Some ATG methods have been proposed in the literature for testing a lexical analyzer; for example, grammar-based testing, which takes as input a grammar describing a language, and generates strings in the language accepted by the grammar. However, our problem is different – we wish to have an ATG method for a LAG – the program that generates the lexical analyzer. In this paper, we present such a method for rigorously testing program
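Grammar-based testing of a lexical analyzer can be sketched as random derivation from a context-free grammar: repeatedly expand nonterminals until only terminals remain, bounding recursion depth so generation terminates. The toy grammar and depth bound below are illustrative assumptions, not taken from the paper.

```python
import random

# A toy context-free grammar: each nonterminal maps to a list of
# alternative right-hand sides (sequences of symbols).
grammar = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["0"], ["1"], ["(", "expr", ")"]],
}

def generate(symbol, depth=0, max_depth=6):
    """Randomly derive a terminal string from `symbol`.

    Past max_depth, only the shortest alternative is taken, which
    steers the derivation toward termination.
    """
    if symbol not in grammar:          # terminal symbol: emit as-is
        return symbol
    alts = grammar[symbol]
    if depth >= max_depth:
        alts = [min(alts, key=len)]
    rhs = random.choice(alts)
    return "".join(generate(s, depth + 1, max_depth) for s in rhs)

# Every generated string is in the language of the grammar, e.g. "1+(0)".
sample = [generate("expr") for _ in range(5)]
```

Such generated strings exercise the lexical analyzer itself; the contribution of this paper is the different problem of generating test-cases for the generator of that analyzer.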