Rewrite Systems for Symbolic Evaluation of C-like Preprocessing

Mario Latendresse
Northrop Grumman IT
Technology Advancement Group/FNMOC/U.S. Navy
7 Grace Hopper, Monterey, CA, USA 93943
E-mail: mario.latendresse.ca@metnet.navy.mil

Abstract

Automatic analysis of programs with preprocessing directives and conditional compilation is challenging. The difficulties range from parsing to program understanding. Symbolic evaluation offers a fundamental and general approach to solving these difficulties. It finds, for every line of code, the Boolean expression under which it is compiled or reached. It can also find all the possible values of preprocessing variables (macros) for each line of code. Conditional values have been shown to be an effective representation for fast, practical symbolic evaluation of preprocessing, but their interaction with macro expansion and evaluation has not been formally investigated. We present convergent rewrite systems over conditional values that can interact with macro expansion and evaluation and transform them into Boolean expressions. Once transformed, well-known simplification techniques for Boolean expressions can be applied. This entails a more complete solution to the efficient symbolic evaluation of C-preprocessing using conditional values.

1 Introduction

Textual preprocessors similar to cpp might be considered obsolete and ill-designed tools, but they are still widely used in practice, from small to large software projects. C-like preprocessing, as described by ANSI C and implemented by cpp, is a de facto approach for preprocessing not only for C but also for programming languages as diverse as Fortran and Haskell. Moreover, the design of some other textual preprocessors is similar to that of cpp [7, 11].

Many researchers [20, 8, 19, 16, 5, 6, 9] have described some of the difficulties of code analysis, maintenance and refactoring in the presence of such preprocessing.
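As a small illustration of the kind of code at issue, the C sketch below is annotated by hand with the information a symbolic evaluation is meant to compute: the Boolean condition under which each segment is compiled, and the guarded values of a macro. The names TRACE, LEVEL, VERBOSITY and verbosity are hypothetical, and the comment notation is an informal assumption rather than the notation of [14]; only TRACE and LEVEL are assumed to be free preprocessing variables.

```c
/* TRACE and LEVEL are hypothetical free preprocessing variables: they
   may or may not be defined on the compiler command line.  An undefined
   identifier in a #if or #elif expression evaluates to 0. */

#ifdef TRACE
#  define VERBOSITY 2   /* compiled iff defined(TRACE) */
#elif LEVEL > 1
#  define VERBOSITY 1   /* compiled iff !defined(TRACE) && LEVEL > 1 */
#endif

/* Possible values of VERBOSITY from this point on, each with its guard:
     2          if defined(TRACE)
     1          if !defined(TRACE) && LEVEL > 1
     undefined  if !defined(TRACE) && !(LEVEL > 1)                      */

int verbosity(void)
{
#if defined(VERBOSITY) && VERBOSITY > 1
    return 2;           /* compiled iff defined(TRACE) */
#elif defined(VERBOSITY)
    return 1;           /* compiled iff !defined(TRACE) && LEVEL > 1 */
#else
    return 0;           /* compiled iff !defined(TRACE) && !(LEVEL > 1) */
#endif
}
```

Compiled with no -D options, verbosity() returns 0; with -DTRACE it returns 2; with -DLEVEL=3 alone it returns 1. The conditions in the second conditional depend on VERBOSITY, whose value is itself conditional, which is where macro expansion and conditional compilation interact.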
Indeed, conditional compilation, free preprocessing variables and macro expansion bring difficulties at many levels, from parsing to program understanding. Several refactoring and visualization tools [20, 1, 15, 17, 12, 16] are based on ad hoc control-flow analyses of preprocessing; they would benefit from a precise and complete (non-abstract) control-flow analysis of conditional compilation in the presence of macro expansion.

As far as we know, all solutions for handling program analysis in the presence of preprocessing are based on heuristics, and they are often based on partial parsing. These solutions may be good enough for certain specific problems, but they still leave open the need for a general approach capable of handling the semantics of C-like preprocessing precisely. A precise, non-abstract, symbolic evaluation of C-like preprocessing is a promising approach since such concrete preprocessing is not Turing complete.¹

In [14] a symbolic evaluation technique was presented to provide a fundamental solution to automatic analysis of preprocessed code. It does not require the code to be parseable by a context-free grammar. Its direct goal is to find, for every line of code, the condition under which it is compiled or reached. It also provides, for every line of code, the possible values of preprocessing variables, each under a guarding Boolean expression.²

Given such information, further analysis or transformations can be done. For example, all the statically dead code could be removed,³ all possible macro values at every line of code can be found, refactoring operations such as renaming of variables can be made precise, etc.

This symbolic evaluation uses conditional values (c-values).⁴ They were shown, in [14], to be effective to avoid

¹ This can easily be proven, since the only possible form of iteration in ANSI C preprocessing is the #include mechanism, which has a nesting depth constraint.
² In practice, such information is not kept for every line of code but for every segment of code, that is, a sequence of lines without preprocessing conditionals.
³ This should not always be done, since some parts of the code might intentionally be cut out temporarily.
⁴ We might call them conditional expressions, as they may be based on preprocessing variables and operators, but since they are only bound to preprocessing variables we prefer the term ‘value’.