A System for Generating Reverse Engineering Tools

G. Canfora*, A. De Lucia* and G. A. Di Lucca**
{canfora, delucia, dilucca}@nadis.dis.unina.it

* DIIIE - Dept. of "Ingegneria dell'Informazione ed Ingegneria Elettrica", University of Salerno, Faculty of Engineering at Benevento, Palazzo Bosco Lucarelli, Piazza Roma, 82100 Benevento, Italy
** DIS - Dept. of "Informatica e Sistemistica", University of Naples "Federico II", via Claudio 21, 80125 Naples, Italy

Abstract

Most current reverse engineering techniques start with an analysis of the system's source code to derive structural information, based on compiler technology. Because the field is mature, several formal program models exist that have allowed the automatic generation of language processing front-ends. However, the software engineer still has to code the data structures that implement the program model and the algorithms that implement the desired analyses. Thus, while the domain of code analysis is well understood, economic considerations very often lead to rigid code analysers that perform a fixed set of analyses and produce standard reports that users can only marginally customise. We have implemented a system for developing code analysers that uses a single database to store both a no-loss, fine-grained intermediate representation and the results of the analyses. The analysers are automatically generated from a very high-level specification of the desired analyses, expressed in a domain-oriented language. We use an algebraic representation, called F(p), as the user-visible intermediate representation. Analysers are specified in a logic-based language, called F(p)-e, which allows an analysis to be specified as a traversal of an algebraic expression, with accesses to, and stores of, the database information that the algebraic expression indexes.
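The F(p) representation and its analysis language are not defined in this section, so what follows is only a hypothetical C sketch of the general idea: an analysis written as a traversal of an algebraic term whose nodes index entries in a database, reading those entries and storing derived facts. The operator set `Op`, the node type `Term`, the `symbol_table` standing in for the database, the result relation `called`, and the functions `collect_calls` and `run_example` are illustrative assumptions, not the authors' representation or interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operator set for a tiny algebraic program representation. */
typedef enum { OP_SEQ, OP_ASSIGN, OP_CALL, OP_IF } Op;

/* A term node: an operator, up to two sub-terms, and an index into the
   (hypothetical) database of fine-grained facts; -1 means "no entry". */
typedef struct Term {
    Op op;
    int db_index;
    const struct Term *left;
    const struct Term *right;
} Term;

/* Stand-in for the database: symbol names indexed by db_index. */
static const char *symbol_table[] = { "x", "read_input", "y", "print_report" };

/* Result relation written by the analysis: names of called procedures. */
#define MAX_RESULTS 16
static const char *called[MAX_RESULTS];
static int n_called = 0;

/* The analysis as a traversal: visit every sub-term; at CALL nodes, read the
   database entry the node indexes and store a fact in the result relation. */
static void collect_calls(const Term *t) {
    if (t == NULL) return;
    if (t->op == OP_CALL && t->db_index >= 0 && n_called < MAX_RESULTS) {
        assert(t->db_index < (int)(sizeof symbol_table / sizeof symbol_table[0]));
        called[n_called++] = symbol_table[t->db_index];
    }
    collect_calls(t->left);
    collect_calls(t->right);
}

/* Builds a term shaped like  x := ...; read_input(); if (...) print_report();
   runs the analysis, prints the stored facts, and returns how many were found. */
int run_example(void) {
    static const Term call_print = { OP_CALL, 3, NULL, NULL };
    static const Term if_stmt    = { OP_IF, -1, &call_print, NULL };
    static const Term call_read  = { OP_CALL, 1, NULL, NULL };
    static const Term assign_x   = { OP_ASSIGN, 0, NULL, NULL };
    static const Term tail       = { OP_SEQ, -1, &call_read, &if_stmt };
    static const Term program    = { OP_SEQ, -1, &assign_x, &tail };

    n_called = 0;
    collect_calls(&program);
    for (int i = 0; i < n_called; i++)
        printf("calls %s\n", called[i]);
    return n_called;
}
```

Calling `run_example()` from a `main` function prints `calls read_input` and `calls print_report` and returns 2. Keeping the facts in a table that the tree merely indexes lets a single traversal both consult stored information and record new results, a small-scale analogue of the separation between representation and stored information described above.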
A foreign language interface allows the analysers to be embedded into C programs to facilitate interoperation with other tools.

This work is supported by "Progetto Strategico CINI-CNR Informatica nella Pubblica Amministrazione - Sottoprogetto PROGRESS: PROcess-Guided REengineering Support System".

0-8186-7840-2/97 $10.00 © 1997 IEEE

1 Introduction

Previous research has shown that code analysis tools are effective in a number of activities of the software maintenance and evolution process. It has also shown that building a robust code analyser can be costly and error-prone if it is approached with inadequate technologies and tools.

1.1 The problem

Reverse engineering has been defined as the process of analysing a subject system to identify the system's components and their interrelationships, and to create representations of the system in another form or at a higher level of abstraction [7]. Although reverse engineering may be performed at any stage of the life cycle and starting from any level of abstraction, most current reverse engineering techniques start with an analysis of the system's source code to derive structural information. To accomplish this, tools apply parsing and control/data flow analysis technologies based on the lexical, syntactic, and semantic rules that determine the correctness of program constructs in a given programming language. As a consequence of the research conducted in the area of compilers, the domain of code analysis is now well understood, and several formal models exist, such as BNF and the AST, that describe it accurately and exhaustively. These formal models have allowed the implementation of application generators that support the creation of language processing front-ends, such as the compiler writing facilities Lex and Yacc in the UNIX