XML Externalization Built into Compiler Front-Ends Using a Parser Generator

Kazuaki Maeda *

Abstract—This paper describes XML externalization built into compiler front-ends and its application to quick reverse engineering tool development. We developed a parser generator, MoJay, to build XML externalization functionality into compiler front-ends. After replacing the compiler's original parser generator with MoJay, generating a parser with it, and modifying a few lines of the compiler's source code, we obtained a special compiler that externalizes three types of information as XML documents: lexical information, syntactic information, and parse trees. The syntactic information was used to develop a reverse engineering tool for C#. The tool incurs a penalty in terms of the size of the generated XML documents. However, even with this storage penalty, the quick development makes the approach a far superior option.

Index Terms — parser generator, reverse engineering tool, XML, C#

1 Introduction

The growth in computing power and the proliferation of the Internet have made XML a very popular format for representing and exchanging data. Today, XML is used in many fields of application: for example, to configure applications, to store data in and retrieve data from databases, to exchange data over the Internet, and to invoke remote methods.

XML is a markup language derived from the Standard Generalized Markup Language (SGML), and it is designed to be text-based, human-readable, and self-describing. It is platform-independent; hence, it can be used across different computers, operating systems, and programming languages. The XML specification does not prescribe any particular library for processing XML documents.
Any tool built on a library that conforms to the XML standards can read, analyze, and write XML documents, and such libraries have already been implemented for most programming languages.

* This research was partly supported by a grant of the Open Research Center Promotion Project from the Ministry of Education, Culture, Sports, Science and Technology in Japan. The contact information of the author is Department of Business Administration and Information Science, Chubu University, 1200 Matsumoto, Kasugai, Aichi 487-8501, Japan, Tel: +81-568-51-1111, Fax: +81-568-52-1505, Email: kaz@acm.org.

XML documents are generally either document-centric or data-centric [1, 2]. Document-centric XML documents are intended for visual consumption by people and are therefore less structured. Books, articles, and emails are typical examples of document-centric XML documents, and XHTML is a language for describing web pages as document-centric XML documents.

In contrast, data-centric XML documents typically contain fine-grained collections of data and are intended for computer processing and database storage. Bibliography data and order forms are typical examples, and the data exchanged with web services is mostly data-centric. Hereafter in this paper, a data-centric XML document is referred to as “XML data”.

The compiler is a traditional, fundamental piece of software that is indispensable for software development. The main purpose of a compiler is to generate efficient object code, and compilers are rarely used for any purpose other than code generation. Nevertheless, a compiler contains excellent algorithms and valuable information that are the results of years of research.
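To make the idea of exposing a compiler's information as XML data more concrete, the following is a minimal Python sketch that externalizes the lexical information (token kind, text, line, and column) of a tiny C#-like fragment as a data-centric XML document. It is an illustration under stated assumptions, not the implementation described in this paper: the regular-expression lexer, the token categories, and the `tokens`/`token` element and attribute names are all hypothetical. Only the standard `xml.etree.ElementTree` module is used, so the example also shows that a conforming XML library can read the document back.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token categories for a tiny C#-like fragment. A real compiler
# front-end would obtain tokens from its own lexer, not from this regex.
TOKEN_SPEC = [
    ("keyword", r"\b(?:class|int|return|public|static|void)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("symbol", r"[=+\-*/;{}()]"),
    ("newline", r"\n"),
    ("skip", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (kind, text, line, column) tuples; raise on unknown characters."""
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at line {line}")
        kind = m.lastgroup
        if kind == "newline":
            line, line_start = line + 1, m.end()
        elif kind != "skip":
            # Columns are 1-based, counted from the start of the current line.
            yield kind, m.group(), line, m.start() - line_start + 1
        pos = m.end()


def tokens_to_xml(source):
    """Externalize lexical information as a data-centric XML document string."""
    root = ET.Element("tokens")
    for kind, text, line, column in tokenize(source):
        # Metadata goes into attributes; the lexeme itself is the element text.
        tok = ET.SubElement(root, "token", kind=kind, line=str(line), column=str(column))
        tok.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml_text = tokens_to_xml("int x = 42;")
    print(xml_text)
    # Any conforming XML library can read the externalized tokens back.
    for tok in ET.fromstring(xml_text).iter("token"):
        print(tok.get("line"), tok.get("column"), tok.get("kind"), tok.text)
```

Keeping each token as a small record, with its position and category in attributes, follows the granular, data-centric style described above; a tool that consumes such records needs only an XML library rather than its own lexer.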
This paper describes XML externalization built into compiler front-ends using the parser generator MoJay, and its application to the quick development of a reverse engineering tool. Section 2 explains the modification of a free and open-source compiler and a reverse engineering tool. Section 3 explains the XML representation of information in compiler front-ends. Section 4 explains the quick development of a reverse engineering tool using XML. The final section summarizes this paper.

IAENG International Journal of Computer Science, 34:1, IJCS_34_1_20 (Advance online publication: 15 August 2007)