Towards Logic Based Representation of XML Models

Călin Jebelean*, Ciprian-Bogdan Chirila*, Vladimir Crețu* and Marieta Fășie*
*University Politehnica of Timișoara, Romania
Email: {calin,chirila,vcretu}@cs.upt.ro, marietta or@yahoo.com

Abstract—Both code analysis and code transformation are processes that rely on software models rather than on actual software systems. In the context of software modeling, we have so far worked on attaching logic representations to programs written in any language by using an automatic, grammar-driven approach. However, XML proved to be a difficult candidate for such an approach: the XML format is already close to the logic format that we desired, and running our generic grammar-driven approach on XML files would add unnecessary complications. Therefore, we devised a different technique for transforming XML files into logic models, one that preserves the useful information already present in XML files. As a benefit of the approach, we show how UML models (also described in XML) can be transformed into logic models and then analyzed or transformed further at the logical level.

I. INTRODUCTION

Representing programs as Prolog-like logic factbases has become over the years a strong approach to supporting processes like program analysis and program transformation ([Ciu99], [TM03]). Analyses and transformations of programs can be described in a much more expressive manner in a logic language than in an imperative one. The idea of logic representation for programs has led to the development of ProGen (PROlog GENerator), a fully automatic tool capable of constructing logic representations for programs written in any programming language ([CJM08], [JCM08]). ProGen is equipped with a grammar repository for several programming languages, and the process of building a logic representation for a program is driven by the grammar of the language in which that program is written.
Thus, using the approach for different programming languages does not involve writing a new tool for each new language, but rather configuring the same tool with another input grammar.

In this paper we address the problem of building logic models for XML files. When we configured ProGen with the XML grammar and used it to parse XML files and build logic models for them, we realized that the result is in each case far more complicated than the initial XML file. Normally, an XML file uses a hierarchical structure to present information: there are XML tags (parents) that encapsulate other XML tags (children). ProGen models also have a hierarchical structure, but one driven by the grammar of the language. Parent entities (non-terminals of the grammar) are linked to child entities (non-terminals or terminals) as specified by the grammar. Each grammar rule explicitly defines links between several child entities (the members of the right side of the rule) and a single parent entity (the left side of the rule). The two hierarchies have very little in common, since ProGen describes hierarchies at the syntactic level while XML depicts them at the textual level. The need arose for ProGen to behave differently on XML files, so that the textual hierarchy is considered instead of the syntactic hierarchy, which makes the generated logic model easier to use. This will be shown next.

XML is the de facto standard for sharing information between different applications. It is no surprise that UML tools use XML as the preferred format for exporting their artifacts. If ProGen is used to translate such UML artifacts (described in XML) into the realm of logic, the analysis and transformation steps mentioned at the beginning of this section can be applied to them. In this way, analysis and transformation of UML models are achieved. This is the main reason why we considered XML for this discussion.

Analysis of UML models is not a very common research topic, and there are reasons for that.
Analyzing models has the great disadvantage of not being able to grasp all the information about the system being analyzed, because models are only abstractions of the system. Thus, model-based analyses cannot be expected to be more successful than code-based analyses ([Mar04], [SSL01], [Ciu99], [TM03], [Jeb04], [JLB02]). Still, model-based analyses have one major advantage: they can be applied early in the development process, so potential problems can be solved before the coding phase even begins, thus saving time and money. This is the reason why we believe there is some potential in performing analyses directly on UML models, even if those analyses are limited in power or depth.

The next sections are structured as follows: Section II presents an overview of the logic based representations generated by ProGen; Section III shows how certain modeling tools like ArgoUML or Enterprise Architect describe UML models using XML, and how ProGen should be adapted to better benefit from the information already present in the XML format; Section IV describes how the generated ProGen model can be used to perform analysis on UML models (even though XML was the modeled language); and Section V concludes.

II. OVERVIEW OF LOGIC BASED REPRESENTATIONS

In this section we will briefly present how ProGen generates its logic models for programs written in any language. We