On guarded simulations and acyclic first-order languages

George H.L. Fletcher, Eindhoven University of Technology, The Netherlands, g.h.l.fletcher@tue.nl
Jan Hidders, Delft University of Technology, The Netherlands, a.j.h.hidders@tudelft.nl
Stijn Vansummeren, Université Libre de Bruxelles, Belgium, stijn.vansummeren@ulb.ac.be
Yongming Luo, Eindhoven University of Technology, The Netherlands, y.luo@tue.nl
François Picalausa, Université Libre de Bruxelles, Belgium, fpicalau@ulb.ac.be
Paul De Bra, Eindhoven University of Technology, The Netherlands, debra@tue.nl

ABSTRACT
An exact structural characterization of the expressive power of the acyclic conjunctive queries is given in terms of guarded simulations. We study this fragment of first-order logic because of the central role it plays in query languages across a wide range of data models. We seek a structural characterization of the language because such characterizations have applications, for example, in the design of efficient indexing and query processing strategies. In addition to presenting our main result, we discuss the results of a small empirical study which indicate the practicality of guarded-simulation-based reductions of database instances.

1. INTRODUCTION
The conjunctive queries were recognized early in the study of database query languages as a particularly important fragment of first-order logic (FO) [6]. As the basic language for expressing join patterns between database objects, the conjunctive queries have since continued to play a central role in query language design across all major data models: relational, complex object, object-oriented, semi-structured, XML, graph, and RDF data [1, 2]. For example, conjunctive queries appear in the guise of path and star queries, and of tree and graph patterns, in these various data models.
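The notion of a join pattern can be made concrete with a small sketch of naive conjunctive query evaluation in Python. It assumes a hypothetical encoding that is not the paper's:

- a database is a dict from relation names to sets of constant tuples;
- a query is a head tuple of variables plus a list of body atoms that contain only variables (no constants);
- every head variable occurs in the body.

The name `evaluate_cq` is illustrative. The evaluator backtracks over facts and extends a variable assignment one atom at a time. Each complete assignment maps every body atom onto a fact of the database, which is the classical view of query answers as images of homomorphisms from the query body.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: relation name -> set of tuples of constants.
Database = Dict[str, Set[Tuple[str, ...]]]
# A body atom: (relation name, tuple of variable names); no constants allowed.
Atom = Tuple[str, Tuple[str, ...]]


def evaluate_cq(head: Tuple[str, ...], body: List[Atom], db: Database) -> Set[Tuple[str, ...]]:
    """Return Q(db) for the conjunctive query  head :- body.

    Assumes every head variable occurs in some body atom. Each answer is the
    image of the head under a variable assignment that sends every body atom
    onto a fact of db, i.e. a homomorphism from the query body into db.
    """
    answers: Set[Tuple[str, ...]] = set()

    def extend(i: int, assignment: Dict[str, str]) -> None:
        # All atoms matched: record the image of the head variables.
        if i == len(body):
            answers.add(tuple(assignment[v] for v in head))
            return
        rel, args = body[i]
        for fact in db.get(rel, set()):
            if len(fact) != len(args):
                continue
            new = dict(assignment)
            consistent = True
            for var, const in zip(args, fact):
                # Bind an unbound variable, or reject a conflicting binding.
                if new.setdefault(var, const) != const:
                    consistent = False
                    break
            if consistent:
                extend(i + 1, new)

    extend(0, {})
    return answers


if __name__ == "__main__":
    edges: Database = {"E": {("a", "b"), ("b", "c"), ("c", "d")}}
    # Path query: Q(x, z) :- E(x, y), E(y, z)
    print(sorted(evaluate_cq(("x", "z"), [("E", ("x", "y")), ("E", ("y", "z"))], edges)))
    # prints [('a', 'c'), ('b', 'd')]
    # Star query: Q(x) :- E(x, y), E(x, z)
    print(sorted(evaluate_cq(("x",), [("E", ("x", "y")), ("E", ("x", "z"))], edges)))
    # prints [('a',), ('b',), ('c',)]
```

This backtracking search is exponential in the number of atoms in the worst case. It is meant only to illustrate the semantics of conjunctive queries, not as an efficient evaluation strategy.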
Already in Chandra and Merlin's first paper on the conjunctive queries, the notion of homomorphism (i.e., a structure-preserving mapping) was crucial in reasoning about the language. It is indeed well known that the conjunctive queries are invariant under homomorphisms [26], that is:

Theorem 1. For tuples $\bar{a}_1$ and $\bar{a}_2$ over constants appearing in respective database instances $db_1$ and $db_2$, if there exists a homomorphism $f$ from $db_1$ to $db_2$ such that $f(\bar{a}_1) = \bar{a}_2$, then for every conjunctive query $Q$, if $\bar{a}_1 \in Q(db_1)$ then $\bar{a}_2 \in Q(db_2)$.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This article was presented at: DBPL '11. Copyright 2011.

Such structural characterizations of the expressive power of query languages play an important role, for example, in the study of indexing data structures to accelerate query processing (e.g., [4, 8, 11, 14, 17, 18, 23, 28]). To be usable, however, these characterizations must be efficiently computable and maintainable under database updates. Computing and maintaining all homomorphisms clearly becomes impractical as the size of the database grows, since even deciding whether a homomorphism exists between two instances is NP-complete in general. There are, however, useful fragments of FO which have tractable structural characterizations. Indeed, many FO path languages for trees and graphs (e.g., [4, 8, 9, 17, 23, 29]) are characterized by variants of (bi)simulation, tractable structural notions of equivalence which have deep roots both inside and outside of computer science research [27]. In the logic community, the so-called guarded fragment of FO was shown to be characterized by a tractable generalized notion of guarded bisimulation [3, 24]. Flum et al.
have since established the expressive equivalence of guarded FO and the acyclic fragment of FO [10], and Leinders et al. have shown that guarded FO corresponds to the semi-join variant of Codd's relational algebra [20]. In the context of the conjunctive queries, is it possible to isolate a useful fragment which similarly admits a tractable structural characterization? Gottlob et al. have established the expressive equivalence of the acyclic conjunctive queries and the conjunctive fragment of guarded FO [13]. This fragment is a very natural candidate to consider (e.g., such queries appear in the role of tree patterns for XML and path/star patterns for RDF) [10, 12]. To our knowledge, a structural characterization of the acyclic conjunctive queries has not been established. The closest results are those for positive modal languages established in the 1990s by de Rijke et al. [5, 19] and, more recently, those of Wu et al. for positive path queries on trees [29].

Contributions and overview. In this paper, we give the first structural characterization of the expressive power of the acyclic conjunctive queries in terms of guarded simulations, thereby complementing the structural characterization of acyclic FO in terms of guarded bisimulations. Simulations, which are efficiently computable, have previously found basic applications in data management (e.g., [1, 7,