Proceedings of the Eleventh Americas Conference on Information Systems, Omaha, NE, USA, August 11th-14th, 2005

XTFDs: FDs for XML Documents

Wai Yin Mok, Ph.D.¹
University of Alabama in Huntsville
mokw@uah.edu

ABSTRACT

Functional Dependencies (FDs) are a common constraint for many applications. Specifying FDs in XML documents, however, is difficult because XML documents do not have uniform structures. In this paper, we introduce XML Template Functional Dependencies (XTFDs), which can specify FDs in XML documents. Previously, we defined XTFDs in terms of variables only (Mok, 2005). Here, we extend that work by incorporating XPath expressions, which lets us express more FDs in XML documents. Since XTFDs are based on simple concepts like variables, functions, and XPath expressions, XTFDs are more intuitive than other proposals for the same purpose in the literature. We are currently comparing XTFDs with the approaches in Arenas & Libkin (2004) and Vincent et al. (2004). A preliminary comparison shows that XTFDs improve on those approaches when recursive structures, mixed content, and optional attributes are allowed.

Keywords

XML Template Functional Dependencies, Functional Dependencies, Template Dependencies, XPath expressions.

INTRODUCTION

This paper introduces XML Template Functional Dependencies (XTFDs), a new constraint for XML documents. XTFDs are inspired by Template Dependencies (TDs) (Sadri & Ullman, 1982) and XPath expressions (W3C, 1999). In brief, XTFDs are constructed from variables and XPath expressions, and the semantics of XTFDs are based on predicate calculus. Since XTFDs are built on simple concepts like variables, functions, and XPath expressions, XTFDs are a highly intuitive constraint. Here, we first point out the notation and the basic assumptions of this paper.
Element names, element instances, attribute names, and text strings all appear in typewriter style. Person, <Person Name="Esau">, Name, "Jacob" and "A plain man" in Figure 2(a) are examples. As in Arenas & Libkin (2004), Fan & Libkin (2002), and Vincent et al. (2004), we assume every element instance in an XML document is distinct. For example, all Bar elements in Figure 3 are unique. Even though <Bar C="4">text4</Bar> and <Bar C="4">text4</Bar> appear to be the same in Figure 3, they are actually different Bar elements that happen to contain the same text string "text4" and the same attribute value "4". A text string, on the other hand, is identified by its value. Thus, all occurrences of a text string in an XML document are considered to be identical. Although the text string "1" (which is also an attribute value) appears several times in Figure 3, all of its occurrences are identical. In a sense, we can think of elements as containers that hold other data values such as text strings, numbers, and attribute values. Hence, it is possible for two different containers to hold the same data values.

TDS AND XPATH EXPRESSIONS

To demonstrate some sample TDs, let R = ABC be a relation scheme. Figure 1(a) shows a TD that denotes the multivalued dependency A →→ B | C. The two rows above the horizontal line constitute the hypothesis of the TD, and the row below the horizontal line is the conclusion. Figure 1(a) means that, for any relation r over R, if there is a function φ such that φ maps each a_i to an A-value in r, each b_i to a B-value in r, and each c_i to a C-value in r, and (φ(a_1), φ(b_1), φ(c_1)) ∈ r and (φ(a_1), φ(b_2), φ(c_2)) ∈ r, then (φ(a_1), φ(b_1), φ(c_2)) ∈ r as well. Figure 1(b) shows the FD A → B. The meaning of its hypothesis is the same as that of A →→ B | C. Its conclusion, however, means φ(b_1) = φ(b_2).

We assume the reader is familiar with basic XPath expressions up to the level of Carey (2004).
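The TD semantics described above for Figure 1 can be illustrated with a minimal Python sketch. It is not part of the paper: the representation of a relation as a set of (A, B, C) tuples, the function names satisfies_mvd and satisfies_fd, and the sample relations are all assumptions made for illustration. Enumerating every ordered pair of rows that agree on A plays the role of enumerating every function φ that satisfies the hypothesis.

```python
from itertools import product

# Assumption: a relation r over R = ABC is a set of (A, B, C) tuples.


def satisfies_mvd(r):
    """Check the TD of Figure 1(a), i.e. A ->> B | C, on relation r."""
    for (a1, b1, _c1), (a2, _b2, c2) in product(r, repeat=2):
        # Hypothesis: two rows that share the same A-value.
        # Conclusion: the row (a1, b1, c2) must also be in r.
        if a1 == a2 and (a1, b1, c2) not in r:
            return False
    return True


def satisfies_fd(r):
    """Check the dependency of Figure 1(b), i.e. A -> B, on relation r."""
    for (a1, b1, _c1), (a2, b2, _c2) in product(r, repeat=2):
        # Same hypothesis; the conclusion is equality of the B-values.
        if a1 == a2 and b1 != b2:
            return False
    return True


# Hypothetical sample relations.
r_both = {(1, "x", "p"), (1, "x", "q")}
r_mvd_only = {(1, "x", "p"), (1, "x", "q"), (1, "y", "p"), (1, "y", "q")}
r_neither = {(1, "x", "p"), (1, "y", "q")}

print(satisfies_mvd(r_both), satisfies_fd(r_both))
print(satisfies_mvd(r_mvd_only), satisfies_fd(r_mvd_only))
print(satisfies_mvd(r_neither), satisfies_fd(r_neither))
```

The two checks share the same hypothesis and differ only in the conclusion, which mirrors the remark that the FD in Figure 1(b) has the same hypothesis as the multivalued dependency in Figure 1(a).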
¹ W.Y. Mok was supported in part by the Richard A. Witmondt Faculty Fellowship and a UAH Research Mini-Grant.

Here, we demonstrate some XPath expressions that are particularly useful for this research. As an example, we show an incomplete family tree of Abraham in Figure 2(a), an XSLT style sheet that uses several XPath expressions in Figure 2(b), and the result document of