Developing XML Documents with Guaranteed “Good” Properties

D.W. Embley
Department of Computer Science
Brigham Young University
Provo, Utah 84602 USA
embley@cs.byu.edu

W.Y. Mok
Department of Business Information Systems and Education
Utah State University
Logan, Utah 84322 USA
wmok@cc.usu.edu

Abstract

Many XML documents are being produced, but there are no agreed-upon standards formally defining what it means for complying XML documents to have “good” properties. In this paper we present a formal definition of a proposed canonical normal form for XML documents called XNF. XNF guarantees that complying XML documents have maximally compact connectivity while simultaneously guaranteeing that the data in complying XML documents cannot be redundant. Further, we present a conceptual-model-based methodology that automatically generates XNF-compliant DTDs, and we prove that the algorithms that make up the methodology produce DTDs guaranteeing that all complying XML documents satisfy the properties of XNF.

1 Introduction

Many DTDs (Document Type Definitions) for XML documents are being produced (e.g., see [XML]), and soon many XML-Schema specifications [XML00] for XML documents will be produced as well. With the emergence of these documents, we should be asking the question, “What constitutes a good DTD?”¹

¹ Since we do not address issues beyond hierarchical structure in this paper, we discuss the issues in terms of DTDs rather than XML-Schemas. Further, since proposed specifications require XML-Schema to include full DTD expressibility, we do so without loss of generality.

We argue that a “good” DTD should guarantee that all complying XML documents are in an agreed-upon form that has two desirable properties: (1) the DTD should have as few hierarchical trees as possible rooted just below the top-level node, and (2) at the same time, the DTD should not allow any of these trees to have redundant data values in XML documents that comply with the DTD. Intuitively, this should ensure that complying XML documents are