Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Chapter 3
DOI: 10.4018/978-1-61350-356-0.ch003
INTRODUCTION
In recent years, both the volume of multidimensional data generated and distributed across various information sources and the number of users of these sources have been increasing. These sources usually represent data with different models, such as the relational model, semistructured models on the web, and text files. XML has become increasingly important as a fundamental standard for efficient data management and exchange. As XML is used more and more widely to describe and exchange data on the web, XML-based comparison has become a central issue in database and information retrieval research. XML similarity is needed in a wide range of applications, such as data integration, change management, classification/clustering of XML documents, and XML querying (Tekli, Chbeir, & Yetongnon, 2009).
Sanjay Kumar Madria
Missouri University of Science and Technology, USA
Waraporn Viyanon
Missouri University of Science and Technology, USA
XML Similarity Detection and Measures
ABSTRACT
XML similarity detection plays an important role in facilitating many applications, such as data integration, document classification/clustering, querying, and change management. In this chapter, we present an overview of syntactic and semantic similarity/distance measures for XML documents, along with existing research related to XML similarity detection. The measures are classified into two main categories: structural similarity, and structural and content similarity. We review similarity detection approaches proposed in the literature and discuss some of the challenges and future directions for research on XML similarity detection and related fields.
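The difference between the two categories can be illustrated with a minimal sketch, assuming a simple path-based representation: each document is reduced to the set of its root-to-node tag paths, and two documents are compared with the Jaccard coefficient. Setting `with_content=True` also adds each element's text value as a feature, giving a rough structure-and-content variant. The function names (`tag_paths`, `jaccard`, `xml_similarity`) and the sample documents are hypothetical; this is an illustration only, not a measure taken from this chapter.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text, with_content=False):
    """Return the set of root-to-node tag paths in an XML document.

    With with_content=True, each non-empty text value is also added as a
    "path=text" entry, so that content differences lower the score.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)  # structural feature: the tag path itself
        text = (node.text or "").strip()
        if with_content and text:
            paths.add(f"{path}={text}")  # content feature: path plus text value
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Intersection size over union size; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def xml_similarity(doc1, doc2, with_content=False):
    """Similarity in [0, 1] between two XML strings (1.0 = same feature sets)."""
    return jaccard(tag_paths(doc1, with_content), tag_paths(doc2, with_content))


# Hypothetical sample documents: same structure, different author text.
doc_a = "<book><title>XML</title><author>Smith</author></book>"
doc_b = "<book><title>XML</title><author>Jones</author></book>"

print(xml_similarity(doc_a, doc_b))                     # structure only
print(xml_similarity(doc_a, doc_b, with_content=True))  # structure and content
```

Here the structural score is 1.0, because both documents share the paths `/book`, `/book/title`, and `/book/author`, while the structure-and-content score drops to about 0.67 (4 shared features out of 6) because the author values differ. Since paths are collected into a set, repeated sibling elements collapse into a single feature and element order is ignored, which is a simplification that more detailed measures would not make.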