© 2018, IJCERT All Rights Reserved DOI: http://dx.doi.org/10.22362/ijcert/2018/v5/i1/V5I103
International Journal of Computer Engineering in Research Trends
Multidisciplinary, Open Access, Peer-Reviewed and fully refereed
Research Paper Volume-5, Issue-1, 2018 Regular Edition E-ISSN: 2349-7084
Investigation of Mining Association Rules on XML Document
P.M. Gavali 1*
1 Computer Science and Engineering, D.K.T.E. Society’s Textile and Engineering Institute, Ichalkaranji, India
e-mail: gavalipm87@gmail.com
Available online at: http://www.ijcert.org
Received: 10/01/2018, Revised: 17/01/2018, Accepted: 18/01/2018, Published: 02/02/2018
Abstract:- XML is a globally accepted format for exchanging data over the internet and between applications
that run on different platforms and architectures. As a result, a huge amount of the data on the internet is
in XML. Researchers are therefore drawn to XML to identify interesting findings and patterns in these
documents. Many data mining techniques have been applied to XML, including clustering, classification and
association rule mining. In this paper, association rule mining on XML documents is studied. The study can be
used to identify what work has been done in this field and how it can be extended in future.
Keywords: XML, Data Mining, Association rules.
-------------------------------------------------------------------------------------------------------------------------------------------------------
1. Introduction
Nowadays XML (eXtensible Markup Language) is a
widely used format for exchanging data on the internet because it
is portable. Therefore most of the data available on the
internet is in XML. It has thus become essential to process
XML documents to identify the hidden and useful information
they contain. In this direction, most of the work has been carried out
by applying various data mining algorithms. In this paper, we
concentrate on mining association rules from XML
documents. The rest of the paper is organized as follows. Section 2
discusses the various techniques researchers have used to identify
association rules in XML. Section 3 explains the work of various
researchers in detail. Section 4 gives the conclusions.
2. Broad Categories of Mining Association Rule on XML Document
Researchers have used different techniques to mine
association rules from XML documents. These techniques
can be broadly divided into
four groups, as shown in Figure 1.
Figure 1. Broad categories of association rule mining (ARM) on XML documents
The tree-based approach builds a tree from the XML document
to simplify the identification of association rules. Basic tasks
can be performed on this tree, such as identifying frequent
itemsets and providing an abstraction layer between the original
XML document and the querying system. The structure-based
method, in contrast, considers the structure of the original XML document
and the relationships among its elements
to form rules. It can also be used to track the changes made to
the XML document.
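The tree-based idea can be illustrated with a minimal sketch. It is not the method of any specific paper surveyed here: it assumes a hypothetical document in which each `order` element is one transaction and each `item` child is one item, and it enumerates itemsets by brute force rather than with an optimized algorithm such as Apriori. The names `extract_transactions`, `frequent_itemsets` and `association_rules` are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical sample document; the element names are assumptions.
SAMPLE_XML = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item><item>milk</item></order>
  <order><item>butter</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item></order>
</orders>
"""


def extract_transactions(xml_text, record_tag="order", item_tag="item"):
    """Parse the XML into a tree and read one transaction per record element."""
    root = ET.fromstring(xml_text.strip())
    return [
        frozenset(el.text.strip() for el in rec.iter(item_tag) if el.text)
        for rec in root.iter(record_tag)
    ]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so freq[lhs] is always present.
                confidence = support / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


if __name__ == "__main__":
    txns = extract_transactions(SAMPLE_XML)
    freq = frequent_itemsets(txns, min_support=0.5)
    for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.6):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Here the parsed tree acts as the layer between the raw document and the mining step: the mining functions see only sets of items and do not depend on the XML syntax.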
Some researchers used variants of tables to
identify transactions in the XML document, which
formed the basis for identifying frequent items
and association rules. Parsers, meanwhile, are used to
check the quality of the underlying data across DTDs and XML