Volume 2, No. 4, July-August 2011
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2010, IJARCS All Rights Reserved
Analyses of Algorithms and Complexity for Secure Association Rule Mining of
Distributed Level Hierarchy in Web
Gulshan Shrivastava*
Department of Computer Science & Engineering,
Ambedkar Institute of Technology,
Geeta Colony, Delhi, India
gulshanstv@gmail.com
Dr. Vishal Bhatnagar
Department of Computer Science & Engineering,
Ambedkar Institute of Technology,
Geeta Colony, Delhi, India
vishalbhatnagar@yahoo.com
Abstract— The WWW (World Wide Web) has revolutionized the way people interact, carry out their work, and gather information. It has
proved itself a useful interface for its users to carry out such activities with ease. With hundreds of millions of people around the world
using it, huge volumes of data are collected every day. These data carry interesting insights into the way people interact with the Web. Web mining is the
process of applying various data mining techniques to analyze and discover patterns from such data. The Web mostly contains semi-structured
information, and it is not easy to search and extract the structural data hidden in a Web page. Several privacy-preserving techniques for
association rule mining on the Web have therefore been proposed in the past few years. This paper focuses on the analysis of algorithms and complexity for
secure association rule mining of distributed level hierarchy in the Web. We also present the algorithm's pseudocode so that its
complexity can be analyzed easily.
Keywords - Vertical Partition, Privacy Preserving, Complexity, Pseudocode of Association Rule Mining
I. INTRODUCTION
The rapid development of computer technology,
especially increased capacities and decreased costs of
storage media, has led businesses to store huge amounts of
external and internal information in large databases at low
cost. Mining useful information and helpful knowledge from
these large databases has thus evolved into an important
research area. Web mining [2][10] is the application of data
mining techniques to web-based data for the purpose of
learning or extracting knowledge. Web mining encompasses
a wide variety of techniques, including soft computing [12].
Web mining methodologies can generally be classified into
one of three distinct categories: web usage mining, web
structure mining, and web content mining.
In mathematics, computer science, and related subjects,
an “algorithm” is an effective method for solving a problem
expressed as a finite sequence of instructions. Algorithms
are used for calculation, data processing, and many other
fields. Each algorithm is a list of well-defined instructions
for completing a task. Starting from an initial state, the
instructions describe a computation that proceeds through a
well-defined series of successive states, eventually
terminating in a final ending state. The transition from one
state to the next is not necessarily deterministic; some
algorithms, known as randomized algorithms, incorporate
randomness. Algorithms are often written in pseudocode that
resembles programming languages such as C and Java.
Pseudocode is a mixture of natural language and high-level
programming concepts that describes the main idea behind a
generic implementation of a data structure or algorithm.
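To illustrate the gap between such pseudocode and a concrete implementation, the following is a minimal Python sketch of the support-counting step at the heart of Apriori-style association rule mining. The session data and the min_support threshold are hypothetical and serve only as an example; a full Apriori run would generate candidates level by level with pruning.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return itemsets of size 1 and 2 whose support meets min_support.

    Support of an itemset = fraction of transactions containing it.
    This mirrors the candidate-counting step usually given in pseudocode.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Candidates: all single items plus all pairs (illustrative only;
    # Apriori proper would prune candidates level by level).
    candidates = [frozenset([i]) for i in items]
    candidates += [frozenset(p) for p in combinations(items, 2)]
    result = {}
    for c in candidates:
        support = sum(1 for t in transactions if c <= set(t)) / n
        if support >= min_support:
            result[c] = support
    return result

# Hypothetical web-session transactions (pages visited per session).
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
]
freq = frequent_itemsets(sessions, min_support=0.5)
```

Here {"home", "products"} survives with support 0.5, while {"blog"} is filtered out; association rules would then be derived from the surviving itemsets.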
The rest of this paper is arranged as follows: Section 2
gives an overview of the background and related work in
the area of secure association rule mining of distributed
level hierarchy in the Web. Section 3 details the analysis of
the algorithm for secure association rule mining of distributed
level hierarchy in the Web. Section 4 presents the results of our paper through an
analysis of the complexity of secure association rule mining of
distributed level hierarchy in the Web. Finally, conclusions
and future prospects are put forward in Section 5.
II. BACKGROUND & RELATED WORK
Web usage mining, the art of analyzing user
interactions with a web page, has been addressed by several
researchers using different approaches [2]. Some researchers,
including [3] and [6], have used classification algorithms for
detecting web usage patterns. The authors of [7] used a similarity
upper approximation clustering technique on web
transactions from web log data to extract the behavior
pattern of users' page visits and the order of occurrence of
those visits.
Privacy preservation in data publishing has attracted
considerable attention due to the need of several
organizations to share their data without revealing
information that can be traced to real persons or legal entities.
Privacy preservation was first studied in the relational
context. In [15, 8] the authors introduce k-anonymity and
use generalization and suppression as their two basic tools
for anonymizing a dataset. The authors of [16] proved that optimal k-
anonymity for a multidimensional QI is NP-hard, under both
the generalization and suppression models. For the latter,
they proposed an approximation algorithm that minimizes the
number of suppressed values; the approximation bound is
O(k · log k). [17] improved this bound to O(k), while [18]
further reduced it to O(log k). Incognito [18] and Mondrian
[18] guarantee k-anonymity for a relational table by
transforming the original data using global (full-domain)
and local recoding respectively. In [5] the authors provide
another local recoding approach that shows superior
performance to the global recoding approach of Incognito. A
different approach is taken in [14], where the authors
propose to use natural domain generalization hierarchies (as
opposed to user-defined ones) to reduce information loss.
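As a minimal sketch of the k-anonymity property referenced above (not an implementation of any of the cited algorithms), the following Python snippet checks whether every combination of quasi-identifier values appears in at least k rows. The records, attribute names, and generalizations (an age range, a partially suppressed zip code) are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Check the defining property of k-anonymity: every combination
    of quasi-identifier values must occur in at least k rows."""
    groups = Counter(
        tuple(row[a] for a in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())

# Hypothetical records after generalizing age to a range and
# suppressing the trailing digits of the zip code.
records = [
    {"age": "20-30", "zip": "110**", "disease": "flu"},
    {"age": "20-30", "zip": "110**", "disease": "cold"},
    {"age": "30-40", "zip": "112**", "disease": "flu"},
    {"age": "30-40", "zip": "112**", "disease": "asthma"},
]
```

With these records, each quasi-identifier group holds exactly two rows, so the table is 2-anonymous but not 3-anonymous; generalization and suppression are the tools used to enlarge such groups until the desired k is reached.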