Volume 2, No. 4, July-August 2011
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2010, IJARCS All Rights Reserved
Analyses of Algorithms and Complexity for Secure Association Rule Mining of
Distributed Level Hierarchy in Web
Gulshan Shrivastava*
Department of Computer Science & Engineering,
Ambedkar Institute of Technology,
Geeta Colony, Delhi, India
gulshanstv@gmail.com
Dr. Vishal Bhatnagar
Department of Computer Science & Engineering,
Ambedkar Institute of Technology,
Geeta Colony, Delhi, India
vishalbhatnagar@yahoo.com
Abstract— The WWW (World Wide Web) has revolutionized the way people interact, carry out their work, and gather information. It has
proved itself a useful interface for its users to carry out such activities with ease. With hundreds of millions of people around the world
using it, huge volumes of data are collected every day. These data carry interesting insights into the way people interact with the Web. Web mining is the
process of applying various data mining techniques to analyze and discover patterns from such data. The Web mostly contains semi-structured
information, and it is not easy to search and extract the structural data hidden in a Web page. Several privacy-preserving techniques for
association rule mining on the Web have therefore been proposed in the past few years. This paper focuses on the analysis of algorithms and complexity for
secure association rule mining of distributed level hierarchy in the Web. We also present the algorithm's pseudocode so that its
complexity can be analyzed easily.
Keywords - Vertical Partition, Privacy Preserving, Complexity, Pseudocode of Association Rule Mining
I. INTRODUCTION
The rapid development of computer technology,
especially increased capacities and decreased costs of
storage media, has led businesses to store huge amounts of
external and internal information in large databases at low
cost. Mining useful information and helpful knowledge from
these large databases has thus evolved into an important
research area. Web mining [2][10] is the application of data
mining techniques to web-based data for the purpose of
learning or extracting knowledge. Web mining encompasses
a wide variety of techniques, including soft computing [12].
Web mining methodologies can generally be classified into
one of three distinct categories: web usage mining, web
structure mining, and web content mining.
In mathematics, computer science, and related subjects,
an “algorithm” is an effective method for solving a problem
expressed as a finite sequence of instructions. Algorithms
are used for calculation, data processing, and many other
fields. Each algorithm is a list of well-defined instructions
for completing a task. Starting from an initial state, the
instructions describe a computation that proceeds through a
well-defined series of successive states, eventually
terminating in a final ending state. The transition from one
state to the next is not necessarily deterministic; some
algorithms, known as randomized algorithms, incorporate
randomness. Algorithms are often written in pseudocode that
resembles programming languages such as C and Java.
Pseudocode is a mixture of natural language and high-level
programming concepts that describes the main idea behind a
generic implementation of a data structure or algorithm.
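To illustrate the gap between such pseudocode and a concrete implementation, the following is a minimal Python sketch of the support-counting step at the heart of Apriori-style association rule mining. The session data and the min_support threshold are hypothetical and serve only as an example; a full Apriori run would generate candidates level by level with pruning.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return itemsets of size 1 and 2 whose support meets min_support.

    Support of an itemset = fraction of transactions containing it.
    This mirrors the candidate-counting step usually given in pseudocode.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Candidates: all single items plus all pairs (illustrative only;
    # Apriori proper would prune candidates level by level).
    candidates = [frozenset([i]) for i in items]
    candidates += [frozenset(p) for p in combinations(items, 2)]
    result = {}
    for c in candidates:
        support = sum(1 for t in transactions if c <= set(t)) / n
        if support >= min_support:
            result[c] = support
    return result

# Hypothetical web-session transactions (pages visited per session).
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
]
freq = frequent_itemsets(sessions, min_support=0.5)
```

Here {"home", "products"} survives with support 0.5, while {"blog"} is filtered out; association rules would then be derived from the surviving itemsets.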
The rest of this paper is arranged as follows: Section 2
gives an overview of the background and related work in
the area of secure association rule mining of distributed
level hierarchy in the Web. Section 3 details the analysis of
the algorithm for secure association rule mining of distributed
level hierarchy in the Web. Section 4 presents the results of our paper through an
analysis of the complexity of secure association rule mining of
distributed level hierarchy in the Web. Finally, conclusions
and future prospects are put forward in Section 5.
II. BACKGROUND & RELATED WORK
Web usage mining, the art of analyzing user
interactions with a web page, has been addressed by several
researchers using different approaches [2]. Some researchers,
including [3] and [6], have used classification algorithms for
detecting web usage patterns. The authors of [7] used a similarity
upper approximation clustering technique on web
transactions from web log data to extract the behavior
pattern of users' page visits and the order of occurrence of
those visits.
Privacy preservation in data publishing has attracted
considerable attention due to the need of several
organizations to share their data without revealing
information that can be traced to real persons or legal entities.
Privacy preservation was first studied in the relational
context. In [15, 8] the authors introduce k-anonymity and
use generalization and suppression as their two basic tools
for anonymizing a dataset. The authors of [16] proved that optimal k-
anonymity for a multidimensional QI is NP-hard, under both
the generalization and suppression models. For the latter,
they proposed an approximation algorithm that minimizes the
number of suppressed values; the approximation bound is
O(k · log k). [17] improved this bound to O(k), while [18]
further reduced it to O(log k). Incognito [18] and Mondrian
[18] guarantee k-anonymity for a relational table by
transforming the original data using global (full-domain)
and local recoding respectively. In [5] the authors provide
another local recoding approach that shows superior
performance to the global recoding approach of Incognito. A
different approach is taken in [14], where the authors
propose to use natural domain generalization hierarchies (as
opposed to user-defined ones) to reduce information loss.
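As a minimal sketch of the k-anonymity property referenced above (not an implementation of any of the cited algorithms), the following Python snippet checks whether every combination of quasi-identifier values appears in at least k rows. The records, attribute names, and generalizations (an age range, a partially suppressed zip code) are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Check the defining property of k-anonymity: every combination
    of quasi-identifier values must occur in at least k rows."""
    groups = Counter(
        tuple(row[a] for a in quasi_identifiers) for row in rows
    )
    return all(count >= k for count in groups.values())

# Hypothetical records after generalizing age to a range and
# suppressing the trailing digits of the zip code.
records = [
    {"age": "20-30", "zip": "110**", "disease": "flu"},
    {"age": "20-30", "zip": "110**", "disease": "cold"},
    {"age": "30-40", "zip": "112**", "disease": "flu"},
    {"age": "30-40", "zip": "112**", "disease": "asthma"},
]
```

With these records, each quasi-identifier group holds exactly two rows, so the table is 2-anonymous but not 3-anonymous; generalization and suppression are the tools used to enlarge such groups until the desired k is reached.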