International Journal of Web Technology Volume: 01 No: 01 January – June 2012 Integrated Intelligent Research (IIR) 29

HIERARCHICAL FREQUENT PATTERN ANALYSIS OF WEB LOGS FOR EFFICIENT INTERESTINGNESS PREDICTION

G. Sudhamathy
Department of Computer Applications, Velammal College of Engineering & Technology, Madurai 625 009, India
sudhamathi10@hotmail.com

Dr. C. Jothi Venkateswaran
Department of Computer Science, Presidency College (Autonomous), Chennai 600 025, India
jothivenkateswaran@yahoo.co.in

Abstract

In this paper, we propose an efficient approach to frequent pattern mining of web logs for web usage mining, which we call HFPA. HFPA mines association rules from web logs using the standard Apriori algorithm, with a few adaptations that improve the interestingness of the rules produced and make the algorithm applicable to web usage mining. We applied this technique and compared its performance with that of classical Apriori-mined rules. The results indicate that HFPA not only generates far fewer rules than the Apriori-based approach (FPA), but that the generated rules are also of comparable quality with respect to three objective performance measures: Confidence, Lift and Conviction. Association mining often produces large collections of association rules that are difficult to understand and put into action. We therefore propose effective pruning techniques that are characterized by the natural link structure of the web site. Our experiments showed that interestingness measures can successfully be used to sort the discovered association rules after the pruning method has been applied. Most of the rules that ranked highly according to the interestingness measures proved to be truly valuable to a web site administrator.

Keywords: Web Usage Mining, Web Logs, Association Rules, Interestingness Measures

I. INTRODUCTION

Association rule mining algorithms were originally applied to Market Basket Analysis, which operates on transaction data. Transaction data consists of many records, each of which has a transaction id and a list of the items purchased during that transaction. To apply the same Apriori algorithm to web log data, the log must first be transformed into this transaction format. To do so, the web log data has to be cleaned, split and preprocessed into sessions, each with the list of web pages navigated during that session. Once this transformation is done, association rules can be mined in the same way as for market basket analysis. However, the threshold selection, the pruning method, the interestingness measures used and the ranking of the rules need some modification to suit the needs of web usage mining.

Although association rule mining algorithms can find all rules that satisfy the defined constraints, they often produce a large set of rules that is difficult to exploit, and it is hard to find within it the rules that are truly interesting to the user. Web log data differs from market basket data in that it contains a large number of tightly correlated web pages, owing to the link structure of the web site. Web pages that are tightly linked together often occur in the same transaction. As a result, the set of generated association rules is large and many of the rules have very high confidence, yet they are not truly interesting to the user.
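The transformation described above, and the three measures named in the abstract, can be illustrated with a minimal Python sketch. It is not the HFPA implementation. It assumes that the log has already been cleaned into (visitor, timestamp in seconds, page) records and that a gap of more than 30 minutes between requests starts a new session. Both the record layout and the timeout are assumptions made for illustration, as are the function names. Confidence, Lift and Conviction are computed from their standard definitions.

```python
import math


def sessionize(records, timeout=1800):
    """Group (visitor, timestamp_seconds, page) records into sessions.

    Assumption: a new session starts when the visitor changes or when the
    gap since that visitor's previous request exceeds `timeout` seconds.
    Each session is returned as a set of pages, i.e. one "transaction".
    """
    sessions = []
    last_visitor, last_time = None, None
    for visitor, ts, page in sorted(records):
        if visitor != last_visitor or ts - last_time > timeout:
            sessions.append(set())
        sessions[-1].add(page)
        last_visitor, last_time = visitor, ts
    return sessions


def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    return sum(1 for s in sessions if pages <= s) / len(sessions)


def rule_measures(sessions, antecedent, consequent):
    """Standard measures for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    supp_a = support(sessions, a)
    supp_c = support(sessions, c)
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur in a session")
    supp_ac = support(sessions, a | c)
    confidence = supp_ac / supp_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails (confidence == 1).
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    return {
        "support": supp_ac,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


if __name__ == "__main__":
    # Hypothetical, already-cleaned log records used only for illustration.
    records = [
        ("u1", 0, "/home"), ("u1", 60, "/products"),
        ("u1", 5000, "/home"), ("u1", 5060, "/contact"),
        ("u2", 10, "/home"), ("u2", 100, "/products"),
        ("u3", 20, "/products"),
    ]
    sessions = sessionize(records)
    print(sessions)
    print(rule_measures(sessions, {"/home"}, {"/products"}))
```

A rule between two pages that are directly linked will typically show high confidence in such a computation simply because of the site structure, which is the motivation for the pruning discussed in this paper.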