Interdisciplinary Journal of Information, Knowledge, and Management
Volume 5, 2010

Editor: Eli Cohen

Discovering Interesting Association Rules in the Web Log Usage Data

Maja Dimitrijević
The Advanced Technical School, Novi Sad, Serbia
dimitrijevic@vtsns.edu.rs

Zita Bošnjak
The University of Novi Sad, Faculty of Economics Subotica, Serbia
bzita@eccf.su.ac.yu

Abstract

The immense volume of web usage data that exists on web servers contains potentially valuable information about the behavior of website visitors. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. In this paper we focus on applying association rules as a data mining technique to extract potentially useful knowledge from web usage data. We conducted a comprehensive analysis of web usage association rules found on a website of an educational institution. Our experiments confirm that, prior to pruning, the set of generated association rules contained too many non-interesting rules, which made it very difficult for a user to find and exploit useful information. Many of these rules are a simple consequence of the high correlation between web pages due to their interconnectedness through the website link structure. We proposed and applied a set of basic pruning schemes to reduce the rule set size and to remove a significant number of non-interesting rules. This pruning method reduced the size of our experimental rule set by a factor of more than three, making it much simpler to browse for truly interesting rules. In the resulting experimental rule set, 41% of the rules were truly interesting, meaning that they can prompt a webmaster to take actions that potentially enhance the website and improve its browsing experience.
The analysis of association rules in our case study confirmed the hypothesis that discovering interesting and potentially useful association rules in web usage data does not have to be a time-consuming task and can lead to actions that increase the website’s effectiveness.

Keywords: association rules, web usage data, pruning, interestingness measures, website link structure

Material published as part of this publication, either on-line or in print, is copyrighted by the Informing Science Institute. Permission to make digital or paper copy of part or all of these works for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage AND that copies 1) bear this notice in full and 2) give the full citation on the first page. It is permissible to abstract these works so long as credit is given. To copy in all other cases or to republish or to post on a server or to redistribute to lists requires specific permission and payment of a fee. Contact Publisher@InformingScience.org to request redistribution permission.

Introduction

Due to the immense volume of Internet usage and web browsing in recent years, log files generated by web servers contain enormous amounts of web usage data that is potentially valuable for understanding the behaviour of website visitors. This knowledge can be applied in various ways, such as enhancing the