Learning and Predicting Key Web Navigation Patterns Using Bayesian Models

Malik Tahir Hassan, Khurum Nazir Junejo, and Asim Karim

Dept. of Computer Science
LUMS School of Science and Engineering
Lahore, Pakistan
{mhassan, junejo, akarim}@lums.edu.pk

Abstract. The accurate prediction of Web navigation patterns has immense commercial value as the Web evolves into a primary medium for marketing and sales for many businesses. Often these predictions are based on complex temporal models of users' behavior learned from historical data. Such an approach, however, is not readily understandable by business people and is hence less likely to be used. In this paper, we consider several key and practical Web navigation patterns and present Bayesian models for their learning and prediction. The navigation patterns considered include pages (or page categories) visited in the first N positions, type of visit (short or long), and rank of page categories visited in the first N positions. The patterns are learned and predicted for specific users, time slots, and user-time slot combinations. We employ Bayes' rule and Markov chains in our learning and prediction models. The focus is on accuracy and simplicity rather than on modeling complex Web user behavior. We evaluate our models on four weeks of Web navigation data. Prediction models are learned from the first three weeks of data and the predictions are tested on the last week's data. The results confirm the high accuracy and good efficiency of our models.

1 Introduction

Significant patterns do exist in Web navigation data [1]. Learning and predicting such patterns has immense commercial value as the Web evolves into a primary medium for marketing and sales for many businesses [2]. Web-based businesses seek useful user patterns to help identify promising events and potential risks, and to undertake customer relations management.
Similarly, such businesses seek useful temporal and global patterns to help them optimize their business processes and system operations.

Web surfer behavior modeling and Web navigation pattern discovery have been popular research topics. Over the years, numerous approaches have been proposed for solving various aspects of this problem with varying degrees of success. In general, the problem involves predicting the sequence of page views based on the previous history of such sequences. To simplify the problem somewhat, Web pages are often abstracted and grouped into categories, and the problem is reduced to predicting the sequence of page categories visited. Nonetheless, this is a complex machine learning problem that requires careful consideration from both the technical and practical points of view.

O. Gervasi et al. (Eds.): ICCSA 2009, Part II, LNCS 5593, pp. 877–887, 2009. © Springer-Verlag Berlin Heidelberg 2009

Among the various approaches used for the modeling of Web navigation patterns, probabilistic approaches have been very common [3,4,5,6,7]. Borges and Levene [3]