Maintaining Knowledge-Bases of Navigational Patterns from Streams of Navigational Sequences

Ajumobi Udechukwu, Ken Barker, Reda Alhajj
ADSA Lab, Dept. of Computer Science, University of Calgary, Canada
{ajumobiu, barker, alhajj}@cpsc.ucalgary.ca

Abstract

In this paper we explore an alternative design goal for navigational pattern discovery in stream environments. Instead of mining based on thresholds and returning the patterns that satisfy the specified threshold(s), we propose to mine without thresholds and return all identified patterns along with their support counts in a single pass. We utilize a sliding window to capture recent navigational sequences and propose a batch-update strategy for maintaining the patterns within a sliding window. Our batch-update strategy depends on the ability to efficiently mine the navigational patterns without support thresholds. To achieve this, we have designed an efficient algorithm for mining contiguous navigational patterns without support thresholds. Our experiments show that our algorithm outperforms the existing techniques for mining contiguous navigational patterns. Our experiments also show that the proposed batch-update strategy achieves considerable speed-ups compared to the existing window update strategy, which requires total re-computation of patterns within each new window.

Keywords: Data streams, navigational patterns, web-usage mining.

1. Introduction

An interesting problem in web usage mining that has attracted the attention of several researchers is the discovery of traversal patterns (or link navigation patterns) of web users [1]. Tracking user-browsing habits provides useful information for service providers and businesses, and ultimately should help to improve the effectiveness of the service provided. In popular e-commerce sites, the web logs receive continuous streams of entries.
For these web sites to improve their performance by utilizing discovered navigational patterns, the navigational sequences should be treated as data streams. In this work, we propose a framework for mining and updating contiguous navigational patterns from streams of navigational sequences. We utilize a sliding window to capture the most recent set of navigational patterns. We assume that pre-processing steps are applied to the web logs [2] before they are added to the stream of navigational sequences.

Generally, there are two broad techniques for mining navigational patterns: level-wise, apriori-based techniques [1], and tree-based techniques [5, 7]. The apriori-based algorithms are derived from early algorithms for mining sequential patterns and association rules. These algorithms are level-wise and rely on candidate generation and testing, so various test conditions can be defined for candidate patterns before they are included in the result set. By adjusting the test conditions, the same algorithms can discover different types of navigational patterns, including generalized navigational patterns and constrained navigational patterns. For example, consider the navigational sequence (A, C, K, M, O, R), representing the objects (or web pages) visited by a user, in order. A generalized navigational pattern would include “A, M, R” as a valid pattern because the objects are visited in that order. The test condition may also be set to constrain gaps between successive objects. If the maximum allowable gap between two consecutive objects in a pattern is 1, then the pattern “A, M, R” becomes invalid because the gap between ‘A’ and ‘M’ is 2 (the objects ‘C’ and ‘K’ lie between them). When such constraints are incorporated in the mining process, the result is known as a constrained navigational pattern.
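The gap test described above can be illustrated with a small sketch. This is only an illustration of the test condition, not the mining algorithm proposed in this paper; the function name `occurs` and the parameter `max_gap` are hypothetical names introduced here, and a gap is assumed to mean the number of objects skipped between two successive pattern objects.

```python
def occurs(sequence, pattern, max_gap=None):
    """Return True if `pattern` occurs in `sequence` in order, with at most
    `max_gap` skipped objects between successive pattern objects.
    max_gap=None means gaps are unconstrained (a generalized pattern).
    Illustrative helper only; the names are assumptions, not from the paper."""
    if not pattern:
        return True
    # Positions in the sequence where the first pattern object matches.
    ends = [i for i, obj in enumerate(sequence) if obj == pattern[0]]
    for target in pattern[1:]:
        next_ends = []
        for j, obj in enumerate(sequence):
            if obj != target:
                continue
            # Position j extends a partial match ending at i if i comes
            # before j and the number of skipped objects is within max_gap.
            if any(i < j and (max_gap is None or j - i - 1 <= max_gap)
                   for i in ends):
                next_ends.append(j)
        ends = next_ends
        if not ends:
            return False
    return bool(ends)


seq = ("A", "C", "K", "M", "O", "R")
generalized = occurs(seq, ("A", "M", "R"))             # no gap constraint
constrained = occurs(seq, ("A", "M", "R"), max_gap=1)  # gap A..M is 2
no_gap = occurs(seq, ("C", "K", "M"), max_gap=0)       # adjacent objects
```

With the example sequence, “A, M, R” passes the unconstrained test but fails when the maximum gap is 1, while “C, K, M” passes even when no gap is allowed. All partial-match positions are tracked rather than only the first one, so sequences with repeated objects are handled correctly.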
A special case of constrained navigational patterns is contiguous navigational patterns, which result when the allowable gap between successive objects is set to 0, so that no gap is allowed. The apriori-based algorithms can discover every type of navigational pattern discussed above.

Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA’05) 1097-8585/05 $20.00 © 2005 IEEE