Maintaining Knowledge-Bases of Navigational Patterns from Streams of
Navigational Sequences
Ajumobi Udechukwu, Ken Barker, Reda Alhajj
ADSA Lab, Dept. of Computer Science, University of Calgary, Canada
{ajumobiu, barker, alhajj}@cpsc.ucalgary.ca
Abstract
In this paper we explore an alternative design goal
for navigational pattern discovery in stream
environments. Instead of mining based on thresholds
and returning the patterns that satisfy the specified
threshold(s), we propose to mine without thresholds
and return all identified patterns along with their
support counts in a single pass. We utilize a sliding
window to capture recent navigational sequences and
propose a batch-update strategy for maintaining the
patterns within a sliding window. Our batch-update
strategy depends on the ability to efficiently mine the
navigational patterns without support thresholds. To
achieve this, we have designed an efficient algorithm
for mining contiguous navigational patterns without
support thresholds. Our experiments show that our
algorithm outperforms the existing techniques for
mining contiguous navigational patterns. Our
experiments also show that the proposed batch-update
strategy achieves considerable speed-ups compared to
the existing window update strategy, which requires
total re-computation of patterns within each new
window.
Keywords: Data streams, navigational patterns, web-
usage mining.
1. Introduction
An interesting problem in web usage mining that
has attracted the attention of several researchers is the
discovery of traversal patterns (or link navigation
patterns) of web users [1]. Tracking user-browsing
habits provides useful information for service
providers and businesses, and ultimately should help to
improve the effectiveness of the service provided. In
popular e-commerce sites, the web logs receive
continuous streams of entries. For these web sites to
improve their performance by utilizing discovered
navigational patterns, the navigational sequences
should be treated as data streams. In this work, we
propose a framework for mining and updating
contiguous navigational patterns from streams of
navigational sequences. We utilize a sliding window to
capture the most recent set of navigational sequences.
The class of patterns identified from streaming web
log sequences in this work is contiguous navigational
patterns. We assume pre-processing steps are applied
to the web logs [2] before they are added to the stream
of navigational sequences. Generally, there are two
broad techniques for mining navigational patterns –
level-wise, apriori-based techniques [1]; and tree-based
techniques [5, 7]. The apriori-based algorithms are
derived from early algorithms for mining sequential
patterns and association rules. These algorithms are
level-wise and utilize candidate generation-and-test
techniques, so it is possible to define various test
conditions for candidate patterns before they are
included in the result set. These algorithms can be
used to discover different types of navigational
patterns by adjusting the test conditions, so they can be
used for mining generalized navigational patterns and
for constrained navigational patterns. For example,
consider the navigational sequence (A, C, K, M, O, R),
representing objects (or web pages) visited by a user in
order. A generalized navigational pattern would
include “A, M, R” as a valid pattern because the
objects are visited in order. The test condition may also
be set to constrain gaps between successive objects. If
the maximum allowable gap between two consecutive
objects in a pattern is 1, then the pattern “A, M, R”
becomes invalid because the gap between ‘A’ and
‘M’ is 2 (‘C’ and ‘K’ are skipped). When such constraints are incorporated in the
mining process the result is known as a constrained
navigational pattern. A special case of constrained
navigational patterns is the contiguous navigational
pattern, which results when the allowable gap
between successive objects is set to 0, so that no gap
is allowed. The apriori-based algorithms can discover
every type of navigational pattern discussed above.
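The distinction between generalized, constrained, and contiguous patterns can be illustrated with a minimal sketch. The function name occurs_with_max_gap and its max_gap parameter are hypothetical, introduced here only for illustration; this is a naive backtracking check of a single pattern against a single sequence, not the mining algorithm proposed in this paper.

```python
from typing import Optional, Sequence


def occurs_with_max_gap(pattern: Sequence[str],
                        sequence: Sequence[str],
                        max_gap: Optional[int] = None) -> bool:
    """Check whether `pattern` occurs in `sequence` in order.

    max_gap is the largest number of objects that may be skipped between
    two successive pattern objects (hypothetical parameter):
      None -> generalized pattern (any gap allowed)
      k    -> constrained pattern (at most k skipped objects)
      0    -> contiguous pattern (no gap allowed)
    """
    def match_from(p_idx: int, prev_pos: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True  # every pattern object has been placed
        if prev_pos is None:
            # The first pattern object may start anywhere in the sequence.
            candidates = range(len(sequence))
        else:
            start = prev_pos + 1
            stop = len(sequence) if max_gap is None else min(
                len(sequence), start + max_gap + 1)
            candidates = range(start, stop)
        # Backtrack over every admissible position of the next object.
        return any(sequence[pos] == pattern[p_idx] and match_from(p_idx + 1, pos)
                   for pos in candidates)

    return match_from(0, None)


visits = ["A", "C", "K", "M", "O", "R"]
generalized = occurs_with_max_gap(["A", "M", "R"], visits)
gap_one = occurs_with_max_gap(["A", "M", "R"], visits, max_gap=1)
contiguous = occurs_with_max_gap(["K", "M", "O"], visits, max_gap=0)
```

Under these assumptions, "A, M, R" is accepted as a generalized pattern but rejected when max_gap is 1, while "K, M, O" is accepted as a contiguous pattern (max_gap of 0).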
Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA’05)
1097-8585/05 $20.00 © 2005 IEEE