Identifying Clusters of User Behavior in Intranet
Search Engine Log Files
Dick Stenmark
IT University of Gothenburg, Department of Applied IT, S-41296 Gothenburg, Sweden.
E-mail: dick.stenmark@ituniv.se
When studying how ordinary Web users interact with
Web search engines, researchers tend to either treat the
users as a homogeneous group or group them accord-
ing to search experience. Neither approach is sufficient,
we argue, to capture the variety in behavior that is known
to exist among searchers. By applying an automatic clustering
technique based on self-organizing maps to search
engine log files from a corporate intranet, we show that
users can be usefully separated into distinguishable seg-
ments based on their actual search behavior. Based on
these segments, future tools for information seeking and
retrieval can be targeted to specific segments rather
than just made to fit “the average user.” The exact
number of clusters, and to some extent their character-
istics, can be expected to vary between intranets, but
our results indicate that some more generic groups may
exist. In our study, a large group of users appeared to be
“fact seekers” who would benefit from higher precision,
a smaller group of users were more holistically oriented
and would likely benefit from higher recall, and a third
category of users seemed to constitute the knowledgeable
users. These three groups may raise different design
implications for search-tool developers.
Introduction
In this article, we discuss whether users of a Web-based
information-seeking tool should be understood and analyzed
as individuals with unique requirements and preferences
or seen as contributors to a collective behavior that may
be described using mean values and averages. We argue that,
although both extremes have their merits, the user has too often
been bundled with thousands of others at the expense
of finer details and deeper understanding. At the same time,
the analysis of thousands of individuals would be extremely
resource consuming while results based on the examination
of a handful could easily be biased. We therefore suggest an
approach in which Web search engine users’ similarities in seeking
behavior are used to form clusters of users that thereafter can
be analyzed in depth.
Received August 28, 2007; revised June 18, 2008; accepted June 18, 2008
© 2008 ASIS&T • Published online 31 July 2008 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/asi.20931
A considerable amount of research on how ordinary Web
users interact with public search engines such as AltaVista
(Silverstein, Henzinger, Marais, & Moricz, 1998), EXCITE
(Jansen, Spink, Bateman, & Saracevic, 1998), or Alltheweb
(Jansen & Spink, 2003) has been carried out over the last
decade. Automatically generated log files from these systems
have been studied and have yielded useful statistics
on the amount of time typically spent with the search tools,
the average query length, the mean number of result pages
requested, and the use of advanced features and Boolean operators
(or the lack thereof). These studies have allowed us
to notice emerging trends in user behavior. We hence begin
to know a few things about the average search engine user;
however, as Cooper (1999) argued, there is no such thing as
a typical user. It must be assumed that people who search for
information have different levels of experience and education
as well as diversified and personalized information needs,
and thus behave very differently. Looking only at the averages
would, we argue, mask the diversity and richness that exist
in search behavior. Another common approach is
therefore to divide users into a priori defined groups, most notably
into experts versus novices (Moore, Erdelez, & He, 2007). This
is problematic since these concepts are far from well defined
and are based on the researchers’ assumptions that there are
both experienced and novice users out there and that the level
of search experience should affect Web search behavior. We
instead suggest that one should look more openly at the users’
real behavior and use clustering techniques to identify and
analyze the groups that naturally emerge from such an activ-
ity, as previously done by Chen and Cooper (2001). Doing
so avoids the “average user syndrome” and also allows us
to study behavior without being biased by expectations or
assumptions.
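To make the idea of letting groups emerge from observed behavior more concrete, the following is a minimal sketch of a one-dimensional self-organizing map written in plain Python. It is an illustration under stated assumptions, not the procedure used in this study: the three per-user features (scaled query length, result pages viewed, and queries per session), the sample values in USERS, the number of map units, and all training parameters are hypothetical.

```python
import math
import random

# Hypothetical per-user features, each scaled to [0, 1]:
# (mean query length, result pages viewed per query, queries per session).
# These values are invented for illustration and are not data from the study.
USERS = [
    [0.10, 0.10, 0.15], [0.15, 0.05, 0.10], [0.12, 0.08, 0.20],
    [0.50, 0.90, 0.60], [0.55, 0.85, 0.65], [0.45, 0.95, 0.55],
    [0.90, 0.30, 0.90], [0.95, 0.25, 0.85], [0.85, 0.35, 0.95],
]


def best_unit(units, x):
    """Return the index of the map unit closest to x (squared Euclidean distance)."""
    return min(
        range(len(units)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)),
    )


def train_som(data, n_units=3, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit as a copy of a randomly chosen sample.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)   # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):        # visit samples in random order
            bmu = best_unit(units, x)
            for j, w in enumerate(units):
                # Gaussian neighbourhood on the 1-D map grid around the winner.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units


if __name__ == "__main__":
    som_units = train_som(USERS, n_units=3)
    for user in USERS:
        print(user, "-> unit", best_unit(som_units, user))
```

Each user is assigned to the map unit whose weight vector lies closest to that user's feature vector, and users sharing a unit form a candidate segment. Note that no group labels such as "expert" or "novice" enter the computation; any interpretation of the resulting segments is made afterwards.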
The general understanding of a cluster seems to be that it
is a group of objects whose members are more similar to each
other than to the members of any other group, and clustering
is thus the process of organizing objects into groups based on
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(14):2232–2243, 2008