Identifying Clusters of User Behavior in Intranet Search Engine Log Files

Dick Stenmark
IT University of Gothenburg, Department of Applied IT, S-41296 Gothenburg, Sweden. E-mail: dick.stenmark@ituniv.se

When studying how ordinary Web users interact with Web search engines, researchers tend to either treat the users as a homogeneous group or group them according to search experience. Neither approach is sufficient, we argue, to capture the variety in behavior that is known to exist among searchers. By applying an automatic clustering technique based on self-organizing maps to search engine log files from a corporate intranet, we show that users can be usefully separated into distinguishable segments based on their actual search behavior. Based on these segments, future tools for information seeking and retrieval can be targeted to specific segments rather than just made to fit “the average user.” The exact number of clusters, and to some extent their characteristics, can be expected to vary between intranets, but our results indicate that some more generic groups may exist. In our study, a large group of users appeared to be “fact seekers” who would benefit from higher precision, a smaller group of users were more holistically oriented and would likely benefit from higher recall, and a third category of users seemed to constitute the knowledgeable users. These three groups may raise different design implications for search-tool developers.

Introduction

In this article, we discuss whether users of a Web-based information-seeking tool should be understood and analyzed as individuals with unique requirements and preferences or seen as contributors to a collective behavior that may be described using mean values and averages. We argue that, although both extremes have their merits, the user has too often been bundled with thousands of others at the expense of finer details and deeper understanding.
At the same time, the analysis of thousands of individuals would be extremely resource consuming, while results based on the examination of a handful could easily be biased. We therefore suggest an approach in which Web search engine users’ similarities in seeking behavior are used to form clusters of users that can thereafter be analyzed in depth.

Received August 28, 2007; revised June 18, 2008; accepted June 18, 2008. © 2008 ASIS&T. Published online 31 July 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20931

A decent amount of research on how ordinary Web users interact with public search engines such as AltaVista (Silverstein, Henzinger, Marais, & Moricz, 1998), EXCITE (Jansen, Spink, Bateman, & Saracevic, 1998), or Alltheweb (Jansen & Spink, 2003) has been carried out over the last decade. Automatically generated log files from these systems have been studied and have yielded useful statistics on the amount of time typically spent with the search tools, the average query length, the mean number of result pages requested, and the use of advanced features and Boolean operators (or the lack thereof). These studies have allowed us to notice emerging trends in user behavior. We are hence beginning to know a few things about the average search engine user; however, as Cooper (1999) argued, there is no such thing as a typical user. It must be assumed that people who search for information have different levels of experience and education as well as diversified and personalized information needs, and thus behave very differently. Looking only at the average numbers would, we argue, mask the diversity and richness that exist in search behavior. Another common approach is thus to divide users into a priori defined groups, most notably experts versus novices (Moore, Erdelez, & He, 2007).
This is problematic since these concepts are far from well defined and are based on the researchers’ assumptions that there are both experienced and novice users out there and that the level of search experience should affect Web search behavior. We instead suggest that one should look more openly at the users’ real behavior and use clustering techniques to identify and analyze the groups that naturally emerge from such an activity, as previously done by Chen and Cooper (2001). Doing so avoids the “average user syndrome” and also allows us to study behavior without being biased by expectations or assumptions.

Journal of the American Society for Information Science and Technology, 59(14):2232–2243, 2008

The general understanding of a cluster seems to be that it is a group of objects whose members are more similar to each other than to the members of any other group, and clustering is thus the process of organizing objects into groups based on