Deriving Context Specific Information on the Web

Christo Dichev
Department of Computer Science, Winston-Salem State University
Winston-Salem, N.C. 27110, USA
dichevc@wssu.edu

Darina Dicheva
Department of Computer Science, Winston-Salem State University
Winston-Salem, N.C. 27110, USA
dichevad@wssu.edu

Abstract: The Web is huge, unstructured and diverse in quality, which makes searching for information difficult. In practice, few of the documents returned by a search engine are valuable to a user. Which documents are valuable depends on the context of the query. Adequate context information provided in addition to keywords can significantly improve search precision. In this paper we propose a framework for dynamic conceptual clustering of web documents based on clusters of users that share common interests. The basic assumption is that search results are more relevant to a user when provided within the context of semantically related documents marked as ‘interesting’ by a sufficiently large group of users with similar interests. This framework can support personalization of a search based on a search engine that ‘knows’ the context of the user’s information needs and uses it to tailor the search results.

1 Introduction

The Web is huge and ubiquitous, unstructured, diverse in quality, dynamic and distributed, which makes searching for information inherently difficult. General-purpose search engines that use keyword matching are notorious for returning too many matches of little relevance or quality in response to user queries. For example, if you submit the keyword “centroid” to Google, almost 60,000 documents will be found. Which documents will be valuable to the user depends on the context of the query. The context depends on a number of factors, such as information related to the current request and the user’s interests, background, education, present professional activities, hobbies, travel and entertainment habits, etc.
Search engines, however, treat each request independently of previous requests by the same user and of requests by other web users making similar searches. Therefore the ranked list of documents returned in response to the same query is typically the same and depends neither on the user nor on the context in which the query is made. Adequate context information provided in addition to keywords can significantly improve the search results. The question is what type of context information is practical, how to infer that context information, and how to use it to improve search results.

Web users typically search for diverse information. Some searches are sporadic and irregular, while others are related to the user’s interests and have a more or less regular nature. An important question is then how to filter out the sporadic, irregular searches and how to combine the regular searches into groups that identify topics of interest by observing the user’s behavior on the web. The fact that a user makes an isolated search for the size of Mars when solving a puzzle does not indicate any pattern of behavior, while regular searches for papers on “Contextual reasoning” are more stable because they identify the user’s current interests. Since the causal relations between the user’s interests and actions for resource discovery are more stable, the latter are more predictive of the user’s future behavior. If we are able to identify topics of interest for a given user, we can infer relevant contextual information associated with that user. Such contextual information, when available to search engines, could support personalized searches.

Our approach to topic identification on the web is based on observations of the searching behavior of large groups of users. The intuition is that a topic of interest can be determined by identifying a collection of web documents that is of common interest to a sufficiently large group of web users.
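The intuition that a topic corresponds to a collection of documents of common interest to a sufficiently large group of users can be illustrated with a minimal sketch. It is not the framework developed in this paper. The function name `candidate_topics`, the thresholds `min_users` and `min_docs`, and the user and document identifiers are all hypothetical, and the brute-force enumeration is only practical for small inputs.

```python
from itertools import combinations


def candidate_topics(marks, min_users=2, min_docs=2):
    """Map each shared document collection to the users who marked all of it.

    `marks` maps a user id to the set of documents that user marked as
    interesting. Both thresholds are illustrative assumptions standing in
    for "a sufficiently large group" and a non-trivial collection.
    """
    if min_users < 1:
        raise ValueError("min_users must be at least 1")
    topics = {}
    # Try every group of exactly min_users users (brute force).
    for group in combinations(sorted(marks), min_users):
        common = frozenset.intersection(*(frozenset(marks[u]) for u in group))
        if len(common) >= min_docs:
            # Record every user who marked the whole collection,
            # not only the users in the group that produced it.
            supporters = frozenset(
                u for u, docs in marks.items() if common <= set(docs)
            )
            topics[common] = supporters
    return topics


# Hypothetical observations: which documents each user marked as interesting.
marks = {
    "ann": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "cid": {"d1", "d2", "d3"},
    "dee": {"d9"},
}
topics = candidate_topics(marks, min_users=2, min_docs=2)
for docs, users in topics.items():
    print(sorted(docs), "shared by", sorted(users))
```

In this toy data the isolated interest of user `dee` supports no collection, which mirrors the idea that sporadic searches should not define a topic, while the documents marked by several users together form candidate topics.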
In this paper we present a resource discovery framework based on a contextual topology. A problematic point in the original web architecture is that there is no explicit conceptual partitioning of the web