An Improved Combined Content-similarity Approach for Optimizing Web Query Disambiguation

Shahid Kamal 1, Roliana Ibrahim 1* and Imran Ghani 1
Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81310 Skudai, Johor, Malaysia
[Email: 1 skamaltipu@gmail.com, 2 roliana@utm.my, 3 imran@utm.my]
* Corresponding author: Roliana Ibrahim

ABSTRACT
Web search engines face uncertainty when the queries submitted to retrieve accurate results are ambiguous. Ambiguous queries constitute a significant fraction of such cases and pose real challenges to web search engines. Moreover, researchers have become increasingly interested in handling web search by considering context from a location perspective. Our proposed disambiguation approach aims to improve the user experience by combining location relevance, as a form of context, with document relevance. The premise is that giving the user a comprehensive location perspective on a topic is more informative than retrieving results that contain only temporal or contextual information. From a user perspective, the ability to exploit this information by location is potentially useful for several tasks, including query understanding and location-based clustering. To implement the approach, we developed a Java-based prototype that derives contextual information from the web results returned for queries drawn from well-known datasets. Based on these results, the queries are further classified to support a broader search. After the results are presented to users and they make their selections, feedback is recorded implicitly to improve web search based on contextual information.
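The abstract describes combining location relevance with document relevance but does not state the scoring formula at this point. The following is a minimal Python sketch under stated assumptions: document relevance is taken as cosine similarity between bag-of-words vectors, location relevance as the fraction of the user's context locations mentioned in a result snippet, and the two are blended linearly with a weight `alpha`. All function names, the weighting scheme and the example data are hypothetical illustrations, not the authors' Java prototype.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors (assumed content-similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def location_relevance(snippet: str, locations: set[str]) -> float:
    """Hypothetical measure: fraction of the user's context locations that appear in the snippet."""
    if not locations:
        return 0.0
    words = set(snippet.lower().split())
    return sum(1 for loc in locations if loc.lower() in words) / len(locations)


def combined_score(query: str, snippet: str, locations: set[str], alpha: float = 0.5) -> float:
    """Hypothetical linear blend: alpha * document relevance + (1 - alpha) * location relevance."""
    return alpha * cosine_similarity(query, snippet) + (1 - alpha) * location_relevance(snippet, locations)


# Illustrative data only: an ambiguous query with a location context of "johor".
results = [
    "jaguar habitat in the amazon rainforest",
    "jaguar cars dealer in johor",
]
ranked = sorted(results, key=lambda s: combined_score("jaguar dealer", s, {"johor"}), reverse=True)
print(ranked[0])  # -> jaguar cars dealer in johor
```

A linear blend is only one way to combine the two signals; under this assumption the weight `alpha` would need to be tuned on the evaluation datasets.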
Experimental results show that, compared with a generic temporal evaluation approach, our approach achieves a precision of 75%, accuracy of 73%, recall of 81% and F-measure of 78%; compared with a web document clustering approach, it achieves a precision of 86%, accuracy of 71%, recall of 67% and F-measure of 75%.

Keywords: Content similarity, query disambiguation, web search, location, temporal information

1. INTRODUCTION
In recent years, web search optimization has become a very active research area for industry and academic professionals working in information retrieval and web search [1], largely because the Internet and search engines have become essential tools in daily life. Despite considerable improvements in web search over the last decade, much remains to be done to cope with the ever-increasing size of the web and the growing needs of users. Conventional search engines retrieve a collection of results that correspond to a search query. Some of these results may lead a user to Internet resources that are unrelated to his or her interests, even though they are similar to the query. This situation often arises when a query relates to more than one topic, some or all of which are of little or no interest to the user; the search engine then returns results describing each of the different topics [2]. The search result acquisition process (see Figure 1) begins with a query defined by a user. Based on this query, documents are searched in different data sources, for example Yahoo and Google. In general, between 50 and 200 results are collected from traditional search engines, each containing at minimum a URL, a snippet and a title, and are returned to the user [3]. In pursuit of web search optimization, Temporal Information Retrieval (T-IR) has gained increasing attention in recent years.
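As a sanity check on the figures reported in the abstract, the F-measure can be recomputed from precision and recall, assuming it is the balanced F1 score, F = 2PR / (P + R). Under that assumption both reported pairs appear consistent to within rounding (about 0.779 and 0.753). Accuracy cannot be recomputed this way because it also depends on true negatives, which are not reported here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) as stated in the abstract.
reported = {
    "vs. generic temporal evaluation": (0.75, 0.81, 0.78),
    "vs. web document clustering": (0.86, 0.67, 0.75),
}

for name, (p, r, f_reported) in reported.items():
    # Recompute F1 and show it next to the published value.
    print(f"{name}: F1 = {f_measure(p, r):.3f} (reported {f_reported:.2f})")
```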
However, most of these solutions either focus on developing suitable tools or perform behavioral analysis of log data. A significant number of user queries have strong temporal characteristics: their underlying intent may be to obtain the latest information or information about past or anticipated events, and they depend largely on time [4].

Figure 1: Conventional Search Process

Context is an important source of information in computing environments. The authors of [5] define context as "any information that can be used to characterize the situation of an entity". An entity is a person, place or object that is considered relevant to the interaction between a user and an application, including the user and the applications themselves. Under this definition of context, query disambiguation can be greatly improved by applying contextual information [1]. In essence, we focus on disambiguating a text query with respect to both its contextual and its temporal intent, and we propose a combined approach that takes into account the contextual and temporal information present in the search results. Furthermore, we introduce a different way of using implicit user feedback, based on selection frequency, to refine the search results.
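The feedback mechanism is not detailed at this point, so the sketch below only illustrates one plausible reading of refinement by selection frequency, assuming each click on a result is logged as an implicit vote for that URL under the query and that results are re-ordered by vote count. The `SelectionFeedback` class, its methods and the example URLs are hypothetical.

```python
from collections import defaultdict


class SelectionFeedback:
    """Hypothetical store of implicit feedback: how often each URL is selected for a query."""

    def __init__(self) -> None:
        # query -> url -> number of times the user selected that url
        self._counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query: str, url: str) -> None:
        # Each click is treated as an implicit vote for that URL under this query.
        self._counts[query.lower()][url] += 1

    def rerank(self, query: str, urls: list[str]) -> list[str]:
        # Stable sort: more frequently selected URLs move up; ties keep the engine's original order.
        counts = self._counts.get(query.lower(), {})
        return sorted(urls, key=lambda u: -counts.get(u, 0))


fb = SelectionFeedback()
fb.record_selection("jaguar", "https://example.org/jaguar-cars")
fb.record_selection("jaguar", "https://example.org/jaguar-cars")
fb.record_selection("jaguar", "https://example.org/jaguar-animal")
reranked = fb.rerank("jaguar", [
    "https://example.org/jaguar-animal",
    "https://example.org/jaguar-os",
    "https://example.org/jaguar-cars",
])
print(reranked)  # -> cars, then animal, then os
```

Because Python's `sorted` is stable, results with no recorded selections keep the search engine's original order relative to each other.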