Implicitly Learning a User Interest Profile for Personalization of Web Search using Collaborative Filtering

Ashish Nanda, Dept. of Computer Science, BITS-Pilani, Goa Campus, Zuarinagar, India. Email: f2010175@goa.bits-pilani.ac.in
Rohit Omanwar, Dept. of Computer Science, BITS-Pilani, Goa Campus, Zuarinagar, India. Email: h2012060@goa.bits-pilani.ac.in
Bharat Deshpande, Head of Dept. of Computer Science, BITS-Pilani, Goa Campus, Zuarinagar, India. Email: bmd@goa.bits-pilani.ac.in

Abstract—The increasing abundance of content on the web has made information filtering even more important in helping users find information related to their interests. Personalization of web search is one such effort; it aims to improve the efficiency with which a user finds results relevant to his query. This is done by keeping track of a user's individual interests and taking them into account when returning search results. We propose a robust user modeling technique that implicitly creates a Dynamic Category Interest Tree (DCIT), using a general ontology of the web and a set of web pages, collected over time, that give an insight into a user's interests. The DCIT is designed to use a fuzzy classification technique to keep track of which topics a user is interested in and how strongly he is interested in each topic, and to reflect how his interests change over time. The DCIT consists of a general ontology of the web, where each node represents a topic and holds the keywords that are usually used to describe that topic or category. Additional keywords that the user frequently associates with a topic, such as names of important people, organizations, or specialized terminology, are also incorporated into the relevant topic. We use the Apriori algorithm to extract these associated words from the user's web history in order to define the user's categories of interest more accurately.
The DCIT is initially created by a content-based approach using only the browsing history of the user, and is later enhanced through collaborative filtering using a k-nearest-neighbour algorithm. We propose a technique to re-rank the results from a search engine according to their relevance to a user, based on his implicitly learned DCIT. According to experimental results, our DCIT-based ranking often outperforms search engines such as Google when it comes to retrieving web pages that are more relevant to a user's interests.

Keywords—personalized web search; ranking; user profile; implicit user interest

I. INTRODUCTION

The World Wide Web is a great source of information for millions of users and has content spanning almost all topics at various abstraction levels. While this allows it to serve as a huge information resource, the diversity and sheer volume of available information often make it difficult for users to find the web pages most relevant to them at any point in time. Users have different and specific interests, varying levels of proficiency in each topic, and different requirements for the level of detail on each topic. Search engines and many web applications are often built to serve all users in the same way, with little or no adaptation to the user's profile, namely their interests, preferences, and past behavior while using the application. Thus, if a technology enthusiast types the word "Apple" into a search engine, he would probably expect results about the popular technology company Apple Inc., while a farmer would probably be more interested in results pertaining to the fruit. Therefore, in order to tackle the problem of recommending relevant results to a user based on their interest profile across various topics, we propose a user modeling technique that creates a Dynamic Category Interest Tree, with the most general topics at the top of the tree and more specific subtopics at deeper levels in the tree.
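As an illustration only, the tree shape described above (general topics near the root, more specific subtopics deeper down, each node holding descriptive keywords) can be sketched as follows. The names `TopicNode` and `match_topic`, the example topics, and the keyword-overlap rule are assumptions made for this sketch; they do not reproduce the fuzzy classification, Apriori-based keyword extraction, or collaborative filtering used to build the actual DCIT.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple


@dataclass
class TopicNode:
    """One topic in a hypothetical category tree (not the paper's exact structure)."""
    name: str
    keywords: Set[str] = field(default_factory=set)
    children: List["TopicNode"] = field(default_factory=list)

    def add_child(self, child: "TopicNode") -> "TopicNode":
        """Attach a more specific subtopic and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "TopicNode"]]:
        """Yield (depth, node) pairs in pre-order, so general topics come first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def match_topic(root: TopicNode, words: Set[str]) -> Optional[TopicNode]:
    """Return the node sharing the most keywords with `words`.

    Ties are broken in favour of the deeper (more specific) node.
    This plain overlap count is an assumption made for illustration.
    """
    best: Optional[TopicNode] = None
    best_key = (0, -1)
    for depth, node in root.walk():
        overlap = len(node.keywords & words)
        if overlap > 0 and (overlap, depth) > best_key:
            best, best_key = node, (overlap, depth)
    return best


# Hypothetical two-branch tree mirroring the "Apple" example in the text.
root = TopicNode("Top")
tech = root.add_child(TopicNode("Technology", {"software", "hardware", "computer"}))
tech.add_child(TopicNode("Consumer Electronics", {"apple", "iphone", "macbook"}))
agri = root.add_child(TopicNode("Agriculture", {"farm", "crop", "harvest"}))
agri.add_child(TopicNode("Fruit", {"apple", "orchard", "harvest"}))

farmer_page = match_topic(root, {"apple", "orchard", "price"})
tech_page = match_topic(root, {"apple", "iphone", "release"})
```

In this toy tree the word "apple" alone is ambiguous; it is the accompanying words ("orchard" versus "iphone") that pull a page toward one branch or the other, which is the intuition behind attaching topic-specific keywords to each node.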
The Dynamic Category Interest Tree is designed not only to take into account the different topics a user is interested in, along with their general and user-specific features and terms, but also to reflect the changing interests of the user over time. The user profile is first created through content-based personalization by keeping track of a user's browsing patterns, and is later enriched through collaborative filtering. We use this user profiling technique to filter the search results returned by Google for a user's query and re-rank the results based on the user's profile. The text of a page is compared to the user's profile and, based on several ranking parameters, a score is calculated, which is used to reorder the search results.

The main merit of this technique is that a fuzzy classification is employed while scoring topics of interest for a user. The classification changes with time as the user's interests change, and hence the re-ranked results reflect both the long-term and short-term interests of the user. For example, when a user who is interested in "tennis" and "football" types the query "latest sports news", news related to these sports will be ranked the highest. Similarly, if a user is reading web pages on "Operating Systems" in the first half of a year, then a query like "upcoming IEEE conferences" would return conferences on topics related to Operating Systems as the top-ranked results, while if the user's interest changes to "Machine Learning" over the next few months, the same query will rank pages related to conferences on Machine Learning higher.

II. RELATED WORK

A. Search Personalization

Personalized web search was first proposed by Page et al.
[1] by using a modified page rank algorithm which took
2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) 978-1-4799-4143-8/14 $31.00 © 2014 IEEE DOI 10.1109/WI-IAT.2014.80 54