Implicitly Learning a User Interest Profile for
Personalization of Web Search using Collaborative
Filtering
Ashish Nanda
Dept. of Computer Science
BITS-Pilani, Goa Campus
Zuarinagar, India
Email: f2010175@goa.bits-pilani.ac.in
Rohit Omanwar
Dept. of Computer Science
BITS-Pilani, Goa Campus
Zuarinagar, India
Email: h2012060@goa.bits-pilani.ac.in
Bharat Deshpande
Head of Dept. of Computer Science
BITS-Pilani, Goa Campus
Zuarinagar, India
Email: bmd@goa.bits-pilani.ac.in
Abstract—The increasing abundance of content on the web has
made information filtering even more important in helping users
find information related to their interests. Personalization of web
search is one such effort that aims to improve the efficiency
with which a user finds results relevant to his query. This is done
by keeping track of a user’s individual interests and taking them into
account while returning search results. We propose a robust user
modeling technique that implicitly creates a Dynamic Category
Interest Tree (DCIT), using a general ontology of the web and a
set of web pages collected over time that give an insight into a
user’s interests. The DCIT is designed to use a fuzzy classification
technique to keep track of which topics a user is interested in and
his degree of interest in each topic, and to reflect his changing
interests over time. The DCIT consists of a general ontology of the
web, where each node represents a topic and consists of keywords
that are usually used to describe that topic or category. Additional
keywords that the user frequently associates with a topic, such
as names of important people, organizations, or specialized
terminology, are also incorporated into the relevant topic.
We use the Apriori Algorithm to extract these associated words
from the user’s web history in order to more accurately define
the user’s categories of interest. The DCIT is initially created by
a content-based approach using only the browsing history of the
user, and is later enhanced through collaborative filtering
using a k-nearest-neighbour algorithm. We propose a
technique to re-rank the results from a search engine according
to their relevance to a user, based on his implicitly learned DCIT.
According to experimental results, our DCIT-based ranking often
outperforms search engines such as Google at retrieving
web pages that are more relevant to a user’s interests.
Keywords—personalized web search; ranking; user profile;
implicit user interest
I. INTRODUCTION
The World Wide Web is a great source of information for
millions of users and has content spanning almost all topics
at various abstraction levels. While this allows it to serve as
a huge information resource, the diversity and sheer volume
of available information often make it difficult for users to find
the web pages most relevant to them at any point in time, since
users have different and specific interests, varying levels of
proficiency in each topic, and different requirements for the
level of detail. Search engines and several web applications
are often built to serve all users in the same way with little
or no adaptation to the user’s profile, namely their interests,
preferences, and past behavior while using the application.
Thus, if a technology enthusiast types the word “Apple”
into a search engine, he would probably expect results about the
popular technology company Apple Inc., while a farmer would
probably be more interested in results pertaining to the fruit.
Therefore, to tackle the problem of recommending
relevant results to a user based on their interest profile
across various topics, we propose a user modeling
technique that creates a Dynamic Category Interest Tree, with
the most general topics at the top of the tree, and with more
specific subtopics at deeper levels in the tree. The Dynamic
Category Interest Tree is designed not only to take into account
the different topics a user is interested in, along with their general
and user-specific features and terms, but also to reflect the
changing interests of the user over time. The user profile is
first created through content-based personalization by keeping
track of a user’s browsing patterns, and is later enriched
through collaborative filtering. We use this user profiling
technique to filter the search results returned by Google for
a user’s query, and re-rank the results based on the user’s
profile. The text of a page is compared to the user’s profile,
and a score is calculated from several ranking parameters;
this score is used to reorder the search results. The main merit of
this technique is that a fuzzy classification is employed while
scoring topics of interest for a user. The scores change with time
as a user’s interests change, and hence the re-ranked results
reflect both the long-term and short-term interests of a user.
For example, when a user types the query “latest sports news”
and is interested in “tennis” and “football”, news related to
these sports will be ranked the highest. Similarly, if a user is
reading web pages on “Operating Systems” in the first half of
a year, then a query like “upcoming IEEE conferences” would
return conferences on topics related to Operating Systems as the
top-ranked results. If the user’s interest shifts to “Machine
Learning” over the next few months, the same query will
then rank pages related to conferences on Machine Learning
higher.
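
As an illustration only, the following Python sketch shows one way the time-dependent re-ranking behaviour described above could be realised. It is not the implementation used in this work: the TopicNode class, the 30-day half-life, the keyword-overlap membership function, and the scoring formula are all assumptions introduced for this example. Each topic node holds an interest score that decays exponentially when the user stops visiting related pages, and a result is scored by summing, over all topics, the decayed interest multiplied by the page's fuzzy membership in that topic.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One node of a hypothetical category interest tree."""
    name: str
    keywords: set
    interest: float = 0.0      # accumulated interest (assumed representation)
    last_update: float = 0.0   # time of the last update, in days
    children: list = field(default_factory=list)

    def walk(self):
        """Yield this node and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


HALF_LIFE_DAYS = 30.0  # assumption: interest halves after 30 idle days


def decayed(node, now):
    """Interest of a node at time `now`, after exponential decay."""
    age = now - node.last_update
    return node.interest * 0.5 ** (age / HALF_LIFE_DAYS)


def membership(node, words):
    """Fuzzy membership: fraction of the node's keywords found in the page."""
    if not node.keywords:
        return 0.0
    return len(node.keywords & words) / len(node.keywords)


def observe_page(root, text, now):
    """Update every matching topic with a page the user visited."""
    words = set(text.lower().split())
    for node in root.walk():
        m = membership(node, words)
        if m > 0:
            node.interest = decayed(node, now) + m
            node.last_update = now


def score(root, text, now):
    """Relevance of a page: decayed interest weighted by membership."""
    words = set(text.lower().split())
    return sum(decayed(n, now) * membership(n, words) for n in root.walk())


def rerank(root, results, now):
    """Reorder results by descending score; ties keep their original order."""
    return sorted(results, key=lambda r: score(root, r, now), reverse=True)
```

Under these assumptions, a user who visited Operating Systems pages several months ago and Machine Learning pages recently would see Machine Learning results ranked first, because the older interest has decayed through several half-lives. A shorter half-life would favour short-term interests, and a longer one would favour long-term interests.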
II. RELATED WORK
A. Search Personalization
Personalized web search was first proposed by Page et
al. [1], using a modified PageRank algorithm which took
2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)
978-1-4799-4143-8/14 $31.00 © 2014 IEEE
DOI 10.1109/WI-IAT.2014.80