ORIGINAL RESEARCH ARTICLE
UNCOVERING USER’S SEARCH PATTERNS TO PERSONALISE WEB SEARCH
Smita Sankhe and *Nirmala Shinde
Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India
ABSTRACT
In today’s world, search engines have become a very convenient method of searching and
retrieving information. But this increasing use of search engines goes hand in hand with the ever-increasing
data available on the internet. With such a large number of websites available, it is
essential to have these websites sorted in decreasing order of their relevance to the user’s query
for effective operation and retrieval of data. This paper explores various domains related to
Computer Science and proposes a framework that appears best suited to this problem. We
propose a new system to provide personalized web search according to the user’s internet surfing
patterns. The system extracts the user’s history and scrapes the web pages’ content (title,
keywords, headings, sub-headings, meta tags). These documents are then clustered using
a Word2Vec model and Latent Semantic Indexing to give better results. The user’s search query is
mapped to the profile and an appropriate cluster is selected. The SERP returned by the search
engine is mapped to the selected cluster to find the similarity index. A linear regression model is
used to assign the final score, which combines recency, frequency, popularity and the user’s feedback
with the similarity measure to re-rank the SERP.
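The final scoring step described in the abstract can be illustrated with a minimal sketch. It assumes that each SERP entry already carries five normalized features in the range 0 to 1 and that the coefficients come from a previously fitted linear regression model. The names `Result`, `WEIGHTS`, `INTERCEPT`, `final_score` and `rerank`, as well as the coefficient values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One SERP entry with illustrative, pre-normalized features (0 to 1)."""
    url: str
    similarity: float   # similarity of the result to the selected cluster
    recency: float      # how recently related pages were visited
    frequency: float    # how often related pages were visited
    popularity: float   # general popularity of the page
    feedback: float     # explicit user feedback


# Hypothetical coefficients. In the described system these would be
# learned by the linear regression model, not set by hand.
WEIGHTS = {
    "similarity": 0.4,
    "recency": 0.2,
    "frequency": 0.15,
    "popularity": 0.15,
    "feedback": 0.1,
}
INTERCEPT = 0.0


def final_score(result: Result) -> float:
    """Linear combination of the features: intercept + sum(w_i * x_i)."""
    return INTERCEPT + sum(
        weight * getattr(result, name) for name, weight in WEIGHTS.items()
    )


def rerank(results: list[Result]) -> list[Result]:
    """Return the results sorted by final score, highest first."""
    return sorted(results, key=final_score, reverse=True)
```

With these assumed weights, two results that differ only in their similarity to the selected cluster are ordered so that the more similar one appears first.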
Copyright © 2018, Smita Sankhe and Nirmala Shinde. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION
The user profile is created using the user’s browser history file
and other typical behaviors on the internet, such as the
bookmarked web pages, accounts created on various portals,
subscriptions, etc. All these attributes can be used to construct
certain patterns for this profile, which will be employed for
effective re-ranking of SERPs for future queries. We also
date the entries in this user profile, so that more recent
entries and activities carry greater weight than older
ones. This is because human interests and behaviors tend to
change over periods of time. This period of time can be neither
quantified nor generalized. Hence, the method of dating entries
ensures that recent activities hold more relevance than
earlier activities, which led to better results when
compared against other results. Some users may be new to the
browser, while others have an extensive browser history,
sufficient for analysing the history files to identify patterns and
trends.
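One possible way to make recent history entries count for more than older ones is an exponential decay on the age of each entry. The sketch below is only an illustration under that assumption: the paper does not state which weighting function it uses, and the function name `recency_weight` and the `half_life_days` parameter are hypothetical.

```python
from datetime import datetime


def recency_weight(visited_at: datetime, now: datetime,
                   half_life_days: float = 30.0) -> float:
    """Weight in (0, 1] that halves every `half_life_days` days.

    `half_life_days` is an assumed tuning parameter, not a value
    from the paper.
    """
    # Age of the entry in days; entries dated in the future count as age 0.
    age_days = max((now - visited_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)
```

An entry visited at the reference time receives weight 1, and the weight decreases monotonically with age, so older activities still contribute but never outweigh an equally frequent recent one.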
*Corresponding author: Nirmala Shinde,
Department of Computer Engineering, K. J. Somaiya College of
Engineering, Mumbai, India.
User profiling was found to be effective only for
experienced or regular users. For new users, using information
from browser history files is not feasible. Hence, we made
provisions for new users by collecting the user’s interests
through a separate form that consists of keywords (interest topics)
and tailoring the SERP accordingly to provide effective
personalization. Gradually, as the user uses the browser, the
history file records their activities, and once this history file
holds information exceeding a certain threshold, the
program switches to providing dynamic personalization of
web search. The model also uses more than one algorithm for
many tasks, to ensure that features and details of all attributes
are correctly captured and addressed by the model and are used
effectively to improve its accuracy and performance. Apart
from the model, certain novel modules and functionalities have
also been incorporated into our project, which increase the
personalization factor and reduce the user’s time
for information retrieval. This problem may appear
small and negligible at the microeconomic level, but this is
not the case. In most organizations, employees
make use of search engines on a daily basis for a wide range
of tasks.
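The switch from form-based personalization for new users to history-based personalization, described earlier in this section, can be sketched as a simple threshold check. The paper does not give the threshold value or say how the amount of history is measured, so the constant `HISTORY_THRESHOLD`, the use of an entry count, and the function name `choose_profile_source` are assumptions made for illustration.

```python
# Hypothetical threshold: number of history entries above which the
# browser history is considered rich enough to build a profile.
HISTORY_THRESHOLD = 50


def choose_profile_source(history_entries, interest_keywords,
                          threshold=HISTORY_THRESHOLD):
    """Pick the data used for personalization.

    Returns a (source_name, data) pair: the browser history once it
    exceeds the threshold, otherwise the keywords from the interest form.
    """
    if len(history_entries) > threshold:
        return "history", list(history_entries)
    return "interest_form", list(interest_keywords)
```

A new user with only a handful of visited pages is served from the interest form, and the same call starts returning the history once enough entries have accumulated.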
ISSN: 2230-9926 International Journal of Development Research
Vol. 08, Issue, 06, pp.21074-21080, June, 2018
Article History:
Received 09th March, 2018
Received in revised form 20th April, 2018
Accepted 18th May, 2018
Published online 30th June, 2018
Available online at http://www.journalijdr.com
Key Words:
Data mining, Hierarchical clustering,
Machine learning, Natural language
processing, Search methods,
User Modeling, Web mining.
Citation: Smita Sankhe and Nirmala Shinde. 2018. “Uncovering user’s search patterns to personalise web search”, International Journal of Development
Research, 8, (06), 21074-21080.