ORIGINAL RESEARCH ARTICLE

UNCOVERING USER’S SEARCH PATTERNS TO PERSONALISE WEB SEARCH

Smita Sankhe and *Nirmala Shinde

Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India

ABSTRACT

In today’s world, search engines have become a very convenient method of searching for and retrieving information. This increasing use of search engines goes hand in hand with the ever-increasing volume of data available on the internet. With such a large number of websites available, it is essential to sort these websites in decreasing order of their relevance to the user’s query so that data can be retrieved effectively. This paper explores various domains related to Computer Science and proposes a framework that appears best suited to this problem. We propose a new system that provides personalized web search according to the user’s internet surfing patterns. The system extracts the user’s history and scrapes the content of the visited web pages (title, keywords, headings, sub-headings, meta tags). These documents are then clustered using a Word2Vec model and Latent Semantic Indexing to give better results. The user’s search query is mapped to the profile and an appropriate cluster is selected. The search engine results page (SERP) returned by the search engine is mapped to the selected cluster to find the similarity index. A linear regression model assigns the final score, combining recency, frequency, popularity and the user’s feedback with the similarity measure, and this score is used to re-rank the SERP.

Copyright © 2018, Smita Sankhe and Nirmala Shinde. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

The user profile is created using the user’s browser history file and other typical behaviors on the internet, such as bookmarked web pages, accounts created on various portals, subscriptions, etc.
All these attributes can be used to construct patterns for this profile, which are then employed to re-rank SERPs effectively for future queries. We also date the entries in this user profile, so that more recent entries and activities carry greater weight and older ones carry less. This is because human interests and behaviors tend to change over periods of time, and this period of time can be neither quantified nor generalized. Hence, dating the entries ensures that recent activities hold more relevance than earlier ones, which led us to achieve better results when tested against other results.

*Corresponding author: Nirmala Shinde, Department of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India.

Some users are new to the browser, while others have a full-fledged browser history that is sufficient to be analysed for patterns and trends. User profiling was found to be effective only for experienced or regular users. For new users, using information from browser history files is not feasible. Hence, we made provisions for new users by taking inputs about the user’s interests in a separate form that consists of keywords (interest topics) and tailoring the SERP accordingly to provide effective personalization. Gradually, as the user uses the browser, the history file records the user’s activities, and once this history file holds information exceeding a certain threshold, the program switches to providing dynamic personalization of web search.

The model also uses more than one algorithm for many tasks, to ensure that the features and details of all attributes are correctly captured and addressed by the model and are used effectively to improve its accuracy and performance. Apart from the model, certain novel modules and functionalities have also been incorporated into our project, which increase the degree of personalization and reduce the time the user spends on information retrieval.
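The dating of profile entries and the switch from form-based to history-based personalization described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the exponential half-life decay, the names HistoryEntry, recency_weight, build_profile and select_profile, and the constants HALF_LIFE_DAYS and MIN_HISTORY_ENTRIES are all assumptions chosen for illustration, since the text fixes neither a decay function nor a threshold value.

```python
from dataclasses import dataclass

# Hypothetical constants: the article does not specify either value.
HALF_LIFE_DAYS = 30.0      # age at which an entry's weight has halved
MIN_HISTORY_ENTRIES = 50   # history size above which dynamic profiling starts


@dataclass
class HistoryEntry:
    url: str
    keywords: list[str]    # terms scraped from the page (title, headings, ...)
    age_days: float        # days elapsed since the page was visited


def recency_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Assumed exponential decay: recent entries weigh close to 1, old ones close to 0."""
    return 0.5 ** (age_days / half_life_days)


def build_profile(entries: list[HistoryEntry]) -> dict[str, float]:
    """Sum the recency weights of every keyword seen in the history."""
    profile: dict[str, float] = {}
    for entry in entries:
        weight = recency_weight(entry.age_days)
        for keyword in entry.keywords:
            profile[keyword] = profile.get(keyword, 0.0) + weight
    return profile


def select_profile(
    entries: list[HistoryEntry],
    declared_interests: list[str],
    min_entries: int = MIN_HISTORY_ENTRIES,
) -> dict[str, float]:
    """Use the interest form for new users, the dated history once it exceeds the threshold."""
    if len(entries) <= min_entries:
        # New user: every declared interest keyword gets the same weight.
        return {keyword: 1.0 for keyword in declared_interests}
    return build_profile(entries)
```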
This problem may appear small and negligible at the microeconomic level, but this is not the case. In most organizations, employees use search engines on a daily basis for a wide range of tasks.

OPEN ACCESS

ISSN: 2230-9926
International Journal of Development Research, Vol. 08, Issue 06, pp. 21074-21080, June 2018
Available online at http://www.journalijdr.com

Article History: Received 9th March 2018; received in revised form 20th April 2018; accepted 18th May 2018; published online 30th June 2018.

Key Words: Data mining, Hierarchical clustering, Machine learning, Natural language processing, Search methods, User Modeling, Web mining.

Citation: Smita Sankhe and Nirmala Shinde. 2018. “Uncovering user’s search patterns to personalise web search”, International Journal of Development Research, 8, (06), 21074-21080.