www.ijcrt.org © 2018 IJCRT | Volume 6, Issue 2 April 2018 | ISSN: 2320-2882
IJCRT1892047 International Journal of Creative Research Thoughts (IJCRT) www.ijcrt.org 306

GSA: A GLOBAL FRAMEWORK FOR SIMILARITY SEARCHING

Anima Srivastava 1, Manish Jaiswal 2, Arpita Tewari 3
1 Department of Electronics & Communication, University of Allahabad, Allahabad, India
2 Department of Electronics & Communication, University of Allahabad, Allahabad, India
3 Department of Electronics & Communication, University of Allahabad, Allahabad, India

Abstract: With advances in technology, searching and machine learning are regarded as good techniques for measuring document similarity and prediction accuracy in plagiarism detection. The most popular searching algorithm in both industrial and academic environments is the RankBrain algorithm. This paper proposes an improved searching framework with machine learning that handles the complexity of finding accurate matches. An empirical evaluation of the proposed approach is given, based on its objective and a case study. We describe a novel functional framework, based on a searching algorithm with machine learning, both for differentiating the intent of a query and for generating content semantically. We explore and analyze various well-known Google searching algorithms in terms of their effectiveness for similarity searching and best matching.

Index Terms - searching, document similarity, RankBrain

I. INTRODUCTION

Machine learning algorithms [16] are among the most powerful techniques for measuring document similarity, using a variety of methods. This paper examines the behavior of different similarity-based machine learning algorithms, with an emphasis on RankBrain, i.e., the new way to design and achieve improved search ranking and quality. This work presents a transparent comparative study of similarity detection, analyzing the efficiency and deficiencies of each approach in full.
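The introduction refers to measuring document similarity but does not fix a particular measure. As an illustration only, the sketch below assumes a bag-of-words representation compared with cosine similarity, a common baseline for plagiarism-style comparison. It is not RankBrain and not the framework proposed in this paper, and the function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (hypothetical helper).

    Returns a value in [0, 1]: 1 for identical term distributions,
    0 when the documents share no terms.
    """
    tf_a = Counter(tokenize(doc_a))
    tf_b = Counter(tokenize(doc_b))
    # Dot product only needs the terms that both documents contain.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    # Euclidean length of each term-frequency vector.
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "Machine learning measures document similarity."
    suspect = "Document similarity is measured with machine learning."
    unrelated = "Pigeon ranks local search results by distance."
    print(round(cosine_similarity(original, suspect), 3))
    print(round(cosine_similarity(original, unrelated), 3))
```

Raw term counts keep the sketch short. Weighting terms (for example with TF-IDF) would discount very common words, and stemming would let "measures" and "measured" match. Both are natural refinements before using such a score to flag possible plagiarism.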
The rest of the paper is organized as follows. Section 1 introduces the work; Section 2 briefly describes several prominent contemporary searching algorithms; Section 3 reviews the literature on related searching [11] aspects and algorithms; Section 4 details the proposed framework; Section 5 measures the efficiency and applicability of the proposed framework; finally, Section 6 presents the conclusion.

II. SEARCHING ALGORITHMS FOR SIMILARITY DETECTION

Each searching algorithm [6] has multiple parameters and search criteria for detecting similarity and retrieving optimal results. Some of the most popular Google searching algorithms [7] are discussed below.

2.1 Panda

Panda [7] is a searching algorithm that assigns a grade to web pages based on the quality of their content, and it down-ranks websites according to that content quality. Unlike Google's other searching algorithms, Panda works like a strainer (a filter). It is integrated into the ranking algorithm and is used to de-rank sites with low-quality content. It does not operate in real-time search, but filtering and retrieving results with the updated version of Panda is much faster than with the older one.

2.2 Pigeon

Pigeon is a Google searching algorithm released with two key factors, i.e., distance and location. Pigeon is available only for search results in English. Query results are based on the searcher's location, and Pigeon significantly changed how local and non-local results are ranked and returned. It uses local directory sites to provide excellent results, and it consistently draws on Google Maps and Google web search for relevant local search results.

2.3 Penguin

The main objective of Penguin [7] is to detect and de-rank sites with unsolicited, anomalous link profiles.
Penguin targets sites that rely on such devious link-building tactics. It operates in real time, so correction and recovery take less time; Penguin is just one segment of Google's main ranking algorithm.

2.4 Pirate

Google's Pirate algorithm was designed to inhibit and de-rank sites that have many copyright infringement reports. Nowadays, well-known sites are involved in making plagiarized content, e.g., video clips, songs, movies, etc., for