International Journal of Scientific and Research Publications, Volume 6, Issue 9, September 2016. ISSN 2250-3153. www.ijsrp.org

Time Series Analysis of Google, Bing, Yahoo! & Baidu Using Simple Keyword "Plagiarism"

Peerzada Mohammad Iqbal 1, Dr. Abdul Majid Baba 2, Aasim Bashir 3
1 Professional Assistant, Sher-e-Kashmir University of Agricultural Sciences & Technology of Kashmir (SKUAST-K), India
2 Head, Department of Library and Information Science, The University of Kashmir, India
3 Assistant Professor, Department of Computer Science, The University of Kashmir, India

Abstract- This paper presents a comprehensive study of the time series behaviour of four search engines, viz. Google, Bing, Yahoo! and Baidu, using the simple keyword "Plagiarism" from the field of Library and Information Science. Time series analysis is used to forecast result fluctuation from a series of data collected daily for about 100 days; 50 days of projected data were generated, and a trend line was then used to compare the forecasts of the selected search engines. The evaluation reveals that Bing shows a positive secular trend, while Baidu, Yahoo! and Google show a downward or negative secular trend.

Index Terms- Time series, Fluctuation, Plagiarism, Search engine, Result, Counter, Index.

I. INTRODUCTION

Information on the web can be searched and accessed via search engines 1. Depending on the usability and type of information needed, the Web has become an important source of information in a research-oriented society. The major activity performed on the web is searching for information, mainly for research purposes, via these engines 2,3. However, the results yielded for a number of queries run into several thousands or even millions owing to the availability of a virtually infinite amount of information. Many studies show that only the first few results are browsed by users 4,5,6,7, and this determines the success of a search engine; result ranking therefore holds utmost importance.
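The forecasting procedure described in the abstract — fitting a trend line to roughly 100 days of daily result counts and projecting 50 days ahead — can be sketched as follows. This is a minimal illustration only: the synthetic series, function name and parameters are assumptions for demonstration, not the paper's actual dataset or tooling.

```python
import numpy as np

def trend_forecast(counts, horizon=50):
    """Fit a linear (secular) trend line to a daily series of
    search-result counts and project it `horizon` days ahead."""
    days = np.arange(len(counts))
    # Least-squares straight line: a positive slope indicates an
    # upward secular trend, a negative slope a downward one.
    slope, intercept = np.polyfit(days, counts, 1)
    future = np.arange(len(counts), len(counts) + horizon)
    projection = slope * future + intercept
    return slope, projection

# Illustrative synthetic series: 100 days with a mild upward trend plus noise.
rng = np.random.default_rng(0)
series = 1_000_000 + 500 * np.arange(100) + rng.normal(0, 2000, 100)
slope, proj = trend_forecast(series, horizon=50)
print("estimated daily trend:", round(slope, 1))
print("projected count on day 150:", int(proj[-1]))
```

The sign of the fitted slope corresponds to the "positive" versus "negative secular trend" distinction the paper draws between Bing and the other three engines.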
In classical IR systems, result ranking was based merely on term frequency and inverse document frequency 8. In Web search result ranking, various additional parameters are taken into account: the number of links pointing to a given web page 9,10, the anchor text of the links pointing to the page, the placement of the search terms in the document (terms occurring in the title or a header may get a higher weight), the distance between the search terms, the popularity of the page (in terms of the number of times it is visited), the text appearing in metatags 11, the subject-specific authority of the web page 12,13, recency in the search index and the exactness of the hits 14. There is an ongoing competition between search engines and Web page authors, for users and for high rankings respectively, which is why the ranking algorithms are kept secret by the search engine companies. As Google states 10, "Due to the nature of our business and our interest in protecting the integrity of our search results, this is the only information we make available to the public about our ranking system". Apart from this, search engines keep updating and upgrading their algorithms so as to improve the ranking of their results. A search engine optimization industry now exists, designing and redesigning Web pages in order to enhance their rankings within a specific search engine (e.g., Search Engine Optimization Inc., www.seoine.com/). In essence, the first ten results retrieved for a query have the greatest chance of being visited by users. Earlier work examined changes over time in the top ten results for a query on the largest search engines, which at the time of the first data collection were Google, Yahoo! and Teoma (MSN Search came out of beta on February 1st, 2005, in the midst of data collection for the second round) 15.
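The classical tf-idf ranking mentioned above can be sketched minimally: each document is scored by its term frequency for each query term, weighted by the log of the inverse document frequency. The toy corpus and function name below are illustrative assumptions, not part of the paper.

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each document against a query using classical tf-idf:
    score(d) = sum over query terms t of tf(t, d) * log(N / df(t))."""
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            # document frequency: how many documents contain the term
            df = sum(1 for t in tokenised if term in t)
            if df:
                score += tf[term] * math.log(n / df)
        scores.append(score)
    return scores

docs = [
    "plagiarism detection in academic writing",
    "search engines rank pages by links and text",
    "plagiarism plagiarism is a growing concern",
]
print(tfidf_scores("plagiarism", docs))
```

The third document scores highest because it repeats the query term, while the second scores zero; the link-based and metadata signals listed in the paragraph above are exactly what Web search engines layer on top of this purely textual scheme.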
However, various transformations occur between the user's "visceral need" (a fuzzy view of the information problem in the user's mind) and the "compromised need" (the way the query is phrased, taking into account the limitations of the search tool at hand) 16. Above all, the fluctuation of the results related to a query can only be judged by the user, while some researchers claim that this is impractical because of the large number of documents related to a query, not all of which can be viewed by a single user; hence a panel of judges is required to check fluctuation 17,18.

II. PROBLEM

In the early days the Internet was simple, restricted and direct. With the help of command-driven software, finding information was limited and not user friendly. The advent of many types of search engines provided solutions for literature searching using Boolean operators, proximity searching, wildcards, truncation, etc. Many search engines developed new versions and techniques to achieve some sophistication, but not all have advanced the cause of access and searching from the scholar's perspective. Besides indexing the Internet in different ways, search engines operate in different ways and retrieve documents in different orders. Further, they do not sift information from the scholar's point of view; i.e., they retrieve information on a particular topic from different aspects such as marketing, advertising, news and entertainment, mixed with some research papers. The academic community looks purely for scholarly information on a topic of interest, expecting output that is best in terms of comprehensiveness and devoid of fluctuations. The present investigation attempts to evaluate the performance of the selected search engines in terms of result fluctuation, captured in two phases to check the consistency of the search engines.
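The Boolean search capabilities mentioned in the problem statement can be sketched as set operations over an inverted index, which is how classical literature-search systems implemented them. The tiny corpus and helper below are illustrative assumptions only.

```python
from typing import Dict, Set

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

docs = [
    "plagiarism in library science",
    "search engine ranking",
    "plagiarism detection software",
]
idx = build_index(docs)

# Boolean operators map directly onto set operations:
print(idx["plagiarism"] & idx["detection"])  # AND -> {2}
print(idx["plagiarism"] | idx["ranking"])    # OR  -> {0, 1, 2}
print(idx["plagiarism"] - idx["software"])   # NOT -> {0}
```

Proximity, wildcard and truncation searching require richer index structures (positional postings, term tries), but the same retrieve-then-combine pattern applies.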