Google Scholar’s Ranking Algorithm: The Impact of Articles’ Age (An Empirical Study)

Joeran Beel & Bela Gipp
Otto-von-Guericke University
Department of Computer Science
ITI / VLBA-Lab / Scienstein
Magdeburg, Germany
j.beel|b.gipp@scienstein.org

Abstract

Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. In recent studies we partly reverse-engineered the algorithm. This paper presents the results of our third study. The first study provided a broad overview and the second focused on the impact of citation counts. The current study focused on the correlation between an article’s age and its ranking in Google Scholar. In other words, we analyzed whether an article’s age makes it more or less likely to appear in a top position in Google Scholar’s result lists. For our study, we analyzed the age and ranking of 1,099,749 articles retrieved via 2,100 search queries. The analysis revealed that an article’s age seems to play no significant role in Google Scholar’s ranking algorithm. We also discuss why this might lead to a suboptimal ranking.

1. Introduction

With the growing use of academic search engines, it is becoming increasingly important for scientific authors that their research articles rank well in those search engines in order to reach their audience. To optimize research papers for academic search engines such as Google Scholar or Scienstein.org, knowledge about their ranking algorithms is essential. For instance, if a search engine considers how often a search term occurs in an article’s full text, authors should use the most relevant keywords in their articles wherever possible to achieve a top ranking. For users of academic search engines, knowledge about the applied ranking algorithms is also essential, for two reasons.
Firstly, users should know about the algorithms in order to estimate the search engine’s robustness against manipulation attempts by authors and spammers, and therefore the trustworthiness of its results. Secondly, knowledge of ranking algorithms enables researchers to estimate how useful the results are with respect to their search intention. For instance, researchers interested in the latest trends should use a search engine that puts high weight on an article’s publication date. Users searching for standard literature should choose a search engine that puts high weight on citation counts. In contrast, if a user searches for articles by authors advancing a view that differs from the majority, a search engine putting high weight on citation counts might not be appropriate. Therefore, this paper deals with the question of how Google Scholar ranks its results.

The paper is structured as follows. Section 2 presents related work on Google Scholar’s ranking algorithm. Section 3 covers the research objectives, and Section 4 explains the methodology used. Finally, the results and their interpretation follow.

Joeran Beel and Bela Gipp. Google Scholar’s Ranking Algorithm: The Impact of Articles’ Age (An Empirical Study). In Shahram Latifi, editor, Proceedings of the 6th International Conference on Information Technology: New Generations (ITNG’09), pages 160–164, Las Vegas (USA), April 2009. IEEE. doi: 10.1109/ITNG.2009.317. ISBN 978-1424437702. Downloaded from www.docear.org

2. Related Work

Because user needs differ, many academic databases and search engines let the user choose a ranking algorithm. For instance, ScienceDirect lets