FOCAS: Penalising friendly citations to improve author ranking

Jorge Silva (jorge.m.silva@inesctec.pt), David Aparício (daparicio@dcc.fc.up.pt), Pedro Ribeiro (pribeiro@dcc.fc.up.pt), Fernando Silva (fmsilva@dcc.fc.up.pt)
CRACS/INESC TEC & University of Porto, Porto, Portugal

ABSTRACT

Scientific impact is commonly associated with the number of citations received. However, an author can easily boost his own citation count by (i) publishing articles that cite his own previous work (self-citations), (ii) having co-authors cite his work (co-author citations), or (iii) exchanging citations with authors from other research groups (reciprocated citations). Even though these friendly citations inflate an author’s perceived scientific impact, author ranking algorithms do not normally address them; at most, they remove self-citations. Here we present Friends-Only Citations AnalySer (FOCAS), a method that identifies friendly citations and reduces their negative effect on author ranking algorithms. FOCAS combines the author citation network with the co-authorship network in order to measure author proximity, and penalises citations between friendly authors. FOCAS is general and can be regarded as an independent module applied while running any PageRank-like author ranking algorithm. FOCAS can be tuned to use three different criteria, namely authors’ distance, citation frequency, and citation recency, or combinations of these. We evaluate and compare FOCAS against eight state-of-the-art author ranking algorithms, comparing their rankings with a ground-truth of best paper awards. We test our hypothesis on a citation and co-authorship network comprising seven top Information Retrieval conferences. We observed that FOCAS improved author rankings by 25% on average and, in one case, led to a gain of 46%.
CCS CONCEPTS

· Human-centered computing → Social network analysis; · Computing methodologies → Ranking

KEYWORDS

Author ranking, self-citations, friendly citations, citation networks, co-authorship networks

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

SAC ’20, March 30-April 3, 2020, Brno, Czech Republic
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-6866-7/20/03. . . $15.00
https://doi.org/10.1145/3341105.3373991

ACM Reference Format:
Jorge Silva, David Aparício, Pedro Ribeiro, and Fernando Silva. 2020. FOCAS: Penalising friendly citations to improve author ranking. In The 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), March 30-April 3, 2020, Brno, Czech Republic. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3341105.3373991

1 INTRODUCTION

Deciding where (or to whom) to allocate research funding is a problem that affects all scientists directly. This is typically done by attempting to assess the impact of a scientist, that is, to determine how much of his research work has contributed to advancing his scientific field. The impact of scientists is also commonly used to pick scientific committees, attribute research grants, or decide faculty promotions. These processes are not fully automated and are traditionally done by peers. However, bibliometrics can be of help since they provide an unbiased estimator of scientific impact.
For example, the h-index [5] counts the number of publications h that a scientist (or author) has with at least h citations (e.g., an author has h-index = 7 if he has 7 papers with at least 7 citations each). Many variations of the h-index have been proposed [8, 10], but the h-index remains widely used. Another common approach to evaluating an author’s impact is to use graph metrics on citation networks. Computing graph metrics is computationally more expensive than calculating bibliometrics, but it has some advantages, namely (i) graph metrics give credit for indirect citations (i.e., if A cites B, and B cites C, then C receives part of the credit of A’s citation to B), and (ii) they measure the author’s impact at a group scale, that is, the impact of each author depends on the impact of the authors that cite him. PageRank [9] is the most widely used graph algorithm to measure an author’s impact, and many variations have been proposed specifically for author ranking [2, 3, 11, 14, 16, 19]. One of PageRank’s major algorithmic ideas is that nodes are not all equal: in its original context of hyperlinks, it is good that any webpage points at yours, but it is better that important webpages do. This idea naturally extends to author citation networks, meaning that it is good to be cited by any author but better to be cited by important authors.

Regardless of the metric used to evaluate scientific impact (e.g., bibliometrics or graph metrics), citations are important, and several works study how an author can increase his number of citations. Undoubtedly, the quality of the author’s work is correlated with his number of citations [17]. However, other factors such as the author’s co-authorship network [12] and his social behaviour [4, 15]
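The two impact measures discussed above are easy to sketch in code. The snippet below is an illustrative toy implementation, not the paper's method or any specific library: an h-index computation, and an unoptimised PageRank power iteration over a three-author citation graph showing how C receives indirect credit when A cites B and B cites C.

```python
def h_index(citations):
    # h-index: the largest h such that the author has h papers
    # with at least h citations each.
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def pagerank(out_links, d=0.85, iters=100):
    # Plain power-iteration PageRank over a dict {node: [cited nodes]};
    # d is the usual damping factor.
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        pr = {v: (1 - d) / n
                 + d * sum(pr[u] / len(out_links[u])
                           for u in nodes if v in out_links[u])
              for v in nodes}
    return pr

# The example from the text: 7 papers with at least 7 citations each.
print(h_index([10, 9, 8, 8, 7, 7, 7]))  # -> 7

# A cites B and B cites C: C is ranked above both, receiving
# indirect credit from A's citation of B.
scores = pagerank({"A": ["B"], "B": ["C"], "C": []})
print(scores["C"] > scores["B"] > scores["A"])  # -> True
```

Note how, unlike the h-index, the PageRank score of C depends on the score of the authors citing it, which is the group-scale property the text describes.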