Identifying Citation Sentiment and its Influence while Indexing Scientific Papers

Souvick Ghosh
School of Communication and Information
Rutgers University
souvick.ghosh@rutgers.edu

Chirag Shah
Information School
University of Washington
chirags@uw.edu

Abstract

Sentiment analysis has proven to be a popular research area for analyzing social media texts, newspaper articles, and product reviews. However, sentiment analysis of citation instances is a relatively unexplored area of research. For scientific papers, it is often assumed that the sentiment associated with citation instances is inherently positive. This assumption is due to the hedged nature of sentiment in citations, which is difficult to identify and classify. As a result, most existing indexes focus only on the frequency of citation. In this paper, we highlight the importance of considering the sentiment of citations when preparing ranking indexes for scientific literature. We perform automatic sentiment classification of citation instances on the ACL Anthology collection of papers. Next, we use the sentiment score in addition to the frequency of citation to build a ranking index for this collection of scientific papers. By using various baselines, we highlight the impact of our index on the ACL Anthology collection of papers. Our research contributes toward building a more sentiment-sensitive ranking index that better underlines the influence and usefulness of research papers.

1. Introduction

Our work toward developing a sentiment-sensitive ranking index for scientific papers can be situated at the intersection of bibliometrics, real-world citation networks, and sentiment analysis. A graphical representation of a hypothetical citation network is presented in Figure 1. Each node of the graph represents a scientific paper in the collection. In scientific papers, we find mentions of other papers. These mentions, called citations, reflect the view of the author (of the source paper) towards the target paper.
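As a minimal sketch of this view (an illustration only, not the index proposed in this paper), each citation can be stored as a directed edge from the citing paper to the cited paper, labeled with a sentiment polarity. The paper IDs, the polarity values of +1, 0, and -1, and the function name `citation_stats` are all assumptions made for the example. The code computes, for each cited paper, the citation frequency and the net sentiment of its incoming citations.

```python
from collections import defaultdict

# Hypothetical citation instances: (citing paper, cited paper, sentiment polarity).
# Polarity is +1 (positive), 0 (neutral), or -1 (negative). The paper IDs and
# labels are made up for illustration and do not come from the ACL Anthology.
CITATIONS = [
    ("P1", "P3", +1),
    ("P2", "P3", -1),
    ("P2", "P4", 0),
    ("P4", "P3", +1),
    ("P1", "P4", +1),
]


def citation_stats(citations):
    """Return {cited paper: (citation count, net sentiment)}."""
    count = defaultdict(int)  # number of incoming edges per cited paper
    net = defaultdict(int)    # sum of polarities over incoming edges
    for _source, target, polarity in citations:
        count[target] += 1
        net[target] += polarity
    return {paper: (count[paper], net[paper]) for paper in count}


stats = citation_stats(CITATIONS)
for paper, (n_citations, net_sentiment) in sorted(stats.items()):
    print(paper, "citations:", n_citations, "net sentiment:", net_sentiment)
```

In this toy data, the two cited papers differ in citation count but not in net sentiment, which is the kind of distinction a frequency-only index cannot express.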
We can visualize these instances of citation as directed edges which originate from the source (citing) paper and point to the target (cited) paper. Previous studies [1, 2, 3, 4] have revealed that citation networks exhibit the properties of small-world networks, with a high clustering coefficient and small degrees of separation. This suggests that many citations occur within a closed community and, as such, criticism is often expressed in polite terms.

Figure 1: Example of Citation Network.

Proceedings of the 53rd Hawaii International Conference on System Sciences | 2020 Page 2517 URI: https://hdl.handle.net/10125/64049 978-0-9981331-3-3 (CC BY-NC-ND 4.0)

The lifecycle of most research projects begins with a concept or an idea and ends with a publication in a conference, journal, or other suitable venue. If one explores the collection of scientific papers in a given field or research area, one can identify a directed network between the papers and the authors as they cite each other in their respective works. Investigating this network of scientific citations has been a focus of research in computer and information science. Looking at the ensemble of papers, we can identify the relative importance of the papers, the authors, and the ideas expressed in the papers. We can also identify how the different entities (papers, authors, and ideas) are connected to each other in the network of citations. This would allow researchers to identify the most influential papers in the network and their degree of influence on the other papers [1]. In some cases, the absence of citations could serve as a significant clue