Identifying Citation Sentiment and its Influence while Indexing Scientific Papers

Souvick Ghosh
School of Communication and Information, Rutgers University
souvick.ghosh@rutgers.edu

Chirag Shah
Information School, University of Washington
chirags@uw.edu

Abstract

Sentiment analysis has proven to be a popular research area for analyzing social media texts, newspaper articles, and product reviews. However, sentiment analysis of citation instances is a relatively unexplored area of research. For scientific papers, it is often assumed that the sentiment associated with citation instances is inherently positive. This assumption stems from the hedged nature of sentiment in citations, which makes it difficult to identify and classify. As a result, most existing indexes focus only on the frequency of citation. In this paper, we highlight the importance of considering citation sentiment when preparing ranking indexes for scientific literature. We perform automatic sentiment classification of citation instances on the ACL Anthology collection of papers. Next, we use the sentiment score in addition to the frequency of citation to build a ranking index for this collection of scientific papers. Using various baselines, we highlight the impact of our index on the ACL Anthology collection of papers. Our research contributes toward building a more sentiment-sensitive ranking index that better underlines the influence and usefulness of research papers.

1. Introduction

Our work toward developing a sentiment-sensitive ranking index for scientific papers is situated at the intersection of bibliometrics, real-world citation networks, and sentiment analysis. A graphical representation of a hypothetical citation network is presented in Figure 1. Each node of the graph represents a scientific paper in the collection. In scientific papers, we find mentions of other papers.
These mentions, called citations, reflect the view of the author (of the source paper) towards the target paper. We can visualize these instances of citation as directed edges that originate from the source (citing) paper and point to the target (cited) paper. Previous studies [1, 2, 3, 4] have revealed that citation networks exhibit the properties of small-world networks, with a high clustering coefficient and small degrees of separation. This indicates that many citations occur within a closed community, and as such criticisms are often expressed in polite terms.

Figure 1: Example of Citation Network.

The lifecycle of most research projects begins with a concept or an idea and ends with a publication in a conference, journal, or other suitable venue. If one explores the collection of scientific papers in a given field or research area, one can identify a directed network between the papers and the authors as they cite each other in their respective works. Investigating this network of scientific citations has been a focus of research in computer and information science. Looking at the ensemble of papers, we can identify the relative importance of the papers, the authors, and the ideas expressed in the papers. We can also identify how the different entities (papers, authors, and ideas) are connected to each other in the network of citations. This would allow researchers to identify the most influential papers in the network and their degree of influence on the other papers [1]. In some cases, the absence of citations could serve as a significant clue

Proceedings of the 53rd Hawaii International Conference on System Sciences | 2020 Page 2517 URI: https://hdl.handle.net/10125/64049 978-0-9981331-3-3 (CC BY-NC-ND 4.0)