A New Scheme for Citation Classification based on Convolutional Neural Networks

Khadidja Bakhti 1, Zhendong Niu 1,2, Ally S. Nyamawe 1
1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
2 School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
(bakhti.khadidja, zniu)@bit.edu.cn, nyamawe@udom.ac.tz

Abstract—Automated classification of citation function in scientific text is an emerging research topic inspired by traditional citation analysis in applied linguistics and scientometrics. The aim is to classify citations in scholarly publications in order to identify the author’s purpose or motivation for citing a particular paper. Several citation schemes have been proposed to classify citations into different functions. However, it is extremely challenging to find a standard scheme for classifying citations, and some of the proposed schemes have similar functions. Moreover, most previous studies used classical machine learning methods, such as support vector machines and neural networks, with a number of manually created features. These features are incomplete, and creating them is time-consuming and error-prone. To address these problems, we present a new citation scheme with fewer functions and propose a deep learning model for classification. The citation sentences and the authors’ information are fed to convolutional neural networks to build citation and author representations. A corpus was built using the proposed scheme, and a number of experiments were carried out to assess the model. Experimental results show that the proposed approach outperforms existing methods in terms of accuracy, precision and recall.

Index Terms—Citation Annotation, Citation Scheme, Deep Neural Networks, Citation Function Classification, Convolutional Neural Networks.

I. INTRODUCTION

In previously published research, the citation is treated as a tool for calculating impact factor, with the objective of knowing how the citation is used [1], [2]. Citation function is defined as the reason or motivation for which authors cite others’ work, and citation function classification is the field of research concerned with classifying citations into classes based on the purpose behind them. Classification of citations could provide a more precise representation of the influence or impact of a publication, for example by considering only citations that are important to the citing paper and discarding citations that are perfunctory.

The first step in citation function classification is selecting the set of functions into which citations can be categorized; this set is called a citation function classification scheme. Once a scheme has been selected, a classification method is used to classify the citations. Several citation function classification schemes have been created with different numbers of functions and levels of granularity [3]. For example, [4] established a citation scheme containing four dimensions with two functions in each dimension. Each dimension groups two related classes together; a citation can belong to only one class from each of one or more dimensions. Different names are used in the literature for the specific purposes of citations, such as “category”, “class”, “type”, “reason” and “facet”. Throughout this paper we refer to all of them by the word “function”.

Manual citation function classification was proposed first, but automated classification subsequently became inevitable due to the large number of publications produced on a daily basis [3].
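As an illustration only, the dimension-based kind of scheme described above (four dimensions, two functions per dimension, at most one function chosen per dimension) can be sketched as a small data structure with a validity check. The dimension and function names below are hypothetical placeholders, not the labels used in [4], and the helper `validate_annotation` is an assumption introduced for this sketch.

```python
from typing import Dict, Tuple

# Hypothetical scheme: four dimensions with two functions each.
# All names are placeholders, NOT the actual labels of the scheme in [4].
SCHEME: Dict[str, Tuple[str, str]] = {
    "dimension_1": ("function_1a", "function_1b"),
    "dimension_2": ("function_2a", "function_2b"),
    "dimension_3": ("function_3a", "function_3b"),
    "dimension_4": ("function_4a", "function_4b"),
}


def validate_annotation(
    annotation: Dict[str, str],
    scheme: Dict[str, Tuple[str, str]] = SCHEME,
) -> bool:
    """Check that an annotation is consistent with the scheme.

    The annotation maps a dimension name to the single function chosen
    in that dimension. Using a dict means a dimension cannot carry two
    functions at once. At least one dimension must be annotated.
    """
    if not annotation:
        return False
    for dimension, function in annotation.items():
        if dimension not in scheme:
            return False
        if function not in scheme[dimension]:
            return False
    return True
```

For instance, an annotation that picks `function_1a` in `dimension_1` and `function_3b` in `dimension_3` is accepted, whereas one that assigns `function_2a` to `dimension_1` is rejected because that function belongs to a different dimension.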
Automated citation function classification has been carried out in the literature in two ways. The first is the use of rule-based methods, where domain experts develop rules that are coded into computer programs to perform citation function classification [5]. The rules are created from a set of human-labeled citations, where each citation is labeled with a function revealing its purpose. The second involves applying supervised machine learning techniques [6], where a set of citations labeled by human annotators is used for training.

Previous studies on automated citation function classification commonly used rule-based and supervised machine learning methods [5], [7]. However, rule-based techniques do not generalize well to citations that have never been seen by the domain experts. Multiple schemes have also been proposed, with granularity ranging from 3 to 35 functions [8]. However, no standard scheme has been established for citation function. Consequently, no scheme shows how authors can frame their citations or how this framing can influence use by future citers. [7] proposed a citation scheme that classifies citation function into six functions, namely based on/supply, useful, weakness, contrast, acknowledge, and hedges. [9] proposed a new scheme for annotating citations, which has seven functions: background, motivation, uses, extension, continuation, comparison, and future.

DOI reference number: 10.18293/SEKE2018-141

Across these proposed schemes, there is no defined standard for citation classification. Moreover, many functions, such as based on/useful and uses/extension, serve the same purpose, and such similarities between functions can make it difficult for an annotator to differentiate them in future use. In addition, the usability of the proposed functions is limited, and they cannot be adopted for all annotators from