A New Scheme for Citation Classification based on Convolutional Neural Networks

Khadidja Bakhti 1, Zhendong Niu 1, 2, Ally S. Nyamawe 1
1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
2 School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
(bakhti.khadidja, zniu)@bit.edu.cn, nyamawe@udom.ac.tz

Abstract: Automated classification of citation function in scientific text is an emerging research topic inspired by traditional citation analysis in applied linguistics and scientometrics. The aim is to classify citations in scholarly publications in order to identify the author's purpose or motivation for quoting or citing a particular paper. Several citation schemes have been proposed to classify citations into different functions. However, it is extremely challenging to find a standard scheme for classifying citations, and some of the proposed schemes have similar functions. Moreover, most previous studies used classical machine learning methods, such as support vector machines and neural networks, with a number of manually created features. These features are incomplete, and creating them is time-consuming and error-prone. To address these problems, we present a new citation scheme with fewer functions and propose a deep learning model for classification. The citation sentences and the author's information were fed to convolutional neural networks to build citation and author representations. A corpus was built using the proposed scheme, and a number of experiments were carried out to assess the model. Experimental results show that the proposed approach outperforms the existing methods in terms of accuracy, precision and recall.

Index Terms: Citation Annotation, Citation Scheme, Deep Neural Networks, Citation Function Classification, Convolutional Neural Networks.

I. INTRODUCTION

In previously published research, the citation has been treated as a tool for calculating impact factor, with the objective of understanding how the citation is used [1], [2]. Citation function is defined as the reason or motivation for which authors cite others' work, and citation function classification is the field of research concerned with classifying citations into classes based on the purpose behind them. Classification of citations could provide a more precise representation of the influence or impact of a publication, for example by considering only citations that are important to the citing paper and discarding citations that are perfunctory.

The first step in citation function classification is to select a number of functions into which citations can be categorized; this set of functions is called a citation function classification scheme. Once a scheme has been selected, a classification method is used to carry out the classification of citations. Several citation function classification schemes have been created with different numbers of functions and levels of granularity [3]. For example, [4] established a citation scheme containing four dimensions with two functions in each dimension. Each dimension groups two related classes together; a citation can belong to only one class within a dimension, and it can be classified along one or more dimensions. Different names are used in the literature to represent specific purposes for citations, such as "category", "class", "type", "reason" and "facet". Throughout this paper we refer to all of these by the word "function".

Manual citation function classification has been proposed, but automated classification subsequently became inevitable due to the large number of publications produced on a daily basis [3].
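To make the structure of a dimension-based scheme such as the one in [4] concrete, the following is a minimal sketch of how four dimensions with two functions each could be represented and how an annotation could be checked against the "one class per dimension, one or more dimensions" constraint. The dimension and function labels are hypothetical placeholders, since the text above does not name them, and the helper `is_valid_annotation` is an illustrative assumption, not part of any cited scheme.

```python
from typing import Dict, Tuple

# Hypothetical placeholder labels: the text does not name the dimensions
# or functions of the scheme in [4], so these are illustrative only.
SCHEME: Dict[str, Tuple[str, str]] = {
    "dimension_1": ("function_1a", "function_1b"),
    "dimension_2": ("function_2a", "function_2b"),
    "dimension_3": ("function_3a", "function_3b"),
    "dimension_4": ("function_4a", "function_4b"),
}


def is_valid_annotation(annotation: Dict[str, str]) -> bool:
    """Check an annotation that maps dimension -> chosen function.

    An annotation is accepted when it uses at least one dimension, every
    dimension exists in the scheme, and each chosen function is one of
    that dimension's two functions. Because the annotation is a dict,
    a dimension can carry at most one function.
    """
    if not annotation:
        return False
    return all(
        dim in SCHEME and func in SCHEME[dim]
        for dim, func in annotation.items()
    )
```

Under this representation, a citation labeled in two dimensions is a two-entry dictionary, while an empty dictionary or a function paired with the wrong dimension is rejected.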
Automated citation function classification has been carried out in the literature in two ways. The first is the use of rule-based methods, where domain experts develop rules that are coded into computer programs to perform citation function classification [5]. The rules were created based on a set of human-labeled citations, where each citation was labeled with a function revealing its purpose. The second way involves applying supervised machine learning techniques [6], where a set of citations labeled by human annotators is used in the training phase.

Previous studies on automated citation function classification commonly used rule-based and supervised machine learning methods [5], [7]. However, rule-based techniques do not generalize well to citations that have never been seen by the domain experts. In addition, multiple schemes have been proposed with granularity varying from 35 down to 3 functions [8], yet no standard scheme has been established for citation function. Consequently, no agreed scheme describes how authors frame their citations or how this framing can influence use by future citers.

[7] proposed a citation scheme for classifying citation function into six functions, namely based on/supply, useful, weakness, contrast, acknowledge, and hedges. [9] proposed a new scheme for annotating citations which has seven functions: background, motivation, uses, extension, continuation, comparison, and future.

Across these proposed schemes, there is no defined standard for citation classification. Moreover, the majority of functions, such as based on/useful and uses/extension, serve the same purpose, and such similarities between functions can make it difficult for an annotator to differentiate them in future use. In addition, the usability of the proposed functions is limited, and they cannot be adopted by all annotators from

DOI reference number: 10.18293/SEKE2018-141