Extending co-citation analysis to discover authors with multiple expertise

Yu-Min Su a,*, Shu-Ching Yang b, Ping-Yu Hsu a, Wen-Lung Shiau c

a Department of Business Administration, National Central University, No. 300, Jhongda Road, Jhongli City, Taoyuan 32001, Taiwan, ROC
b Institute of Industrial Management, National Central University, Taiwan, ROC
c Department of Information Management, Ming Chuan University, Taiwan, ROC

Article info

Keywords: Author co-citation analysis (ACA); Clustering; Complete author set; Computing classification system (CCS)

Abstract

The author co-citation analysis (ACA) method is commonly used to group the authors of reference papers. Because the traditional ACA method analyzes only the first authors of reference papers, it disregards the contributions of other coauthors and can assign each first author to only one cluster. This study proposes an innovative ACA algorithm called the "complete author pair (CAP) algorithm", which groups the complete author sets of reference papers into clusters and thus finds authors who may have expertise in more than one area. The CAP algorithm is applied to two citation data banks that collect the references of papers published in two ACM journals during 2002–2005. The results show that the CAP algorithm achieves up to 90% average precision in each citation bank when compared against the ACM CCS.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

To stay up to date, one of the most important tasks for every researcher is to read articles published by other researchers in related fields. Tracking researchers and articles in related fields is therefore a vital activity for the academic community. To boost the productivity of the academic community, tools that search for articles by features such as authors, keywords, and journal titles are commonly employed. Among these features, searching by author is an effective way to find articles published by known authors in the fields of interest.
The author co-citation analysis (ACA) method is commonly used to identify authors in related fields (Egghe & Rousseau, 1990). The ACA method produces an author cluster map by grouping the first authors of the cited papers in a citation bank (McCain, 1990; White & McCain, 1998). Any two first authors who are co-referenced by the same source paper form a first author pair. The accumulated frequency counts of the first author pairs in a citation bank are recorded in the co-citation frequency matrix. The co-citation frequency matrix is then converted into a correlation matrix using Pearson's correlation coefficient (McCain, 1990). Finally, a clustering method is applied to the correlation matrix to group the first authors in the citation bank. The ACA method can therefore recommend related authors when an author is given.

However, the traditional ACA methodology has two shortcomings. First, it analyzes only first authors and disregards the contributions of other coauthors. This approach may miss authors who tend to list themselves second or later in the author list, as many professors and laboratory managers may do. Second, each first author is assigned to only one cluster. A researcher with multiple areas of expertise or interest therefore cannot be reasonably represented by the clustering result of the traditional ACA method.

To remedy these problems, a novel approach is proposed. The approach takes the complete author set of each cited paper into consideration when computing the relations among authors. Any two complete author sets whose articles are co-referenced by the same source paper are recorded as a complete author pair. The accumulated frequency count of each complete author pair in a citation bank is then calculated. The complete author sets are then clustered using the correlation matrix computed from the frequency counts of the complete author pairs.
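The counting and correlation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy citation bank, the author names, and all function names are hypothetical, and zeroing the diagonal of the frequency matrix is only one of several conventions used in ACA. The same counting routine is applied twice, once with first authors and once with complete author sets as the unit of analysis.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy citation bank: each source paper maps to its references,
# and each reference is given as its ordered author list.
CITATION_BANK = {
    "source_1": [("Smith", "Lee"), ("Chen",), ("Smith", "Wu")],
    "source_2": [("Smith", "Lee"), ("Chen",)],
    "source_3": [("Chen",), ("Smith", "Wu")],
}


def first_author(authors):
    """Unit of analysis in traditional ACA: the first author only."""
    return authors[0]


def complete_author_set(authors):
    """Unit of analysis in the complete-author-set variant (order ignored)."""
    return tuple(sorted(authors))


def cocitation_counts(bank, unit):
    """Count how often two units are co-referenced by the same source paper."""
    counts = Counter()
    for references in bank.values():
        # Each unit is counted once per source paper.
        units = sorted({unit(authors) for authors in references})
        for a, b in combinations(units, 2):
            counts[(a, b)] += 1
    return counts


def frequency_matrix(counts):
    """Build the symmetric co-citation frequency matrix.

    Assumption: diagonal cells are left at zero.
    """
    labels = sorted({u for pair in counts for u in pair})
    index = {u: i for i, u in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return labels, matrix


def pearson(x, y):
    """Pearson correlation of two equal-length rows (0.0 if a row is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(matrix):
    """Correlate every row of the frequency matrix with every other row."""
    return [[pearson(r1, r2) for r2 in matrix] for r1 in matrix]


first_counts = cocitation_counts(CITATION_BANK, first_author)
cap_counts = cocitation_counts(CITATION_BANK, complete_author_set)
labels, matrix = frequency_matrix(cap_counts)
corr = correlation_matrix(matrix)
```

In this toy bank the first-author view collapses the two teams led by "Smith" into a single unit, whereas the complete-author-set view keeps ("Lee", "Smith") and ("Smith", "Wu") as separate units. The resulting correlation matrix could then be passed to any clustering routine; the choice of clustering method is left open here.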
Because an author may belong to more than one complete author set, the same author may appear in different clusters, each of which can be viewed as a domain of expertise. The effectiveness of the approach is evaluated in terms of precision and recall, based on the ACM CCS classification tree, using source papers published in two ACM journals from 2002 to 2005. The results show that the algorithm can discover authors who published papers across several specialized domains, with 90% average precision and 75% average recall when compared against the ACM CCS.

The rest of the paper is organized as follows: related work is summarized in Section 2, the novel algorithm is presented in Section 3, the experimental results are shown and discussed in Section 4, and the conclusions and future work are stated in Section 5.

* Corresponding author. Tel.: +886 3 4227151x66100; fax: +886 3 4226062. E-mail addresses: 93441023@cc.ncu.edu.tw, swimminghaha@msn.com (Y.-M. Su).

Expert Systems with Applications 36 (2009) 4287–4295. doi:10.1016/j.eswa.2008.03.022
0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.