Comparative Study between First and All-Author Co-Citation Analysis Based on Citation Indexes Generated from XML Data

Jesper Wiborg Schneider, Birger Larsen and Peter Ingwersen
jws@db.dk, blar@db.dk, pi@db.dk
Department of Information Studies, Royal School of Library and Information Science
Birketinget 6, DK-2300 Copenhagen S (Denmark)

Abstract
The study presents a comparative analysis of first-author and all-author co-citation analyses, as well as a comparison of two matrix generation approaches. It thus continues recent research in author co-citation analysis (ACA), in which the results of traditional first-author analyses based on ISI citation indexes are challenged by incorporating all authors of the cited references. Identifying all cited authors from the references in source papers is an extremely cumbersome process if the Thomson ISI citation indexes are used as a basis. Because all-author co-citation data are difficult to obtain, few such studies exist. To study all-author co-citation, we use a citation index generated from documents in XML code. This allows us to carry out a comparative study of first-author and all-author co-citation analyses based on the largest set of references and the broadest research domain used to date.

Introduction
Author co-citation analysis (ACA), introduced by White and Griffith (1981), is a technique for mapping the ‘intellectual structure’ of a research field, where the field is defined as a coherent literature set. The intellectual structure is mapped from the oeuvres of the most cited and co-cited first authors in a particular literature set. Since its introduction, ACA has become a popular and widely used technique. Recently, however, a debate concerning methodological procedures in ACA has emerged. In particular, the approach to ACA developed at Drexel University (e.g., White & Griffith, 1981; McCain, 1990) has been the focus of this debate.
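To illustrate why a citation index built from XML markup can expose every cited author, where ISI cited reference strings carry only the first author, here is a minimal Python sketch. The element names (`article`, `ref`, `author`), the `id` attribute, the sample data and the helper functions `build_citation_index` and `cited_authors` are all hypothetical. The actual XML schema of the documents used in the study is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout (an assumption, not the study's schema): one <article>
# per citing paper, each <ref> listing all authors of a cited work in order.
SAMPLE = """
<corpus>
  <article id="d1">
    <refs>
      <ref><author>White HD</author><author>Griffith BC</author></ref>
      <ref><author>McCain KW</author></ref>
    </refs>
  </article>
  <article id="d2">
    <refs>
      <ref><author>Persson O</author></ref>
      <ref><author>Griffith BC</author><author>White HD</author></ref>
    </refs>
  </article>
</corpus>
"""

def build_citation_index(xml_text):
    """Map each citing article id to its list of cited-author lists."""
    root = ET.fromstring(xml_text)
    index = {}
    for article in root.iter("article"):
        refs = []
        for ref in article.iter("ref"):
            # Keep every author of the cited work, in byline order.
            authors = [a.text.strip() for a in ref.findall("author") if a.text]
            if authors:
                refs.append(authors)
        index[article.get("id")] = refs
    return index

def cited_authors(index, all_authors=False):
    """Per citing article, the set of cited authors (first only, or all)."""
    return {
        doc: {a for ref in refs for a in (ref if all_authors else ref[:1])}
        for doc, refs in index.items()
    }

index = build_citation_index(SAMPLE)
print(sorted(cited_authors(index)["d2"]))                    # ['Griffith BC', 'Persson O']
print(sorted(cited_authors(index, all_authors=True)["d2"]))  # ['Griffith BC', 'Persson O', 'White HD']
```

With first-author extraction, White HD disappears from the second article's cited set, because White is the second author of the cited work.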
Essentially, four methodological issues have been debated: 1) scalability (e.g., Chen, 1999), 2) units of analysis and their definition (e.g., Persson, 2001; Zhao, 2006; Rousseau & Zuccala, 2004), 3) the choice of proximity measures (e.g., Ahlgren, Jarneving, & Rousseau, 2003; Schneider & Borlund, 2007a; 2007b), and most recently 4) the generation and transformation of matrices (Leydesdorff & Vaughan, 2006; Schneider & Borlund, 2007a). The present paper addresses the second and fourth issues in a comparative study of first-author and all-author co-citation analysis. The study uses different matrix generation approaches and draws on structured XML documents that allow ad-hoc citation indexes to be constructed.

The paper is structured as follows. The next section briefly discusses previous research on all-author co-citation analysis and matrix generation. This is followed by a description of the research method of the study, i.e., data collection and data analysis. The results are then presented and discussed, and the paper ends with a conclusion.

Previous Work on All-author Co-citations and Matrix Generation
In several respects, the methodological approach to ACA developed at Drexel University has been shaped by specific technical features that appear to have constrained the ACA methodology. The most important of these are the dependence on the standardized cited reference strings in Thomson ISI's citation indexes and the use of the SPSS statistical package as the tool for multivariate analyses. The most obvious example is that the cited reference strings only allow first authors to serve as units of analysis in ACA. As a result, ACA methodology only takes first authors into account in the definition of author co-citation counts.
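The difference between first-author and all-author counting can be sketched in Python. The toy reference lists and the helper names `cocitation_counts` and `cocitation_matrix` are hypothetical. The sketch counts a pair once per citing document in which at least one work from each author's oeuvre appears. For the all-author variant it assumes that an author's oeuvre is every cited work listing that author in any position, and it treats co-authors of the same cited work as co-cited. This is only one of several possible all-author definitions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one reference list per citing document, each cited
# work given as its ordered list of authors.
REFERENCE_LISTS = [
    [["White HD", "Griffith BC"], ["McCain KW"]],
    [["Persson O"], ["Griffith BC", "White HD"]],
    [["McCain KW", "White HD"], ["Persson O"]],
]

def cocitation_counts(reference_lists, all_authors=False):
    """Count, per author pair, the citing documents that co-cite them.

    A pair is counted at most once per citing document. With
    all_authors=False only first authors are used (ISI-style counting);
    with all_authors=True every author of a cited work is used, and
    co-authors of the same cited work also form a pair (an assumption).
    """
    counts = Counter()
    for refs in reference_lists:
        cited = {a for ref in refs for a in (ref if all_authors else ref[:1])}
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def cocitation_matrix(counts):
    """Expand pair counts into a symmetric raw co-citation matrix."""
    authors = sorted({a for pair in counts for a in pair})
    pos = {a: i for i, a in enumerate(authors)}
    matrix = [[0] * len(authors) for _ in authors]  # diagonal left at 0
    for (a, b), n in counts.items():
        matrix[pos[a]][pos[b]] = matrix[pos[b]][pos[a]] = n
    return authors, matrix

first = cocitation_counts(REFERENCE_LISTS)
every = cocitation_counts(REFERENCE_LISTS, all_authors=True)
print(first[("Griffith BC", "White HD")])  # 0
print(every[("Griffith BC", "White HD")])  # 2
```

In the toy data, White HD and Griffith BC are never co-cited as first authors. They are co-cited in two documents once all authors are counted. The matrix diagonal is left at zero. How to fill it is one of the matrix-generation choices that this sketch does not address.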
Two authors are considered to be co-cited when at least one document from each author's oeuvre occurs in the same reference list of a citing document, where an author's oeuvre is defined as all works with that author as first author (McCain, 1990). This definition has rarely been challenged. Persson (2001) was the first empirical study to compare the potential differences in intellectual structure between mappings produced by first-author and all-author co-citation analyses. The study is based on 7001 source documents from library and information science journals in the CD-ROM version of the Social Sciences Citation Index 1986-1996. The study investigates how these source documents have been co-cited with each other within the dataset by use of

Published as: Schneider, J. W., Larsen, B. and Ingwersen, P. (2007): Comparative study between first and all-author co-citation analysis based on citation indexes generated from XML data. In: Torres-Salinas, D. and Moed, H. F. eds. Proceedings of ISSI 2007, 11th International Conference of the International Society for Scientometrics and Informetrics, CSIC, Madrid, Spain; June 25-27, 2007. Madrid: CSIC, p. 696-707.