Co-Citation Count vs Correlation For Influence Network Visualization

Steven Noel 1, Chee-Hung Henry Chu 2, Vijay Raghavan 2

1 Center for Secure Information Systems, George Mason University, Fairfax, VA, U.S.A.
2 Center for Advanced Computer Studies, The University of Louisiana at Lafayette, Lafayette, LA, U.S.A.

Correspondence: Henry Chu, Center for Advanced Computer Studies, The University of Louisiana at Lafayette, PO Box 44330, Lafayette, LA 70504-4330, U.S.A. Tel: +1 337 482 6309; Fax: +1 337 482 5791; E-mail: cice@cacs.louisiana.edu

Received: 14 March 2003
Revised: 5 September 2003
Accepted: 6 September 2003

Abstract
Visualization of author or document influence networks as a two-dimensional image can provide key insights into the direct influence of authors or documents on each other in a document collection. The influence network is constructed based on the minimum spanning tree, in which the nodes are documents and an edge is the most direct influence between two documents. Influence network visualizations have typically relied on co-citation correlation as a measure of document similarity. That is, the similarity between two documents is computed by correlating the sets of citations to each of the two documents. In a different line of research, co-citation count (the number of times two documents are jointly cited) has been applied as a document similarity measure. In this work, we demonstrate the impact of each of these similarity measures on the document influence network. We provide examples, and analyze the significance of the choice of similarity measure. We show that correlation-based visualizations exhibit chaining effects (low average vertex degree), a manifestation of multiple minor variations in document similarities. These minor similarity variations are absent in count-based visualizations.
The result is that count-based influence network visualizations are more consistent with the intuitive expectation of authoritative documents being hubs that directly influence large numbers of documents.

Information Visualization (2003) 00, 000 – 000. doi:10.1057/palgrave.ivs.9500049
© 2003 Palgrave Macmillan Ltd. All rights reserved 1473-8716 $25.00 www.palgrave-journals.com/ivs

Keywords: Document collection visualization; co-citation analysis; influence networks; minimum spanning tree; graph layout

Introduction
Visualization of document-similarity structure contributes much to the understanding of relationships among documents in a collection. Such visual representations are ideal for rapid assimilation of large-scale structure, and are complementary to lower-level textual descriptions. A number of key applications can benefit from document-similarity visualizations, such as various forms of information retrieval, or bibliographic analyses of scientific disciplines.

A major development in information science was Garfield’s introduction of indexes of literature citations. 1 As citations have relatively clear semantics regarding literature influences, citation-based analysis avoids many of the difficulties inherent in language-based analysis. Early forms of citation-based analysis used bibliographic coupling as a measure of similarity between pairs of documents. The dual form (co-citation) introduced by Small 2 has become much more popular. That is, the similarity between two documents is the number of documents that cite them in common (co-citation), rather than the number of documents that they themselves cite in common (bibliographic coupling).
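The distinction between co-citation and bibliographic coupling can be illustrated with a small sketch. The citation data, the document labels, and the function names `cocitation_count` and `bibliographic_coupling` below are hypothetical and chosen only for illustration; they are not taken from the data or software used in this work.

```python
from itertools import combinations

# Hypothetical toy citation data (an assumption for illustration):
# each key is a citing document, each value is the set of documents it cites.
citations = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"b", "c"},
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"z"},
}


def cocitation_count(doc_i, doc_j, citations):
    """Number of documents that cite both doc_i and doc_j."""
    return sum(
        1 for cited in citations.values()
        if doc_i in cited and doc_j in cited
    )


def bibliographic_coupling(doc_i, doc_j, citations):
    """Number of documents that doc_i and doc_j both cite."""
    refs_i = citations.get(doc_i, set())
    refs_j = citations.get(doc_j, set())
    return len(refs_i & refs_j)


# Compare the two measures for every pair among documents a, b, and c.
for doc_i, doc_j in combinations(["a", "b", "c"], 2):
    print(
        doc_i, doc_j,
        "co-citation:", cocitation_count(doc_i, doc_j, citations),
        "coupling:", bibliographic_coupling(doc_i, doc_j, citations),
    )
```

In this toy collection, documents a and c are co-cited once (by p1) yet share no references, so the two measures can disagree for the same pair of documents.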