Understanding Eight Years of InfoVis Conferences using PaperLens

Bongshin Lee*, University of Maryland, Microsoft Research
Mary Czerwinski†, Microsoft Research
George Robertson‡, Microsoft Research
Benjamin B. Bederson§, University of Maryland

ABSTRACT

We present PaperLens, a visualization that reveals connections, trends, and activity throughout the InfoVis conference community over the last 8 years. It tightly couples views across papers, authors, and references. This paper describes how we analyzed the data, the strengths and weaknesses of PaperLens, and interesting patterns and relationships we have discovered using PaperLens.

1 INTRODUCTION

Information Visualization has been studied extensively over the last 10 years to help people explore and understand data. From 1995 to 2002, 315 authors published 155 papers in the InfoVis symposium. Interfaces for visualizing search results for a digital library, such as Envision [2], exist, but we do not yet have a visualization system that allows researchers in our field to understand how researchers, topics, and outside research sources interact and influence research activity in general. To address this issue, we have developed a visualization called PaperLens that supports the discovery and identification of major research topics, relationships between members of the community, and trends over time (Figure 1).

2 DATA ANALYSIS

2.1 Topic Clustering

We used technology developed internally at Microsoft Research to cluster the papers in the InfoVis proceedings. The clustering software was originally developed for site administrators to help build and maintain category hierarchies for documents. The text-clustering component suggests a hierarchically organized set of categories when no such structure exists. To cluster the papers, we used their titles, references, and keywords (if available), weighting the titles more heavily to get a better clustering result.
Stop words from a standard list, months of the year, journal and proceedings titles, and version and page numbers were excluded so that they would not influence the clustering results. Five clusters emerged from using this tool:

• General (58 papers)
• Dynamic Queries (28)
• Graph Visualization (19)
• Focus + Context Techniques (16)
• Tree Visualization (31)

Figure 1. PaperLens: (a) Popularity of Topic (b) Paper List (c) Selected Authors (d) Author List (e) Degrees of Separation Links (f) Year by Year Top 10 Cited Papers/Authors.

2.2 Co-authorship Analysis

A co-author collaboration graph is often used to find the relationships between authors and the center of the community, i.e., the author who has the shortest average path length to all other authors in the graph [3]. However, the graph among InfoVis authors is too fragmented to give any useful insights. S. F. Roth, the center of the graph, has published 5 papers with 13 co-authors and has only 19 related colleagues among 315 possible individuals. We instead decided to display all of the related colleagues when the user selects an author. We compute the shortest path length between two authors on demand and call it the degrees of separation.

2.3 Reference Citation Counts

One of the interesting questions we wanted to answer was "Which papers/authors are most often referenced?", because citation count is an important indicator of influential papers and authors. In addition to counting references overall, we computed the counts by year and by topic to show trends.

3 STRENGTHS

3.1 Evolution of Topics

Trends in topics are easy to see because we organized papers by topic and marked the most popular topic each year with a small star above the relevant column in the popularity of topic view (Figure 1a). For example, the topic of Graph Visualization has grown in popularity quite recently and was most popular in 2001, while the topic of Dynamic Queries has exhibited a steady increase over the last 8 years.
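
As an illustration of the degrees of separation computation described in Section 2.2, the following is a minimal sketch and not the PaperLens implementation. It assumes that each paper is given as a list of author names, builds an undirected co-authorship graph, and uses breadth-first search to find the shortest path length between two authors. The function names and the single-letter author names are hypothetical and chosen only for the example.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of authors they have shared a paper with.

    `papers` is an iterable of author-name lists, one list per paper
    (an assumed input format, not the PaperLens data model).
    """
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph, source, target):
    """Return the shortest co-authorship path length between two authors.

    Uses breadth-first search. Returns None when either author is unknown
    or when no path connects them (the graph may be fragmented).
    """
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for coauthor in graph[author]:
            if coauthor == target:
                return dist + 1
            if coauthor not in seen:
                seen.add(coauthor)
                queue.append((coauthor, dist + 1))
    return None


# Hypothetical toy data: three papers forming two disconnected groups.
papers = [
    ["A", "B"],
    ["B", "C", "D"],
    ["E", "F"],
]
graph = build_coauthor_graph(papers)
print(degrees_of_separation(graph, "A", "D"))
print(degrees_of_separation(graph, "A", "E"))
```

Computing the path on demand for a selected pair, as in this sketch, avoids precomputing all pairwise distances. Returning None for unconnected authors reflects the fragmentation noted in Section 2.2.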
* email: bongshin@cs.umd.edu
† email: marycz@microsoft.com
‡ email: ggr@microsoft.com
§ email: bederson@cs.umd.edu