Visualization of Content Information in Networks using GlyphNet

Anne Denton and Paul Juell
Department of Computer Science, North Dakota State University, Fargo, ND 58105
{anne.denton, paul.juell}@ndsu.nodak.edu

Abstract. Visualization of information on a graph has two aspects that are equally important in many settings: Visualization of graph connectivity and visualization of node information. We introduce GlyphNet, a tool that displays node-related information graphically, using small icons or glyphs. Our goal is to assist researchers who are applying data mining techniques to relational data, and have a need to identify patterns that involve node attribute values of interconnected nodes. GlyphNet represents node data as glyphs, analogously to the symbols on a weather map. Rather than placing glyphs in a spatial context, as is the case for weather symbols, GlyphNet displays node information in its graph context. We demonstrate the use of GlyphNet for the example of a data mining task that involves yeast gene and protein properties within the corresponding protein-protein interaction network.

1. Introduction

Information visualization on a graph is important for many subject areas. Social networks, the link structure of the World Wide Web, and biological networks, such as protein-protein interaction graphs and biochemical pathways, are examples of data that are commonly represented as graphs. Many techniques exist to display such graphs [1]. Most of them do, however, focus on connectivity. Node data, if displayed at all, is usually included in textual form, such as in the class diagrams and entity-relationship diagrams that are common in software engineering.

Modern information visualization tools add several aspects to traditional graph drawing. Graph navigation allows a user to view detailed information around a node of interest while, at the same time, allowing access to the rest of the graph.
Computer-based tools also commonly provide probe functionality, which displays node details in a separate window based on selections made by the user. A good example of probe functionality is implemented in the Web navigation tool TouchGraph WikiBrowser [2], which allows navigation within the link structure of Web pages while making the content of individual pages available separately. The user can select a page with a mouse click, and that page is then displayed in a separate browser window. Although this is an efficient solution to the problem of retrieving information from a network of nodes, it is not suited to typical data mining tasks such as the identification of interesting patterns.

Visual data mining can be seen as a hypothesis generation process [3]. We will, therefore, now look at the hypotheses that can be generated with different graph visualization tools. Traditional graph drawing techniques that display nothing but connectivity allow investigating hypotheses on the distribution of edges. Many interesting results can come from such studies, e.g., the identification of different types of networks [4], including scale-free networks and random networks.

Graph visualization tools with probe functionality allow generating hypotheses that go beyond connectivity alone. A particular node can be viewed in the context of the nodes in its network neighborhood, which can, for example, allow generating hypotheses regarding relationships between edge distribution and node importance. Google’s PageRank algorithm [5], which relates the importance of a Web page to the distribution of incoming links, would be accessible to such analysis. Much work is still being done to improve on importance measures [6], and a graph visualization tool with probe functionality could be used productively in this context.

Much current work in the area of data mining involving relational data does, however, go beyond the issues addressed so far [7].
The term relational data refers to data in which a relationship exists between data records that can be represented by a graph. Typical questions of interest are how relational neighbors can or should be used in classification and clustering [8]. Such questions require access to the node data of not just one node but of all nodes about which a hypothesis is to be made. In very small graphs one may try to resolve the problem by including a textual representation of the node data in the graph, as is done in class diagrams and entity-relationship diagrams. With this strategy, however, many of the problems of textual representations that the visualization was supposed to address recur: Textual information has to be read and interpreted, and patterns are, therefore, hard to see. Text also takes up a significant amount of space, which limits the number of nodes that can be displayed simultaneously.
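
To make the question of using relational neighbors in classification more concrete, the following minimal sketch predicts a node's label as the most common label among its labeled neighbors. It is an illustration only and is not taken from GlyphNet or from the cited work: the function name `neighbor_vote`, the toy graph, and the localization-style labels are all assumptions chosen for the example.

```python
from collections import Counter


def neighbor_vote(adjacency, labels, node):
    """Return the most common label among the labeled neighbors of `node`.

    `adjacency` maps each node to a list of neighboring nodes, and `labels`
    maps some of the nodes to a known class label. Returns None when the
    node has no labeled neighbors.
    """
    votes = Counter(
        labels[neighbor]
        for neighbor in adjacency.get(node, ())
        if neighbor in labels
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy interaction graph (names and labels are made up).
adjacency = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}
labels = {"B": "nucleus", "C": "nucleus", "D": "cytoplasm"}

# Two of A's three labeled neighbors carry the label "nucleus".
prediction_for_a = neighbor_vote(adjacency, labels, "A")
```

Even in this small example, checking the prediction requires seeing the attribute values of all neighbors of a node at once, which is the kind of access that probe functionality on a single node does not provide.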