Visualizing Multivariate Network Using GeoSOM and Spherical Disk Layout

Yingxin Wu *, School of Information Technologies, The University of Sydney
Masahiro Takatsuka †, ViSLAB, The University of Sydney
Richard Webber ‡, National ICT Australia Limited

Abstract

We have previously introduced a two-phase approach to visualizing a multivariate network [5]. Positions of the graph nodes were determined by their attributes and binary connectivity (connected or not connected). This paper presents an improved method: 1) graph distances are combined with the node attributes to improve the final positions of the graph nodes; 2) we use the uniform grid technique to speed up the disk layout, which reduces the number of edge crossings and node/edge overlaps. We also provide an interface to generate component planes that depict how individual attributes change across the network.

Keywords: Multivariate Network, GeoSOM, Crossing Reduction

1 Introduction

A multivariate network consists of connected data points, each of which has multidimensional attributes. Connections in the network describe relationships or activities between the data points. Many real-world data sets can be represented using this type of network. One example is the world trade network, in which countries are represented as nodes, each country has properties such as gross domestic product (GDP), GDP growth, population and population growth, and edges represent imports and exports between the countries. Different trading activities, such as exchanging metal products or cereals, form different networks. We believe that visualizing a multivariate network can help users extract useful information from both aspects of the data: the attributes and the connections.

In previous work [5], a two-phase approach was introduced to visualize multivariate networks. The idea was to place together the graph nodes that are connected and also close in attribute space.
Similarities of the graph nodes can then be observed from the nodes' relative positions instead of by comparing the values of different attributes. In this article, we improve the original approach by introducing graph distances into the training process of the Geodesic Self-Organizing Map (GeoSOM). GeoSOM [4] determines the initial layout of the network based on data attributes and graph distances. We also use the uniform grid technique to reduce the computational complexity of the spherical disk layout. Furthermore, we add an interface to show the component planes of GeoSOM so that the user can examine the distribution of different attributes across the network.

2 Phase One: Modify the Batch Training Process

Previously, we treated each graph node as a point in high-dimensional space and used GeoSOM to project the nodes onto the surface of a sphere [5]. Although this non-linear mapping tries to place together graph nodes that are neighbors in the graph, it has little effect on graph nodes that are two or more steps apart. We improve the projection by incorporating the graph distances.

* e-mail: chwu@it.usyd.edu.au
† e-mail: masa@vislab.net
‡ e-mail: Richard.Webber@nicta.com.au

The batch mode training of the self-organizing map [2] is an iterative process. At each iteration, every input $x$ finds the best matching neuron (BMN), whose weight vector is nearest to it. Subsequently, each weight vector $w_j$ of neuron $n_j$ is set to the weighted mean over all the inputs registered with it and its neighboring neurons:

$$ w_j = \frac{\sum_{i=1}^{N} h_{b_i j}(t)\, x_i}{\sum_{i=1}^{N} h_{b_i j}(t)} \qquad (1) $$

where $b_i$ is the BMN of input $x_i$ and $N$ is the total number of inputs mapped to the neuron $n_j$ and its neighbors; the size of the neighborhood is controlled by $h_{b_i j}$. According to equation 1, the contribution of each input to the weight vector $w_j$ is weighted by the neighborhood function $h_{b_i j}$. We add a coefficient $f_{ij}$ to equation 1 so that nodes that are close in graph space are placed near each other.
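Before the modification, one iteration of the plain batch update in equation 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `neuron_dist` matrix of distances between neurons on the map, and the Gaussian form of the neighborhood function are all assumptions, since the text does not specify $h_{b_i j}$. With a Gaussian, every input contributes to every neuron, but inputs whose BMN lies far from $n_j$ receive a negligible weight.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gaussian_neighborhood(grid_dist, sigma):
    """Assumed neighborhood function h: a Gaussian of the distance on the map."""
    return math.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))


def batch_update(inputs, weights, neuron_dist, sigma):
    """One batch-mode iteration following equation 1.

    inputs      -- list of input vectors x_i
    weights     -- list of weight vectors w_j, one per neuron
    neuron_dist -- neuron_dist[a][b] is the distance between neurons a and b
                   on the map (hypothetical precomputed matrix)
    sigma       -- neighborhood radius at the current iteration
    Returns the new weight vectors and the BMN index of every input.
    """
    # Step 1: every input finds its best matching neuron (nearest weight vector).
    bmn = [
        min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))
        for x in inputs
    ]

    # Step 2: every weight vector becomes the h-weighted mean of the inputs.
    dim = len(weights[0])
    new_weights = []
    for j in range(len(weights)):
        numerator = [0.0] * dim
        denominator = 0.0
        for x, b in zip(inputs, bmn):
            h = gaussian_neighborhood(neuron_dist[b][j], sigma)
            denominator += h
            for k in range(dim):
                numerator[k] += h * x[k]
        if denominator > 0.0:
            new_weights.append([v / denominator for v in numerator])
        else:
            # No input influences this neuron; keep its old weight vector.
            new_weights.append(list(weights[j]))
    return new_weights, bmn
```

With a very small `sigma`, each neuron's new weight vector is close to the plain mean of the inputs registered with it; a larger `sigma` lets neighboring neurons share inputs.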
Here, a graph node $v_i$'s attributes are denoted as a vector $x_i$:

$$ w_j = \frac{\sum_{i=1}^{N} (1 + f_{ij})\, h_{b_i j}(t)\, x_i}{\sum_{i=1}^{N} (1 + f_{ij})\, h_{b_i j}(t)} \qquad (2) $$

$f_{ij}$ is calculated from the average graph distance $d(i, j)$ of node $v_i$ to all the nodes $v_j$ mapped to neuron $n_j$:

$$ f_{ij} = 1 - \frac{\sum_{v_j \in n_j} d(i, j)}{K D} \qquad (3) $$

where $K$ is the number of shortest paths connecting node $v_i$ to the nodes mapped to neuron $n_j$, and $D$ is the graph diameter. According to equation 3, the closer $v_i$ and $v_j$ are in terms of graph distance, the larger $f_{ij}$ becomes. Therefore, $f_{ij}$ amplifies the contribution of nodes $v_i$ that are closely connected to the nodes mapped to neuron $n_j$. In other words, adding the term $f_{ij}$ has the effect of pulling together the nodes that are close in graph distance.

3 Phase Two: The Spherical Disk Layout

After using GeoSOM to produce an initial layout of a multivariate network, we adjust the positions of the graph nodes to make the final layout more visually pleasing. The algorithm should separate the graph nodes that are mapped to the same point on the sphere and reduce the number of edge crossings and node/edge overlaps.

A greedy iterative process was previously used in [5] to find an appropriate location for each graph node on a circle surrounding its BMN. The algorithm tried to place each graph node at one of the equally spaced positions on the circle, such that the number of crossings and node/edge overlaps was minimized. This process was repeated until the number of crossings and overlaps no longer changed. Calculating the number of crossings and overlaps takes $O(|V||E| + |E|^2)$ time in each iteration. Here we employ the uniform grid technique [1] to speed up the algorithm.

GeoSOM organizes its neurons on the grid of an icosahedron-based geodesic dome. We make use of this grid to divide the spherical surface into triangles (see Figure 1). These triangles are almost uniform in shape and size, and thus can be used as a spherical “uniform” grid.
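The uniform grid idea can be illustrated in a simplified planar setting. The sketch below is an assumption-laden stand-in rather than the method used on the sphere: it uses square cells in the plane instead of the geodesic triangles, registers each edge with every cell overlapped by its bounding box (a conservative superset of the cells the edge actually passes through), and counts only proper crossings, ignoring edges that merely touch or overlap collinearly. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def orient(a, b, c):
    """Sign (-1, 0 or 1) of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(e, f):
    """True if segments e and f cross at a single interior point."""
    (p, q), (r, s) = e, f
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def cells_of(edge, cell):
    """Square cells overlapped by the edge's bounding box."""
    (x1, y1), (x2, y2) = edge
    cx_lo, cx_hi = sorted((math.floor(x1 / cell), math.floor(x2 / cell)))
    cy_lo, cy_hi = sorted((math.floor(y1 / cell), math.floor(y2 / cell)))
    for cx in range(cx_lo, cx_hi + 1):
        for cy in range(cy_lo, cy_hi + 1):
            yield (cx, cy)


def count_crossings_grid(edges, cell):
    """Count crossings, testing only pairs of edges that share a cell."""
    buckets = defaultdict(list)
    for idx, edge in enumerate(edges):
        for c in cells_of(edge, cell):
            buckets[c].append(idx)

    tested = set()  # pairs already examined, so no crossing is counted twice
    crossings = 0
    for members in buckets.values():
        for pair in combinations(members, 2):
            if pair in tested:
                continue
            tested.add(pair)
            if segments_cross(edges[pair[0]], edges[pair[1]]):
                crossings += 1
    return crossings


def count_crossings_brute_force(edges):
    """Reference count that tests every pair of edges."""
    return sum(segments_cross(e, f) for e, f in combinations(edges, 2))
```

Two crossing edges always share the cell that contains their crossing point, so the grid version returns the same count as the brute-force version while skipping pairs of edges that lie in different parts of the layout.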
Using this grid, when calculating the number of crossings, each edge only needs to be compared with the edges that pass through the same triangles. Similarly, for node/edge overlaps, each node only needs to be tested against the edges that go