Filtering Edges for Exploration of Large Graphs

Xiaodi Huang*
School of Computing and Mathematics, Charles Sturt University, Albury, NSW, 2640, Australia

ABSTRACT

Visual clutter in the layout of a large graph is mainly caused by the overwhelming number of edges. Filtering is one of the ways to reduce this clutter. We regard a filtered graph as a compressed version of the original graph. Based on this view, a filtering approach is presented that reduces the visual clutter of a layout in such a way that hidden patterns are revealed gradually. Experiments have demonstrated the performance of the proposed approach in our prototype system. As evidenced by real examples, the system allows users to explore a graph interactively at adjustable, continuous levels of detail. This new approach is able to reveal more hidden patterns in graphs than existing approaches, providing a new way to gain insight into graph data.

Keywords: large graph visualization, filtering

Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces – Evaluation/Methodology

1 INTRODUCTION

Apart from other approaches, filtering is regarded as an effective way of reducing the visual clutter of a large graph. Two fundamental questions arise: what to filter (nodes or edges), and at which level (a discrete or a continuous one)?

What to filter? We filter edges instead of nodes. Node filtering has several drawbacks. For example, it is desirable to remove the insignificant edges of a hub node to reduce visual clutter, rather than the node itself. Moreover, node positions sometimes carry semantic meaning, such as nodes depicting locations in a traffic network; such nodes cannot be removed.

At which level should a graph be filtered? In other words, how many levels of detail can a user specify when exploring a graph? Current graph systems [1, 2] normally allow users to explore a graph only at discrete levels, limited to the number of levels in an abstraction hierarchy (or several hierarchies [2]) of the graph.
The level of detail at which a graph can be explored is thus constrained by the limited number of abstraction levels of the graph. To remedy this problem, we introduce the notion of a continuous level of detail, which refers to a virtually unlimited number of levels: the user-adjustable threshold serving as the score cutoff for filtering a graph is a continuous, real value. Filtering a graph at a continuous level can make almost every individual edge visible or invisible, providing smooth, continuous changes between different levels of detail. As such, users can adjust the filtering rate interactively to gain insight from the desired visual results.

In order to achieve a continuous level of detail for filtering edges, we need to distinguish different edges in a graph. It is desirable that each edge is associated with a unique score. Existing metrics for node centrality, such as node degree, eigenvector centrality, and PageRank, are all about nodes rather than edges (the number of edges in a graph is normally larger than the number of nodes) and cannot meet this requirement. For example, many nodes in a graph have the same degree, and nodes with the same degree cannot be distinguished from each other by this metric.

In this work, we cast the reduction of visual clutter in a graph layout as the problem of compressing a graph. Based on this view, we compute edge scores and then simplify a dense graph. Our prototype system uses a novel exploration model that allows users to filter a graph at an arbitrary number of levels of detail.

2 THE APPROACH

It is assumed that we have an undirected, unweighted graph G = (V, E) with n nodes and m edges, where V is the set of nodes and E the set of edges. The graph is represented by its incidence matrix: L is an m × n matrix with entries L_{ij}. The problem of Edge Ranking (ER) can be formalized as a mapping ER: E → [0, 1]. The ER scores of all the edges are denoted by a column vector e (m × 1), whose i-th element e_i (0 ≤ e_i ≤ 1) is the ER score of edge i, used for ranking.
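As a concrete illustration of the setup just described, the m × n incidence matrix of a small undirected graph can be built as follows. The example graph, edge list, and helper name are hypothetical and not taken from the paper.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Build the m-by-n incidence matrix L of an undirected graph:
    row i has a 1 in the columns of the two nodes incident to edge i."""
    L = np.zeros((len(edges), n_nodes))
    for i, (u, v) in enumerate(edges):
        L[i, u] = 1.0
        L[i, v] = 1.0
    return L

# A toy 4-node graph with m = 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
L = incidence_matrix(4, edges)
print(L.shape)  # (4, 4): m = 4 edges, n = 4 nodes
```

Each row of L corresponds to one edge and contains exactly two nonzero entries, reflecting that an edge is incident to at most two nodes.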
The importance of an edge is measured by the importance of its connected edges in an iterative way. As we know, an edge can be represented by its incident nodes, and it is incident to at most two nodes. Based on this fact, ER scores can be derived from the incidence matrix of a graph. Essentially, filtering a graph can be regarded as compressing it. In other words, we use a compressed matrix to approximate the incidence matrix of a graph. Specifically, the objective function minimizes the difference between L and its approximating matrix. We use non-negative matrix factorization (NMF) [5] to obtain a compressed version of the original data matrix. Given a matrix L, the optimal choice is the pair of nonnegative matrices W and H that minimize the reconstruction error between L and WH:

J(W, H) = ||L − WH||_F^2 = Σ_{i,j} (L_{ij} − (WH)_{ij})^2    (1)

where (WH)_{ij} = Σ_{k=1}^{r} W_{ik} H_{kj}, subject to the constraints W_{ik} ≥ 0 and H_{kj} ≥ 0 for 1 ≤ i ≤ m, 1 ≤ k ≤ r, and 1 ≤ j ≤ n. The dimensions of the factorized matrices W and H are m × r and r × n, respectively.

For filtering a graph, we set r = 1. In other words, we use a single meta-node, which can be regarded as an abstract cluster node, to rank different edges in terms of their link relationships to this super node. All the edges are in fact projected into a one-dimensional space whose axis corresponds to this particular meta-node cluster. The W basis vectors can be thought of as the ‘building blocks’ of the data. Each element W_{i1} of matrix W is the degree to which edge i belongs to this meta-node cluster. Equivalently, the column of W can be thought of as a node archetype comprising a set of edges, where the cell value of each edge defines the rank of that edge in the feature: the higher the cell value of an edge, the higher the rank of the edge in the feature. Describing how strongly each ‘building block’ is present, a column of the coefficient matrix H represents an original node, with its cell value defining the rank of the node for a feature.
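A minimal sketch of the rank-1 factorization step follows. It uses the standard Lee–Seung multiplicative updates for the Frobenius-norm NMF objective; this particular solver, the iteration count, the normalization of the score vector, and the example threshold are assumptions for illustration, not necessarily the paper's exact implementation.

```python
import numpy as np

def edge_rank_nmf(L, n_iter=200, eps=1e-9, seed=0):
    """Rank-1 NMF: approximate L (m x n) by W (m x 1) times H (1 x n),
    minimizing ||L - WH||_F^2 with multiplicative updates.
    W then holds one nonnegative score per edge."""
    rng = np.random.default_rng(seed)
    m, n = L.shape
    W = rng.random((m, 1)) + eps
    H = rng.random((1, n)) + eps
    for _ in range(n_iter):
        # Standard Lee-Seung multiplicative update rules.
        H *= (W.T @ L) / (W.T @ W @ H + eps)
        W *= (L @ H.T) / (W @ H @ H.T + eps)
    e = W[:, 0]
    return e / e.max()  # normalize so that 0 <= e_i <= 1

# Incidence matrix of a toy 4-edge, 4-node graph.
L = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)
scores = edge_rank_nmf(L)

# Continuous level of detail: a real-valued, user-adjustable cutoff.
threshold = 0.5
visible = scores >= threshold  # edges below the threshold are hidden
```

Because the threshold is a continuous value rather than a level index, nudging it slightly changes the visibility of at most a few edges at a time, which is what yields the smooth transitions between levels of detail.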
All edges are projected onto one virtual abstract node, which maximizes the variance between different edges. These projected coordinates are regarded as the ER scores of the corresponding edges. Therefore, we can regard W_{i1} as the ER score of the i-th edge; that is, e_i = W_{i1}. After computing the scores by the above approach, the edges in a graph are ranked according to their ER scores. All the edges whose ER scores are less than a cutoff value, used as the filter rate, are then hidden from the layout.

* xhuang@csu.edu.au

IEEE Symposium on Large Data Analysis and Visualization 2013
October 13–14, Atlanta, Georgia, USA
978-1-4799-1658-0/13/$31.00 ©2013 IEEE