Filtering Edge for Exploration of Large Graphs
Xiaodi Huang*
School of Computing and Mathematics, Charles Sturt University, Albury, NSW, 2640, Australia
ABSTRACT
Visual clutter in the layout of a large graph is mainly caused by
the overwhelming number of edges. Filtering is one of the ways to
reduce this clutter. We regard a filtered graph as a compressed
version of the original graph. Based on this view, a filtering
approach is presented that reduces the visual clutter of a layout in
such a way that hidden patterns can be revealed gradually.
Experiments have demonstrated the performance of the proposed
approach in our prototype system. As evidenced by real examples,
the system allows users to explore a graph interactively at
adjustable, continuous levels of detail. This new approach is able
to reveal more hidden patterns in graphs than existing approaches,
providing a new way to gain insights into graph data.
Keywords: large graph visualization, filtering
Index Terms: H.5.2 [Information Interfaces and Presentation]:
User Interfaces – Evaluation/Methodology
1 INTRODUCTION
Apart from other approaches, filtering is regarded as an effective
way of reducing the visual clutter of a large graph. The two
fundamental questions about filtering are: what to filter (nodes or
edges?), and at which level (a discrete or a continuous one?).
Filter what? We filter edges instead of nodes, since node filtering
has several drawbacks. For example, to reduce visual clutter it is
desirable to remove the insignificant edges of a hub node rather
than the node itself. Moreover, node positions sometimes have a
semantic meaning, such as nodes depicting locations in a traffic
network; these nodes cannot be removed.
At which level should a graph be filtered? In other words, how
many levels of detail can a user specify for exploring a graph?
Current graph systems [1, 2] normally allow users to explore a
graph only at discrete levels, limited by the total number of levels
in an abstraction hierarchy (or several hierarchies [2]) of the
graph. The level of detail in exploring a graph is thus constrained
by the limited depth of this abstraction. In order to remedy this
problem, we introduce the notion of continuous levels of detail,
which admits an effectively unlimited number of levels: a user-
adjustable threshold, serving as the score cutoff for filtering a
graph, is a continuous, real value. Filtering a graph at a
continuous level can make almost every edge visible or invisible,
providing smooth, continuous changes between different levels of
detail. As such, users can adjust the filtering rate interactively to
gain insight from the desired visual results.
In order to achieve a continuous level of detail for filtering
edges, we need to distinguish different edges in a graph; it is
desirable that each edge is associated with a unique score.
Existing metrics for node centrality, such as node degree,
eigenvector centrality, and PageRank, are all about nodes rather
than edges (the number of edges is normally larger than the
number of nodes in a graph), and so cannot meet this requirement.
For example, many nodes in a graph have the same degree, and
such nodes cannot be distinguished from each other by their
degrees. In this work, we cast the reduction of visual clutter in a
graph layout as the problem of compressing a graph. Based on
this view, we compute edge scores, and then simplify a dense
graph. Our prototype system uses a novel exploration model that
allows users to filter a graph at an arbitrary number of levels of
detail.
2 THE APPROACH
It is assumed that we have an undirected, unweighted graph G =
(V, E) with n nodes and m edges, where V is the set of nodes and
E the set of edges. The graph is represented by its incidence
matrix: L is an $m \times n$ matrix whose entry $l_{ij}$ is 1 if
edge $i$ is incident to node $j$, and 0 otherwise. The problem of
Edge Ranking (ER) can be formalized as a function
$\mathrm{ER}: E \to \mathbb{R}$. The ER scores of all the edges
are denoted by a column vector $\mathbf{e}$ ($m \times 1$), whose
$i$-th element $e_i$ ($0 \le e_i \le 1$) is the ER score of edge
$i$, used for ranking.
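To make this representation concrete, the following is a minimal sketch of building such an edge-node incidence matrix with NumPy; the function name `incidence_matrix` and the toy graph are our own illustration, not from the paper:

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Build the m x n edge-node incidence matrix L of an undirected
    graph: L[i, j] = 1 iff edge i is incident to node j, else 0."""
    L = np.zeros((len(edges), n_nodes))
    for i, (u, v) in enumerate(edges):
        L[i, u] = 1.0
        L[i, v] = 1.0
    return L

# A small 4-node example: a triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
L = incidence_matrix(4, edges)
print(L.shape)  # (4, 4): m = 4 edges (rows), n = 4 nodes (columns)
```

Each row of L has exactly two nonzero entries, reflecting that an edge is incident to exactly two nodes.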
The importance of an edge is measured by the importance of its
connected edges in an iterative way. As we know, an edge can be
represented by its incident nodes, and it is incident to at most two
nodes. Based on this fact, ER scores can be derived from the
incidence matrix of a graph. Essentially, filtering a graph can be
regarded as compressing it. In other words, we use a compressed
matrix to approximate the incidence matrix of a graph.
Specifically, the purpose of the objective function is to minimize
the difference between L and its approximating matrix.
We use non-negative matrix factorization (NMF) [5] to obtain a
compressed version of the original data matrix. Given a matrix L,
we seek the nonnegative matrices W and H that minimize the
reconstruction error between L and WH:

$J(W, H) = \|L - WH\|_F^2 = \sum_{i,j} \left(l_{ij} - (WH)_{ij}\right)^2$   (1)

where $(WH)_{ij} = \sum_{\alpha=1}^{k} w_{i\alpha} h_{\alpha j}$,
subject to the constraints $w_{i\alpha} \ge 0$ and
$h_{\alpha j} \ge 0$, for $1 \le i \le m$, $1 \le \alpha \le k$, and
$1 \le j \le n$. The dimensions of the factorized matrices W and H
are $m \times k$ and $k \times n$, respectively.
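As a concrete sketch of minimizing Equation (1) in the rank-1 case used below (k = 1), one can apply the standard Lee-Seung multiplicative updates for NMF. The paper does not state which solver it uses, so this particular update rule is our assumption:

```python
import numpy as np

def nmf_rank1(L, n_iter=200, eps=1e-9, seed=0):
    """Rank-1 NMF: find nonnegative W (m x 1) and H (1 x n) that
    minimize ||L - WH||_F^2 via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = L.shape
    W = rng.random((m, 1)) + eps
    H = rng.random((1, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H elementwise nonnegative
        # while monotonically decreasing the reconstruction error.
        H *= (W.T @ L) / (W.T @ W @ H + eps)
        W *= (L @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Incidence matrix of the toy graph: triangle 0-1-2 plus edge 2-3.
L = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [0., 0., 1., 1.]])
W, H = nmf_rank1(L)
err = np.linalg.norm(L - W @ H)
```

With k = 1 the single column of W directly assigns each edge a nonnegative score, which is exactly what the filtering step below relies on.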
For filtering a graph, we set $k = 1$. In other words, we use a
single meta-node, which can be regarded as an abstract cluster
node, to rank different edges in terms of their link relationships
to this super node. All the edges are in fact projected into a
one-dimensional space whose axis corresponds to this particular
meta-node cluster. The basis vectors in W can be thought of as the
'building blocks' of the data. Each element $w_{i1}$ of matrix W
is the degree to which edge $i$ belongs to this meta-node cluster.
Equivalently, the single column of W can be thought of as a node
archetype comprising a set of edges, where the cell value of each
edge defines the rank of the edge in the feature: the higher the
cell value of an edge, the higher the rank of the edge in the
feature. Describing how strongly each 'building block' is present,
a column in the coefficient matrix H represents an original node,
with a cell value defining the rank of the node for a feature. All
edges are projected onto one virtual abstract node, which
maximizes the variance between different edges. These projection
coordinates are regarded as the ER scores of the corresponding
edges. Therefore, we can regard $w_{i1}$ as the ER score of the
$i$-th edge; that is, $\mathbf{e} = W$.
After computing the scores by the above approach, the edges in a
graph are ranked according to their ER scores. All the edges whose
ER scores are less than a cutoff value, the filter rate, are then
hidden from the layout.
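This thresholding step can be sketched as follows; the edge list and ER scores here are hypothetical values chosen only to illustrate filtering at a continuous cutoff:

```python
import numpy as np

# Hypothetical ER scores for five edges (e.g., the column of W,
# rescaled so that 0 <= e_i <= 1 as in the paper).
scores = np.array([0.9, 0.2, 0.55, 0.05, 0.7])
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]

def filter_edges(edges, scores, cutoff):
    """Keep only edges whose ER score reaches the user-chosen,
    continuous cutoff; lower-ranked edges are hidden."""
    return [e for e, s in zip(edges, scores) if s >= cutoff]

print(filter_edges(edges, scores, 0.5))  # [(0, 1), (0, 2), (1, 3)]
```

Because the cutoff is a continuous real value, sliding it smoothly reveals or hides edges one at a time, which is what enables the continuous levels of detail described in the introduction.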
* xhuang@csu.edu.au
IEEE Symposium on Large Data Analysis and Visualization 2013
October 13 - 14, Atlanta, Georgia, USA
978-1-4799-1658-0/13/$31.00 ©2013 IEEE