Analyzing High-dimensional Multivariate Network Links with Integrated
Anomaly Detection, Highlighting and Exploration
Sungahnn Ko*
Purdue University
Shehzad Afzal*
Purdue University
Simon Walton†
Oxford University
Yang Yang*
Purdue University
Junghoon Chae*
Purdue University
Abish Malik*
Purdue University
Yun Jang‡
Sejong University
Min Chen†
Oxford University
David Ebert*
Purdue University
ABSTRACT
This paper focuses on the integration of a family of visual analytics techniques for analyzing high-dimensional, multivariate network data that features spatial and temporal information, network connections, and a variety of other categorical and numerical data types. Such data types are commonly encountered in the transportation, shipping, and logistics industries. Due to the scale and complexity of the data, it is essential to integrate techniques for data analysis, visualization, and exploration. We present new visual representations, Petal and Thread, to effectively present many-to-many network data including multi-attribute vectors. In addition, we deploy an information-theoretic model that detects anomalies across varying dimensions, highlights them in a visually consistent manner, and supports a managed process of exploration. Lastly, we evaluate the proposed methodology through data exploration and an empirical study.
Index Terms: I.3.6 [Computer Graphics]: Methodology and
Techniques—Interaction techniques; I.3.8 [Computer Graphics]:
Applications—Visual Analytics
1 INTRODUCTION
The recent trend of increasing size, complexity, and variety in datasets (e.g., spatial, temporal, quantitative, qualitative, and network data) makes analysis of and decision making from these data more challenging, a difficulty often called the big data problem [24, 34, 40]. One very challenging type of big data is multivariate network data, especially when both nodes and links carry multivariate values. For example, the transportation, shipping, logistics, commerce, trading, electricity, and communication industries [8, 46] have many connected operational locations, each described by multiple operational variables. With flight delay network data, various multivariate operational aspects must be considered simultaneously: types of delay, patterns based on airport location, trends over time, and relationships among the airports. To reduce analysts' information overload and to enable effective planning, analysis, and decision making, an interactive visual exploration and analysis environment is needed, as traditional machine learning and big data analytics alone can be insufficient [10].
While various systems and techniques for network visualization have been proposed [22], few support analyzing both multivariate network data (e.g., [43] and [28]) and map-based spatial network data (e.g., [19] and [8]). A gap therefore remains in effective exploration and analysis of multivariate spatial network data that can efficiently answer challenging questions such as the following: What are the patterns in the multiple variables of a single node or of node-node pairs? Are the patterns relevant to specific regions and times? Is there any seasonality in the patterns? Can we verify the patterns on a map? Which network nodes and links could be anomalous?

* e-mail: {ko|safzal|yang260|jchae|amalik|ebertd}@purdue.edu
† e-mail: {simon.walton|min.chen}@oerc.ox.ac.uk
‡ e-mail: jangy@sejong.edu
In this work, we fill this gap by integrating a family of visual analytics techniques for exploring and analyzing such complex data. We employ multiple linked views [33] (see Fig. 1), two new multivariate visualization techniques (petals and threads), and an information-theoretic analytical backend engine for aggregate-level and detail-level network analysis.
Petals and threads efficiently present a simplified representation of many-to-many networks in which multi-attribute vectors represent the magnitudes of attributes in different directions. Specifically, petals represent an aggregated summary view of directional data (Fig. 3), and threads encode multiple variables of links (Fig. 2). An information-theoretic model gives our analytical engine the ability to highlight anomalies in the data. The anomaly detection can be dynamically configured to meet new contextual requirements, which usually arise from user-generated hypotheses stimulated by visualizing and exploring the data. The analytical method provides the visualization with additional warning signals and enables users to prioritize their exploration strategy.
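The information-theoretic model is not specified at this point in the paper. As one minimal illustration of the general idea, and explicitly not the authors' model, the sketch below scores each record by its surprisal (self-information, -log2 p, in bits), where p is the relative frequency of the record's values over a user-chosen set of dimensions. Rare combinations receive high scores and are flagged above a threshold. The names `surprisal_scores`, `flag_anomalies`, `dims`, and `threshold_bits`, and the sample records, are hypothetical.

```python
import math
from collections import Counter


def surprisal_scores(records, dims):
    """Self-information (bits) of each record's value tuple over `dims`.

    Assumption: probabilities are plain relative frequencies over all
    records, so rare value combinations get high scores.
    """
    keys = [tuple(rec[d] for d in dims) for rec in records]
    counts = Counter(keys)
    n = len(keys)
    return [-math.log2(counts[k] / n) for k in keys]


def flag_anomalies(records, dims, threshold_bits):
    """Indices of records whose surprisal over `dims` exceeds `threshold_bits`."""
    scores = surprisal_scores(records, dims)
    return [i for i, s in enumerate(scores) if s > threshold_bits]


# Hypothetical flight-delay records with categorical attributes only.
records = [{"airport": "ORD", "cause": "weather"}] * 7 + [
    {"airport": "ORD", "cause": "security"}
]
print(flag_anomalies(records, ["cause"], 2.0))    # → [7]
print(flag_anomalies(records, ["airport"], 2.0))  # → []
```

Changing the `dims` list mimics reconfiguring the detector when a new hypothesis arises: the same record can be unremarkable along one dimension and anomalous along another. Numeric attributes such as delay minutes would need to be binned before this frequency-based estimate applies.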
The contributions of our work in the domain of multivariate spatiotemporal network visualization and analysis are 1) designing petals and threads for high-dimensional multivariate network link analysis, 2) evaluating petals and threads with a user study, 3) designing and implementing a visual analytics system using multiple coordinated views, 4) integrating an information-theoretic anomaly detection method into the interactive visual analysis process, and 5) exploring complex data (e.g., a flight delay network) to illustrate the use and potential of our designs in the multiple coordinated views.
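The petal encoding in contribution 1 is described here only as an aggregated summary of directional data. One plausible reading, sketched under stated assumptions rather than taken from the authors' implementation, is to assign each link leaving a node to a compass sector by the initial great-circle bearing toward the linked node, then sum the links' attribute vectors per sector so that each sum could set the length of one petal. The names `bearing_deg`, `petal_sums`, and `n_sectors`, the choice of 8 sectors, and the sample coordinates are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in [0, 360) degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def petal_sums(origin, links, n_sectors=8):
    """Sum link attribute vectors per compass sector around `origin`.

    origin: (lat, lon) of the node.
    links: iterable of ((lat, lon), attrs), where attrs is a list of numbers
    (hypothetically, delay minutes per delay type).
    Sector 0 is centred on north; sector indices increase clockwise.
    """
    width = 360.0 / n_sectors
    sums = {}
    for (lat, lon), attrs in links:
        b = bearing_deg(origin[0], origin[1], lat, lon)
        # Shift by half a sector so sector 0 spans [-width/2, +width/2).
        sector = int(((b + width / 2) % 360.0) // width)
        acc = sums.setdefault(sector, [0.0] * len(attrs))
        for i, v in enumerate(attrs):
            acc[i] += v
    return sums


links = [
    ((10.0, 0.0), [1, 2]),   # due north
    ((20.0, 0.0), [3, 4]),   # due north
    ((0.0, 10.0), [5, 0]),   # due east
    ((-10.0, 0.0), [0, 7]),  # due south
]
print(petal_sums((0.0, 0.0), links))
# → {0: [4.0, 6.0], 2: [5.0, 0.0], 4: [0.0, 7.0]}
```

Summing is only one possible aggregation; a mean or a normalized share per sector would fit the same structure if petal lengths should not grow with link count.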
Our system can be applied to the exploration of any multivariate spatiotemporal network link data generated in the transportation, shipping, logistics, commerce, trading, and communication industries (e.g., AT&T communication network data [8] and electric power grid data [46]).
2 RELATED WORK
While research topics in network visualization are as numerous as the visualizations themselves [22, 38], in this work we consider network visualization techniques and tools that are pertinent to multivariate geospatial network data. For multivariate network visualization, Wattenberg [43] designed PivotGraph, a software tool focusing on the relationships between node attributes and connections of multivariate graphs on a grid layout. Ploceus [28] enables multi-dimensional and multi-level network-based visual analysis of tabular data, while Honeycomb [42] focuses on scalability (e.g., millions of connections) using a matrix representation that is also incorporated in our matrix view. Shneiderman et al. [38] visualize networks by semantic substrates, and Selassie et al. [36] present an edge bundling technique for directed networks.
For geospatial network visualization, Guo [19] has developed
an integrated, interactive visualization framework that visualizes
IEEE Symposium on Visual Analytics Science and Technology 2014
November 9-14, Paris, France
978-1-4799-6227-3/14/$31.00 ©2014 IEEE