Analyzing High-dimensional Multivariate Network Links with Integrated Anomaly Detection, Highlighting and Exploration

Sungahn Ko*, Purdue University
Shehzad Afzal*, Purdue University
Simon Walton, Oxford University
Yang Yang*, Purdue University
Junghoon Chae*, Purdue University
Abish Malik*, Purdue University
Yun Jang, Sejong University
Min Chen, Oxford University
David Ebert*, Purdue University

ABSTRACT

This paper focuses on the integration of a family of visual analytics techniques for analyzing high-dimensional, multivariate network data that features spatial and temporal information, network connections, and a variety of other categorical and numerical data types. Such data types are commonly encountered in the transportation, shipping, and logistics industries. Due to the scale and complexity of the data, it is essential to integrate techniques for data analysis, visualization, and exploration. We present new visual representations, Petal and Thread, to effectively present many-to-many network data including multi-attribute vectors. In addition, we deploy an information-theoretic model for anomaly detection across varying dimensions, displaying highlighted anomalies in a visually consistent manner, as well as supporting a managed process of exploration. Lastly, we evaluate the proposed methodology through data exploration and an empirical study.

Index Terms: I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques; I.3.8 [Computer Graphics]: Applications—Visual Analytics

1 INTRODUCTION

The recent trend of increasing size, complexity, and variety in datasets (e.g., spatial, temporal, quantitative, qualitative, network data) makes analyzing these data and making decisions from them more challenging, a situation often called the big data problem [24, 34, 40]. One very challenging type of big data is multivariate network data, especially when there are multivariate values for both nodes and links.
For example, transportation, shipping, logistics, commerce, trading, electricity, and communication industries [8, 46] have many connected operational locations where multiple variables describe each location’s operations. With flight delay network data, various multivariate operational aspects are considered simultaneously: types of delay, patterns based on airport location, trends in time, and relationships among the airports. To reduce analysts’ information overload and to enable effective planning, analysis, and decision making, an interactive visual exploration and analysis environment is needed, as traditional machine learning and big data analytics alone can be insufficient [10].

While various systems and techniques for network visualization have been proposed [22], few support analyzing both multivariate network data (e.g., [43] and [28]) and map-based spatial network data (e.g., [19] and [8]). A gap remains in effective multivariate spatial network data exploration and analysis for efficiently answering challenging questions such as the following: What are the patterns in multivariate variables on a node or among node-node pairs? Are the patterns relevant to specific regions and times? Is there any seasonality in the patterns? Can we verify the patterns on a map? Which network nodes and links could be anomalous?

* e-mail: {ko|safzal|yang260|jchae|amalik|ebertd}@purdue.edu
e-mail: {simon.walton|min.chen}@oerc.ox.ac.uk
e-mail: jangy@sejong.edu

In this work, we fill this gap by integrating a family of visual analytics techniques for exploring and analyzing such complex data. We employ multiple linked views [33] (see Fig. 1), two new multivariate visualization techniques, petals and threads, and an information-theoretic analytical backend engine for aggregate-level and detail-level network analysis.
Petals and threads efficiently present a simplified representation of many-to-many networks where multi-attribute vectors represent the size of attributes in different directions. Specifically, petals represent an aggregated summary view of directional data (Fig. 3) and threads encode multiple variables of links (Fig. 2). An information-theoretic model gives our analytical engine the ability to highlight anomalies in the data. The anomaly detection can be dynamically configured based on new contextual requirements, which usually result from user-generated hypotheses stimulated by visualization and exploration of the data. The analytical method provides the visualization with additional warning signals and enables users to prioritize their exploration strategy.

The contributions of our work in the multivariate spatiotemporal network visualization and analysis domain are 1) designing petals and threads for high-dimensional multivariate network link analysis, 2) evaluating petals and threads with a user study, 3) designing and implementing a visual analytics system using multiple coordinated views, 4) integrating an information-theoretic anomaly detection method into the interactive visualization analysis process, and 5) exploring complex data (e.g., a flight delay network) to illustrate the use and potential of our designs in the multiple coordinated views.

Our system can be applied to the exploration of any multivariate spatiotemporal network link data generated in the transportation, shipping, logistics, commerce, trading, and communication industries (e.g., AT&T communication network data [8] and electric power grid data [46]).

2 RELATED WORK

While the research topics in network visualization are as numerous as the visualizations themselves [22, 38], in this work we consider network visualization techniques and tools that are pertinent to multivariate geospatial network data.
For multivariate network visualization research, Wattenberg [43] has designed PivotGraph, a software tool focusing on the relationships between node attributes and connections of multivariate graphs on a grid layout. Ploceus [28] enables multi-dimensional and multi-level network-based visual analysis on tabular data, while Honeycomb [42] focuses on scalability (e.g., millions of connections) using a matrix representation that is also incorporated in our matrix view. Shneiderman et al. [38] visualize networks by semantic substrates and Selassie et al. [36] present an edge bundling technique for directed networks.

IEEE Symposium on Visual Analytics Science and Technology 2014, November 9-14, Paris, France. 978-1-4799-6227-3/14/$31.00 ©2014 IEEE

For geospatial network visualization, Guo [19] has developed an integrated, interactive visualization framework that visualizes