Behavioral Clusters in Dynamic Graphs
James Fairbanks*, Ramakrishnan Kannan, Haesun Park, David A. Bader
School of Computational Science and Engineering
Georgia Institute of Technology
Abstract
This paper contributes a method for combining sparse parallel graph algorithms with dense parallel linear algebra algorithms in order to understand dynamic graphs, including the temporal behavior of vertices. Our method is the first to cluster vertices in a dynamic graph based on arbitrary temporal behaviors. To implement this method, we develop a feature-based pipeline for dynamic graphs and apply Nonnegative Matrix Factorization (NMF) to these features. We demonstrate these steps with a sample of the Twitter mentions graph as well as a CAIDA network traffic graph. We contribute and analyze a parallel NMF algorithm, presenting both theoretical and empirical studies of its performance. Graph and network analysts can use this work to understand the cluster structure of vertex temporal behavior and the segmentation structure of dynamic graphs.
Keywords: dynamic graph analysis, streaming, matrix factorization, nonnegative matrix factorization (NMF), behavioral clusters, low-rank approximation
1. Introduction
Many domains of data analysis can be modeled with the graph abstraction. In particular, we are interested in social networks and internet connection networks. These networks are collections of interactions occurring in complex patterns, and analyzing these patterns is essential to leveraging the information the networks contain. Because the most important networks are those in heavy use right now, methods for understanding temporal patterns in dynamic networks are important.
The availability of big data has driven the adoption of large-scale statistical techniques, both classical and modern. These techniques are not immediately applicable to graph data, which leaves analysts separated from their familiar software tools. To connect graph analysis and statistical reasoning, we introduce vertex features that can be calculated efficiently and then analyzed using familiar large-scale statistical software tools. This connection is bidirectional because statistical analysis of vertex features informs the computation of additional features. The observed difficulty of writing scalable parallel graph algorithms for scale-free and irregular graphs advises against writing inferential and mathematical code that analyzes the graphs directly. In this paper we address this gap by first applying non-inferential graph code to generate vectorial data that is statistically well behaved, and then applying a state-of-the-art vectorial technique to this data, which provides insight into the original graph. A representation of this framework is presented in Figure 1.
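The two-stage framework (graph code produces vertex features, a vectorial technique analyzes them) can be illustrated with a minimal serial sketch. It assumes a single hypothetical feature, the number of edge updates touching each vertex in each time interval, and uses the classical Lee-Seung multiplicative-update NMF rather than the parallel NMF algorithm contributed in this paper. Assigning each vertex to the component with the largest weight in its row of W is likewise an assumption made for illustration. The function names `degree_features`, `nmf`, and `behavioral_clusters` are hypothetical.

```python
import numpy as np


def degree_features(num_vertices, edge_batches):
    """Hypothetical per-vertex temporal feature: activity per time interval.

    edge_batches has one entry per time interval; each entry is a list of
    (u, v) edge updates observed during that interval. Returns an n x T
    nonnegative matrix X where X[v, t] counts the updates touching vertex v.
    """
    X = np.zeros((num_vertices, len(edge_batches)))
    for t, batch in enumerate(edge_batches):
        for u, v in batch:
            X[u, t] += 1.0
            X[v, t] += 1.0
    return X


def nmf(X, k, iters=1000, seed=0, eps=1e-10):
    """Serial NMF via Lee-Seung multiplicative updates for min ||X - W H||_F^2.

    W (n x k) and H (k x T) stay nonnegative because each update multiplies
    by a ratio of nonnegative quantities; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def behavioral_clusters(X, k, **nmf_kwargs):
    """Label each vertex with the component holding the largest weight in its row of W."""
    W, H = nmf(X, k, **nmf_kwargs)
    return W.argmax(axis=1), W, H


# Toy stream: vertices 0 and 1 are active early, vertices 2 and 3 late.
edge_batches = [[(0, 1), (0, 1)], [(0, 1)], [(2, 3)], [(2, 3), (2, 3)]]
X = degree_features(4, edge_batches)
labels, W, H = behavioral_clusters(X, k=2)
print(labels)  # cluster ids are arbitrary up to permutation
```

In this sketch each row of H can be read as a temporal behavior profile over the intervals, and W records how strongly each vertex follows each profile. Because the toy data has two disjoint activity periods, vertices 0 and 1 should share one label and vertices 2 and 3 the other.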
In the massive streaming data analytics model [11], we view the graph of network events as an unending
stream of new edge updates. For each interval of time, we have the static graph, which represents the
previous state of the network, and a sequence of edge updates that represent the events since the previous
* Corresponding author
Email addresses: james.fairbanks@gatech.edu (James Fairbanks), rkannan@gatech.edu (Ramakrishnan Kannan),
hpark@cc.gatech.edu (Haesun Park), bader@cc.gatech.edu (David A. Bader)
Submitted to Parallel Computing, October 3, 2014
© 2015. This manuscript version is made available under the Elsevier user license
http://www.elsevier.com/open-access/userlicense/1.0/