978-1-7281-6251-5/20/$31.00 ©2020 IEEE
Graph Filtering to Remove the “Middle Ground” for
Anomaly Detection
William Eberle
Department of Computer Science
Tennessee Tech University
Cookeville, TN, USA
weberle@tntech.edu
Lawrence Holder
School of Electrical Engineering
& Computer Science
Washington State University
Pullman, WA, USA
holder@wsu.edu
Abstract—Discovering patterns and anomalies in a variety of
voluminous data represented as a graph is challenging. Current
research has demonstrated success discovering graph patterns
using a sampling of the data, but there has been little work on discovering anomalies, which requires an understanding of what is normative. In this work, we present two
approaches to reducing graph data: subgraph filtering and graph
filtering. The idea behind the proposed algorithms is the removal of a “murky middle”: data that is neither clearly normative nor clearly anomalous is excluded from the discovery process. We empirically
validate the proposed approach on real-world, pseudo-real-world,
and synthetic data, as well as compare against a similar approach.
Keywords—graph-based anomaly detection, graph filtering,
knowledge discovery
I. INTRODUCTION
The ever-increasing volume, velocity, and variety of data continue to challenge our ability to extract knowledge from it. In addition, interconnected relationships across data
sources introduce further representational and computational
challenges. Examples abound, such as social networks,
biological networks, communication networks, and even brain
networks. The graph has emerged as an appropriate
representation for such data, and numerous methods have been
developed for extracting knowledge from networks. However, because the data we seek to analyze, such as telecommunications network traffic, continues to grow, we must handle it in a computationally efficient way in order to extract relevant knowledge.
Current work in the area of pattern discovery and anomaly
detection in networks, or more specifically in graphs, has had to
deal with computational difficulties. Most approaches address this issue by handling only a sample of the graph [6][7][8][9], transforming the graph into a smaller representation of its structural properties [10][12], or reducing a visualization of the graph [11][14]. In each case, however, these approaches do not address anomaly detection while also reducing computational complexity.
In this work, we develop new methods for filtering graphs so
as to reduce the computational complexities without losing our
ability to discover relevant normative patterns and interesting
anomalous subgraphs. We present two approaches to address the
computational challenges: subgraph filtering, for reducing the
number of subgraphs at each iteration of edge extensions when
growing the list of candidate patterns; and, graph filtering, for
reducing the size of the input graph before searching for patterns
and anomalies. The novelty of this work is that we intelligently filter the graph rather than sample it. By excluding the “middle” of the graph, or the “middle” candidate subgraphs, we retain the best normative patterns and the best potentially anomalous subgraphs, thereby reducing the number of graph matches by ignoring subgraphs that are unlikely ever to be normative or anomalous. We evaluate these methods
using actual, near real-time network data provided by one of the
leading managed security services providers at a major
telecommunications company, with input from their domain
experts.
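To make the intuition concrete, the “middle”-removal step can be sketched as follows. This is an illustrative simplification only, not the algorithms evaluated later: the names `filter_murky_middle`, `keep_top`, and `keep_bottom` are hypothetical, and the real approaches operate on candidate subgraphs during pattern growth rather than on string labels.

```python
from collections import Counter

def filter_murky_middle(candidates, keep_top=2, keep_bottom=2):
    """Rank candidates by frequency and drop the 'middle': the most
    frequent survive as potential normative patterns, the least
    frequent as potential anomalies."""
    counts = Counter(candidates)                    # frequency of each candidate
    ranked = [p for p, _ in counts.most_common()]   # most to least frequent
    if len(ranked) <= keep_top + keep_bottom:
        return ranked                               # nothing to filter out
    return ranked[:keep_top] + ranked[-keep_bottom:]

# Toy example: candidate subgraphs identified by a canonical label.
candidates = ["A-B"] * 10 + ["B-C"] * 8 + ["C-D"] * 5 + ["D-E"] * 2 + ["E-F"]
kept = filter_murky_middle(candidates)  # ['A-B', 'B-C', 'D-E', 'E-F']
```

Here the mid-frequency candidate `C-D` is discarded: it is too common to be a good anomaly and too rare to be a good normative pattern.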
II. RELATED WORK
Much work has been done on sampling and filtering graphs
in order to improve the efficiency of graph mining methods that
operate on high-volume data. Kashtan et al. [6] propose a
sampling method that randomly samples n nodes in order to
extract connected subgraph samples that are of order n. Their
approach consists of repeatedly sampling subgraphs of order n
until the desired sample size is reached. The concentration of
different subgraphs is then estimated and used to find motifs in
the network. Wernicke [7] modified the sampling scheme of
Kashtan et al. in order to correct the sampling bias. The TIES
algorithm [8] also uses a sampling scheme; however, the
objective of TIES is to sample a subgraph that is representative
of the entire graph. Ahmed et al. [9] propose the PIES algorithm
for sampling a representative subgraph from a large streaming
graph represented by a sequence of edges. PIES samples edges
randomly and then stores the nodes from the sampled edges. It
also performs stream sampling by maintaining a reservoir of
nodes. When the reservoir is full, it probabilistically decides to
drop old nodes and their incident edges from the sample and
include new nodes and edges.
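The reservoir mechanism described above can be sketched as follows. This is a simplified, hypothetical illustration of PIES-style stream sampling, not the published PIES implementation; the function name and parameters are our own, and `max_nodes` stands in for the reservoir capacity.

```python
import random

def reservoir_edge_sample(edge_stream, max_nodes, seed=0):
    """PIES-style sketch: sample edges from a stream, keep their endpoint
    nodes in a bounded reservoir, and once the reservoir is full,
    probabilistically admit new edges while evicting old nodes (and
    their incident edges) to make room."""
    rng = random.Random(seed)
    nodes, edges = set(), set()
    for t, (u, v) in enumerate(edge_stream, start=1):
        if len(nodes | {u, v}) <= max_nodes:
            nodes.update([u, v])                 # reservoir has room
            edges.add((u, v))
        elif rng.random() < max_nodes / t:       # admit with decreasing probability
            nodes.update([u, v])
            edges.add((u, v))
            while len(nodes) > max_nodes:        # evict old nodes until it fits
                old = rng.choice(sorted(nodes - {u, v}))
                nodes.discard(old)
                edges = {(a, b) for (a, b) in edges if old not in (a, b)}
    return nodes, edges

# Toy stream: a path graph arriving one edge at a time.
sample_nodes, sample_edges = reservoir_edge_sample(
    [(i, i + 1) for i in range(50)], max_nodes=10)
```

The key invariants are that the reservoir never exceeds `max_nodes` and that every retained edge has both endpoints in the reservoir, mirroring the drop-old/include-new behavior described for PIES.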
However, in all of the above cases, the approaches are not
dealing with anomaly detection. Sampling is complicated in our
setting, because in the case of finding normative patterns, we
want to keep common structures and discard unique structures;
whereas, for anomaly detection, we want the opposite. This
suggests more of a filtering strategy that identifies “middle