Continuous Subgraph Pattern Search over Graph
Streams
Changliang Wang and Lei Chen
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
{sonicwcl, leichen}@cse.ust.hk
Abstract— Search over graph databases has attracted much
attention recently due to its usefulness in many fields, such
as the analysis of chemical compounds, intrusion detection in
network traffic data, and pattern matching over users’ visiting
logs. However, most of the existing work focuses on search over
static graph databases while in many real applications graphs
are changing over time.
In this paper we investigate a new problem on continuous
subgraph pattern search under the situation where multiple
target graphs are constantly changing in a stream style, namely
the subgraph pattern search over graph streams. Obviously the
proposed problem is a continuous join between query patterns
and graph streams where the join predicate is the existence of
subgraph isomorphism. Because subgraph isomorphism checking is
NP-complete, real time monitoring of the existence of certain
subgraph patterns cannot afford exact isomorphism verification
of every query-stream pair. Instead, we offer an approximate
answer that reports all probable pairs without missing
any of the actual answer pairs. In this paper we propose a
light-weight yet effective feature structure called Node-Neighbor
Tree to filter false candidate query-stream pairs. To reduce the
computational cost, we further project the feature structures into
a numerical vector space and conduct dominant relationship
checking in the projected space. We propose two methods to
efficiently check dominant relationships and substantiate our
methods with extensive experiments.
I. INTRODUCTION
As one of the most popular data models, the graph has been
used in various real applications such as social network
modeling and chemical compound analysis. Owing to this
wide usage, many interesting graph problems have been extensively
studied, for example, graph reachability [22], [21], subgraph
search [24], [17], [4], and keyword search in graphs [10], [7].
In fact, in many applications, graphs often evolve
over time in a stream fashion instead of remaining static.
For example, in a traffic network, the links between nodes
change over time. As another example, during
a chemical reaction, the structures of chemical compounds
often change along the reaction process. We can model these
evolving graphs as graph streams, i.e., a sequence of graphs
which grows indefinitely over time [18]. However, most of the
previous work assumes that graph data are rather static, which
raises challenges when such methods are applied to graph streams. Compared
to static graphs, graph streams not only inherit the complexity
of graphs but also possess their own characteristics: 1) graphs
are frequently updated, and 2) real time response is necessary.
In this paper, we study the problem of continuous subgraph
pattern search over graph streams. Subgraph search has been
used as an effective tool for finding useful substructures in
a graph database. For example, a bio-chemist can use
subgraph search to analyze the functionality of newly
found chemical compounds; network security administrators
can conduct pattern (subgraph) matching over network
traffic data to detect possible malicious attacks. Formally,
subgraph search over a graph database D is defined as follows:
Given a query graph Q, we need to find all data graphs
G_i ∈ D, where G_i contains the query Q, namely, Q is
subgraph isomorphic to G_i.
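To make the definition concrete, the following is a minimal Python sketch of subgraph search by exhaustive backtracking. It is an illustration only, not the method proposed in this paper: the `Graph` class, the function names, and the choice of undirected vertex-labeled graphs are assumptions introduced here. The check is the non-induced form of subgraph isomorphism (every query edge must be present in the data graph), and its worst-case cost is exponential, which is what motivates the filtering discussed next.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class Graph:
    """Undirected, vertex-labeled graph (hypothetical helper, not from the paper)."""

    def __init__(self, labels: Dict[Hashable, str],
                 edges: Iterable[Tuple[Hashable, Hashable]]):
        self.labels = dict(labels)
        self.adj: Dict[Hashable, Set[Hashable]] = {v: set() for v in self.labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def is_subgraph_isomorphic(q: Graph, g: Graph) -> bool:
    """Return True if q maps injectively into g, preserving labels and edges."""
    q_nodes = list(q.labels)

    def extend(mapping: Dict[Hashable, Hashable], used: Set[Hashable]) -> bool:
        if len(mapping) == len(q_nodes):
            return True  # every query vertex has been placed
        u = q_nodes[len(mapping)]
        for v in g.labels:
            if v in used or g.labels[v] != q.labels[u]:
                continue
            # Each already-mapped neighbor of u must land on a neighbor of v.
            if all(mapping[w] in g.adj[v] for w in q.adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(mapping, used):
                    return True
                del mapping[u]  # backtrack
                used.remove(v)
        return False

    return extend({}, set())


def subgraph_search(q: Graph, database: List[Graph]) -> List[int]:
    """Return the indices i of all data graphs G_i that contain Q."""
    return [i for i, g in enumerate(database) if is_subgraph_isomorphic(q, g)]
```

For instance, a query consisting of a single A-B edge is contained in any data graph that has an edge between an A-labeled and a B-labeled vertex, but not in a graph where those two vertices are disconnected.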
Due to the NP-completeness of subgraph isomorphism
checking [5], most previous work on subgraph search
employs a filter-and-verify strategy to reduce the number of
isomorphism checks. Specifically, graphs in the database are
indexed by a set of distinguishing features, such as paths [17],
trees [28], and subgraphs [24]. During query processing,
the extracted features are first used to prune the graphs that
cannot contain the query graph, and the remaining candidate
graphs are then verified by subgraph isomorphism checking.
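The filter-and-verify strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the indexing scheme of any cited system: the feature used here (counts of edge label pairs, i.e., labeled paths of length one) is chosen only for brevity, and the names `edge_label_features`, `may_contain`, and `filter_and_verify`, as well as the `verify` callback, are hypothetical. The filter is safe because a subgraph isomorphism maps distinct query edges to distinct data edges with the same labels, so a data graph with fewer edges of some label pair than the query cannot contain it.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A graph is a pair (vertex -> label, list of undirected edges); assumed layout.
LabeledGraph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def edge_label_features(graph: LabeledGraph) -> Counter:
    """Count edges by the unordered pair of their endpoint labels."""
    labels, edges = graph
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def may_contain(data_feats: Counter, query_feats: Counter) -> bool:
    """Necessary condition: every query feature occurs at least as often in the data graph."""
    return all(data_feats[f] >= c for f, c in query_feats.items())


def filter_and_verify(
    query: LabeledGraph,
    database: List[LabeledGraph],
    verify: Callable[[LabeledGraph, LabeledGraph], bool],
) -> List[int]:
    """Prune by features first, then run the expensive check on survivors only."""
    query_feats = edge_label_features(query)
    answers = []
    for i, g in enumerate(database):
        # In a real index the data-graph features would be precomputed.
        if not may_contain(edge_label_features(g), query_feats):
            continue  # pruned without any isomorphism check
        if verify(query, g):
            answers.append(i)
    return answers
```

Any exact subgraph isomorphism test can be passed as `verify`. Replacing it with a function that always returns True yields the candidate set alone, which may include false positives but never misses a true answer.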
Unfortunately, we cannot directly apply the previous methods
for subgraph search over static graphs to graph streams
due to the unique characteristics of streams. For example, gIndex [24]
needs to mine frequent subgraphs (features) at each timestamp,
which does not satisfy the real time response requirement of
graph streams. As another example, GraphGrep [17] may
satisfy the real time response requirement; however, it uses
only paths to filter out candidates, and many false positives
(i.e., graphs reported as positive that are not actual results) still
remain in the result after filtering.
Motivated by shortcomings of previous approaches and the
challenges raised by graph streams, we address the problem of
continuous subgraph search over graph streams, that is, given
a set of predefined query graphs (patterns), we continuously
monitor a set of graph streams and report the possible ap-
pearances of a set of subgraphs (patterns) in a set of graph
streams at each timestamp (the formal problem definition
is given in Section II). In this work, in order to satisfy the
real time response requirement of search over graph streams,
we focus on retrieving the possible appearance instead of
exact appearance of the subgraph. Here possible appearance
IEEE International Conference on Data Engineering
1084-4627/09 $25.00 © 2009 IEEE
DOI 10.1109/ICDE.2009.132
393