Continuous Subgraph Pattern Search over Graph Streams
Changliang Wang and Lei Chen
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
{sonicwcl, leichen}@cse.ust.hk

Abstract— Search over graph databases has attracted much attention recently due to its usefulness in many fields, such as the analysis of chemical compounds, intrusion detection in network traffic data, and pattern matching over users’ visiting logs. However, most existing work focuses on search over static graph databases, while in many real applications graphs change over time. In this paper we investigate a new problem: continuous subgraph pattern search when multiple target graphs are constantly changing in a stream style, namely subgraph pattern search over graph streams. The proposed problem is a continuous join between query patterns and graph streams in which the join predicate is the existence of subgraph isomorphism. Because subgraph isomorphism checking is NP-complete, to monitor the existence of certain subgraph patterns in real time we avoid using subgraph isomorphism verification to find the exact query-stream subgraph isomorphic pairs, and instead offer an approximate answer that reports all probable pairs without missing any actual answer pair. We propose a light-weight yet effective feature structure called Node-Neighbor Tree to filter false candidate query-stream pairs. To reduce the computational cost, we further project the feature structures into a numerical vector space and conduct dominant relationship checking in the projected space. We propose two methods to efficiently check dominant relationships and substantiate our methods with extensive experiments.

I. INTRODUCTION

As one of the most popular data models, graphs have been used in various real applications such as social network modeling and chemical compound analysis.
Due to their wide usage, many interesting graph problems have been extensively studied, for example, graph reachability [22], [21], subgraph search [24], [17], [4], and keyword search in graphs [10], [7]. In fact, in many applications graphs evolve over time in a stream fashion instead of remaining static. For example, in a traffic network, the links between nodes change over time. As another example, during a chemical reaction, the structures of chemical compounds often change as the reaction proceeds. We can model these evolving graphs as graph streams, i.e., sequences of graphs that grow indefinitely over time [18]. However, most previous work assumes that graph data are static, which raises challenges when applying it to graph streams. Compared to static graphs, graph streams not only inherit the complexity of graphs but also possess their own characteristics: 1) graphs are frequently updated, and 2) real-time response is necessary.

In this paper, we study the problem of continuous subgraph pattern search over graph streams. Subgraph search has been used as an effective tool for finding useful substructures in a graph database. For example, a biochemist can use subgraph search to analyze the functionality of newly found chemical compounds; network security administrators can conduct pattern (subgraph) matching over network traffic data to detect possible malicious attacks. Formally, subgraph search over a graph database D is defined as follows: given a query graph Q, find all data graphs G_i ∈ D such that G_i contains the query Q, namely, Q is subgraph isomorphic to G_i. Due to the NP-completeness of subgraph isomorphism checking [5], most previous work on subgraph search employs a filter-and-verify strategy to reduce the number of subgraph isomorphism checks.
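The definition of subgraph search and the filter-and-verify strategy can be illustrated with a small sketch. This is a hypothetical example, not the method of any cited system: it assumes undirected, vertex-labeled graphs, reads "Q is subgraph isomorphic to G_i" as a label-preserving injective mapping that sends every query edge onto a data edge (non-induced), uses vertex-label counts as the only filtering feature (a deliberately weak stand-in for the richer features real indexes use), and verifies the surviving candidates by plain backtracking. The names `Graph`, `passes_filter`, `is_subgraph_isomorphic`, and `subgraph_search` are invented for this illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


class Graph:
    """Undirected, vertex-labeled graph (hypothetical representation for this sketch)."""

    def __init__(self, labels: Dict[int, str], edges: List[Tuple[int, int]]):
        self.labels = labels
        self.adj: Dict[int, Set[int]] = {v: set() for v in labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def passes_filter(q: Graph, g: Graph) -> bool:
    """Filter step: g can contain q only if, for every label, g has at least as
    many vertices with that label as q (a necessary, not sufficient, condition)."""
    q_counts = Counter(q.labels.values())
    g_counts = Counter(g.labels.values())
    return all(g_counts[label] >= n for label, n in q_counts.items())


def is_subgraph_isomorphic(q: Graph, g: Graph) -> bool:
    """Verify step: backtracking search for an injective, label-preserving
    mapping of q's vertices into g that maps every edge of q onto an edge of g."""
    order = sorted(q.labels, key=lambda v: -len(q.adj[v]))  # high-degree vertices first
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every query vertex is mapped consistently
        u = order[i]
        for w in g.labels:
            if w in used or g.labels[w] != q.labels[u]:
                continue
            # each already-mapped neighbor of u must map to a neighbor of w
            if all(mapping[x] in g.adj[w] for x in q.adj[u] if x in mapping):
                mapping[u] = w
                used.add(w)
                if extend(i + 1):
                    return True
                del mapping[u]  # backtrack
                used.discard(w)
        return False

    return extend(0)


def subgraph_search(q: Graph, database: Dict[str, Graph]) -> List[str]:
    """Filter-and-verify: cheap pruning first, exact checks only on survivors."""
    candidates = [gid for gid, g in database.items() if passes_filter(q, g)]
    return [gid for gid in candidates if is_subgraph_isomorphic(q, database[gid])]


# Usage: the query is a single C-O bond.
q = Graph({0: "C", 1: "O"}, [(0, 1)])
db = {
    "g1": Graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),  # contains C-O
    "g2": Graph({0: "C", 1: "C", 2: "O"}, [(0, 1)]),  # passes filter, but O is isolated
    "g3": Graph({0: "N", 1: "N"}, [(0, 1)]),  # pruned by the filter
}
result = subgraph_search(q, db)
print(result)  # prints ['g1']
```

The filter is safe because any embedding maps each query vertex to a distinct data vertex with the same label, so a graph with fewer vertices of some label cannot contain Q. Graph g2 shows why filtering alone still admits false positives: its label counts dominate those of the query, yet only the exact check rejects it.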
Specifically, graphs in the database are indexed by a set of distinguishing features, such as paths [17], trees [28], and subgraphs [24]; during query processing, the extracted features are first used to prune the graphs that cannot contain the query graph, and the remaining candidate graphs are then verified by subgraph isomorphism checking.

Unfortunately, we cannot directly apply previous methods for subgraph search over static graphs to graph streams because of the unique characteristics of streams. For example, gIndex [24] needs to mine frequent subgraphs (features) at each timestamp, which does not satisfy the real-time response requirement of graph streams. As another example, GraphGrep [17] may satisfy the real-time response requirement; however, it uses only paths to filter out candidates, so many false positives (i.e., graphs reported as positive that are not actual results) remain after filtering. Motivated by the shortcomings of previous approaches and the challenges raised by graph streams, we address the problem of continuous subgraph search over graph streams: given a set of predefined query graphs (patterns), we continuously monitor a set of graph streams and report the possible appearances of these patterns in the streams at each timestamp (the formal problem definition is given in Section II). In this work, to satisfy the real-time response requirement of search over graph streams, we focus on retrieving the possible appearance instead of the exact appearance of the subgraph. Here possible appearance
IEEE International Conference on Data Engineering 1084-4627/09 $25.00 © 2009 IEEE DOI 10.1109/ICDE.2009.132 393