Abstract— This work presents a new algorithm to detect RDF graph isomorphism. We show that the RDF graph isomorphism problem can be solved efficiently by exploiting the information carried by the graphs themselves. The efficiency of the algorithm comes from detecting isomorphism between RDF subgraphs. The theoretical complexity of the algorithm is $O(n \log n + e_G)$, where $n$ is the number of vertices and $e_G$ is the cost of checking all edges. To show the strengths of the new algorithm, called LPS (Longest Path Subgraph), we compared it with the Iterative Vertex Classification (IVC) algorithm. Experimental tests show that LPS performs much better.

Index Terms— graph isomorphism, RDF graph, algorithms

I. INTRODUCTION

There is a growing need for data integration. For example, life sciences research demands the integration of diverse and heterogeneous data sets that originate from distinct communities of scientists in separate subfields [1]. The Semantic Web [2] is an effort to give semantics to web content in order to enable data integration. The Resource Description Framework (RDF) [3] is a metadata model and language that serves as a basis of the Semantic Web infrastructure. RDF allows Semantic Web applications to link and/or merge data with other RDF data, using the eXtensible Markup Language (XML) as an interchange syntax. XML is a simple, very flexible text format proposed as a standard for structured information interchange. Good references for the RDF model are [4] and [5].

1.1 Problem Statement

Although XML and RDF are closely related, Berners-Lee points out an important difference: if an XML document and an RDF document represent the same object, there are in general a large number of ways in which the XML document can be mapped onto the logical RDF graph [6]. In this sense, the RDF model is a data model, and a single RDF data model can correspond to many different XML documents.
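The problem statement turns on when two RDF descriptions denote the same graph. An RDF graph is a set of triples, but blank (unnamed) nodes carry no global name, so two serializations of the same graph may label them differently and plain set equality is too strict; two graphs are isomorphic when some bijection between their blank nodes maps one triple set onto the other. The Python sketch below pins down only that definition. It assumes an encoding of our own (triples as string 3-tuples, blank nodes prefixed with `_:`) and hypothetical function names; it tries every bijection, so it is exponential and is not LPS, IVC, or any algorithm from [6], [8], [9].

```python
from itertools import permutations

# Assumed encoding (ours, not the paper's): a triple is a tuple of three
# strings, and blank (unnamed) nodes are strings starting with "_:".
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    """A term is a blank node if it uses the N-Triples style "_:" prefix."""
    return term.startswith("_:")


def blank_nodes(graph: set[Triple]) -> list[str]:
    """Sorted list of the distinct blank nodes occurring in the graph."""
    return sorted({term for triple in graph for term in triple if is_blank(term)})


def rename(graph: set[Triple], mapping: dict[str, str]) -> set[Triple]:
    """Apply a blank-node renaming; ground terms (IRIs, literals) are unchanged."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in graph}


def isomorphic_brute_force(g: set[Triple], h: set[Triple]) -> bool:
    """True if some bijection between the blank nodes of g and h maps g onto h.

    Tries every bijection, so it is exponential in the number of blank nodes.
    """
    if len(g) != len(h):
        return False
    bg, bh = blank_nodes(g), blank_nodes(h)
    if len(bg) != len(bh):
        return False
    return any(rename(g, dict(zip(bg, perm))) == h for perm in permutations(bh))


# Usage: the same graph written with different blank-node labels.
g = {("_:a", "ex:name", '"Alice"'), ("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", '"Bob"')}
h = {("_:x", "ex:knows", "_:y"), ("_:y", "ex:name", '"Bob"'), ("_:x", "ex:name", '"Alice"')}
print(g == h)                        # False: the blank-node labels differ
print(isomorphic_brute_force(g, h))  # True: map _:a -> _:x, _:b -> _:y
```

A brute-force test like this is mainly useful as a reference oracle for checking a faster matcher on small graphs.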
One might ask whether two documents that are written differently but have the same meaning should have the same RDF data model. The answer is yes: a data model can be represented syntactically in many ways.

On the other hand, index structures and algorithms for querying distributed RDF repositories have been proposed, with the aim of optimizing queries over RDF repositories [7]. Such optimization, however, requires good algorithms for determining whether two distributed sources represent the same data model. This problem is equivalent to determining whether two RDF graphs are isomorphic [8]. Deciding whether two RDF graphs are isomorphic is not an easy task. Several algorithms have been proposed to solve it (see [6][9]); however, these algorithms are neither efficient nor robust.

1.2 Contributions

In this paper, we propose a new algorithm to detect RDF graph isomorphism. We use the extra information that comes with the graphs to solve the isomorphism problem for RDF graphs efficiently; the efficiency of the algorithm comes from detecting isomorphism between RDF subgraphs. We implemented our algorithm, called Longest Path Subgraph (LPS), as well as the Iterative Vertex Classification (IVC) algorithm [8], and compared them experimentally. LPS first preprocesses the two RDF graphs to generate two new graphs annotated with information on their vertices and edges. Next, an RDF subgraph is extracted from each annotated graph. The algorithm then verifies whether these two subgraphs are isomorphic. If the subgraphs are not isomorphic, the RDF graphs are not isomorphic either, so it is not necessary to check all the vertices and edges of the RDF graphs. In addition, our algorithm addresses two latent problems in RDF graph matching: the first is the presence of cycles in a graph, and the second is the presence of unnamed (blank) vertices. We also believe that the algorithm can be extended to other applications.
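The contributions describe rejecting non-isomorphic inputs early, before every vertex and edge is compared, and name cycles and unnamed vertices as the hard cases. The Python sketch below illustrates that general filtering idea with a simple iterative classification of blank nodes (color refinement), loosely in the spirit of IVC [8]. It is our own simplification, not the LPS preprocessing or subgraph step, and the triple encoding and the names `classify`, `invariant` and `may_be_isomorphic` are hypothetical. Because the class labels do not depend on how blank nodes are named, differing invariants prove non-isomorphism, while equal invariants still call for a full check.

```python
import hashlib
from collections import Counter

# Assumed encoding (ours, not the paper's): triples are string 3-tuples and
# blank (unnamed) nodes are strings starting with "_:". RDF forbids blank
# predicates, so only subjects and objects are classified.
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def classify(graph: set[Triple]) -> dict[str, str]:
    """Refine a class label for every blank node until no class splits.

    A node's new label hashes its old label together with the sorted list of
    (direction, predicate, neighbour label) entries of its incident triples.
    Ground terms act as fixed labels, so the result does not depend on the
    blank-node names used in the document.
    """
    blanks = {t for s, _, o in graph for t in (s, o) if is_blank(t)}
    label = {b: "_:0" for b in blanks}  # all blank nodes start in one class
    classes = 1 if blanks else 0
    while True:
        new = {}
        for b in blanks:
            parts = []
            for s, p, o in graph:
                if s == b:
                    parts.append(("out", p, label.get(o, o)))
                if o == b:
                    parts.append(("in", p, label.get(s, s)))
            signature = repr((label[b], sorted(parts)))
            new[b] = "_:" + hashlib.sha256(signature.encode()).hexdigest()[:16]
        label = new
        count = len(set(label.values()))
        if count <= classes:  # no class was split: the partition is stable
            return label
        classes = count


def invariant(graph: set[Triple]) -> Counter:
    """Multiset of triples with each blank node replaced by its class label."""
    label = classify(graph)
    return Counter(tuple(label.get(t, t) for t in triple) for triple in graph)


def may_be_isomorphic(g: set[Triple], h: set[Triple]) -> bool:
    """False proves g and h are not isomorphic; True only means "not ruled out"."""
    return len(g) == len(h) and invariant(g) == invariant(h)


# Usage: one early rejection and one case the filter cannot decide.
g1 = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", '"Bob"')}
h1 = {("_:x", "ex:knows", "_:y"), ("_:x", "ex:name", '"Bob"')}
print(may_be_isomorphic(g1, h1))  # False: the name sits on different ends of ex:knows

cycle = {("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:a")}  # one 2-cycle
loops = {("_:x", "ex:p", "_:x"), ("_:y", "ex:p", "_:y")}  # two self-loops
print(may_be_isomorphic(cycle, loops))  # True, although they are not isomorphic
```

The second pair shows why cycles are troublesome for classification-based filters: every blank node in both graphs has exactly one outgoing and one incoming `ex:p` edge, so refinement never separates them, and a complete matcher has to fall back to a search in such cases.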
In the worst case, our algorithm runs in $O(n \log n + e_G)$, where $n$ is the number of vertices and $e_G$ is the cost of checking all edges. To evaluate its performance, we compared it with the IVC algorithm. Both algorithms were implemented in C and run on Red Hat Linux 7.2 on a 1.4 GHz Celeron processor with 512 MB of RAM, compiled with gcc version 2.95.

Longest Path Subgraph: A Novel and Efficient Algorithm to Match RDF Graphs
Claudio Gutiérrez-Soto, Universidad del Bío-Bío, cogutier@ubiobio.cl
Pedro G. Campos, Universidad del Bío-Bío, pgcampos@ubiobio.cl
Julio Águila, Universidad de Magallanes, julio.aguila@umag.cl
2008 Mexican International Conference on Computer Science
1550-4069/08 $25.00 © 2008 IEEE
DOI 10.1109/ENC.2008.41