SPARQL Query Optimization for Structural Indexed RDF Data

Minh Duc Nguyen 1, Min Su Lee 2, Sangyoon Oh 3,* and Geoffrey C. Fox 4

1 Samsung Electronics, Suwon, South Korea
2 Computational Omics Lab, School of Informatics and Computing, Indiana University, Bloomington IN, U.S.A.
3 Department of Computer Engineering, Ajou University, Suwon, South Korea
4 Pervasive Technology Institute, Indiana University, Bloomington IN, U.S.A.

duc27.nguyen@samsung.com, lee910@indiana.edu, syoh@ajou.ac.kr, gcf@indiana.edu

ABSTRACT

The Resource Description Framework (RDF) is a standard language model for representing semantic data. As the concept of the Semantic Web becomes more viable, the ability to retrieve and exchange semantic data will become increasingly important. Efficient management of RDF data is one of the key research issues in the Semantic Web; consequently, many RDF management systems have been proposed, each with its own data storage architecture and query processing algorithms for data retrieval. However, most of the proposed approaches require many join operations, which result in the unnecessary processing of intermediate results for SPARQL queries. This additional processing becomes substantial as the RDF data volume increases. In this paper, we propose an efficient structural index and a query optimizer to process queries without join operations. Experimental results show that our proposed system outperforms conventional query processing approaches, such as Jena, by up to 79% in terms of query processing time by reducing the volume of unnecessary intermediate results.

Keywords: query optimization, RDF data management, SPARQL, structure index

1. Introduction

As the Semantic Web becomes more viable, the ability to retrieve and exchange information through the Resource Description Framework (RDF) [1] becomes increasingly important. This data format is currently receiving interest from both researchers and business enterprises.
A functional Semantic Web will require efficient and effective methods to store and retrieve large volumes of data. However, managing large volumes of RDF data (up to billions of triples) is challenging. There are two main data management issues in the Semantic Web [2]. The first is improving performance, scalability, and query processing in order to manage large volumes of RDF data. The second is increasing RDF data interoperability in order to enhance and utilize Semantic Web information with optimized inference engines. To address these issues, many RDF data management systems have been proposed that include data storage architectures and query processing algorithms. Currently, researchers primarily take one of two perspectives when optimizing RDF storage for query processing: relation-based and graph-based. From the relation-based perspective, RDF data is just a particular type of relational data, and well-known relational database techniques for storing, indexing, and processing queries are reused and customized for RDF data [3,4]. Graph-based approaches [5] try to store RDF data without sacrificing its rich graph