P Jhansi Rani et al, International Journal of Computer Science and Mobile Computing, Vol. 3, Issue 10, October 2014, pg. 976-981
Available online at www.ijcsmc.com
A Monthly Journal of Computer Science and Information Technology, ISSN 2320–088X

RESEARCH ARTICLE

An Efficient Indexing Method for Box Queries in NDDS Spaces using BoND-tree

P Jhansi Rani 1, GK Srikanth 2
1 M.TECH, Dept. of CSE, AVNIET, JNTUH, HYDERABAD, AP
2 Associate Professor, Dept. of CSE, AVNIET, JNTUH, HYDERABAD, AP

Abstract: Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Box queries (or window queries) are a type of query that specifies a set of allowed values in each dimension. Unfortunately, existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS), such as the R-tree, cannot be directly applied to an NDDS. Most of the existing work in this field targets similarity queries (range queries and k-NN queries). Other indexing methods based on metric spaces, such as the M-tree and the Slim-tree, are too general to effectively utilize the special characteristics of NDDSs, resulting in non-optimized performance. In this paper, we propose a new dynamic data-partitioning-based indexing technique, called the BoND-tree, which exploits properties exclusive to NDDSs. Unique characteristics of the NDDS are exploited to develop new node splitting heuristics. For the BoND-tree, we also provide theoretical analysis to show the optimality of the proposed heuristics.
Extensive experiments with synthetic data demonstrate that the proposed scheme is significantly more efficient than the existing ones when applied to support box queries in NDDSs. We also show the effectiveness of the proposed scheme in a real-world application: primer design for genome sequence databases.

Keywords: NDDS, Box Queries, Indexing Methods

I. INTRODUCTION

A box query in an NDDS is an important type of query, defined by specifying a set of allowed values in each dimension. These queries are useful in many diverse applications such as bioinformatics, biometrics, data mining and E-commerce. Each data item is viewed as a set of non-ordered discrete values rather than a vector (in this paper we use the terms 'high-dimensional categorical data' and 'vectors in NDDS' interchangeably). There is an increasing demand for similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) from the same application areas. The main characteristic of such a data space is that the data values in each dimension are discrete and have no ordering. Examples of non-ordered discrete values in a dimension of an NDDS include discrete data types such as gender, complexion, profession and user-defined enumerated types.

In general, indexes are used to improve the response time of query execution in large databases. In this paper we propose an effective indexing scheme for implementing box queries in NDDSs for large databases. Many indexing schemes already exist for large databases in continuous data spaces (CDS). These schemes are not suitable for queries in an NDDS because of the fundamental differences between the two spaces. The databases that require searching information in an NDDS can be very large (e.g., the well-known genome sequence database contains over 80 GB of genomic data). To support efficient similarity searches in such databases, robust indexing techniques are needed.
Indexing techniques in the CDS rely on the fact that the indexed values can be ordered in each dimension, which is not the case in an NDDS. However, an NDDS has certain value discrimination properties that can be exploited to implement indexes efficiently. The proposed work exploits these properties to develop a new indexing scheme, the BoND-tree, targeted at improving the performance of box queries.
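
To make the definition of a box query concrete, the following is a minimal sketch in Python of how such a query can be evaluated by a plain linear scan. It is not the BoND-tree itself, only the unindexed baseline that an index aims to improve on. The names `matches_box` and `box_query_scan`, the toy data and the alphabet {a, c, g, t} are assumptions introduced here for illustration and do not come from the paper.

```python
from typing import List, Sequence, Set

# A vector in an NDDS: one discrete, non-ordered value per dimension.
Vector = Sequence[str]
# A box query: one set of allowed values per dimension.
BoxQuery = Sequence[Set[str]]


def matches_box(vector: Vector, box: BoxQuery) -> bool:
    """Return True if every component of the vector is allowed by the box."""
    if len(vector) != len(box):
        raise ValueError("vector and box query must have the same dimensionality")
    # Only set membership is used; no ordering of values is assumed.
    return all(value in allowed for value, allowed in zip(vector, box))


def box_query_scan(data: Sequence[Vector], box: BoxQuery) -> List[Vector]:
    """Evaluate a box query by scanning every vector (no index)."""
    return [vector for vector in data if matches_box(vector, box)]


# Hypothetical toy data: 3-dimensional vectors over the alphabet {a, c, g, t}.
data = ["acg", "tcg", "agg", "ttt"]
# Dimension 1 allows a or t, dimension 2 allows only c, dimension 3 allows g or t.
box = [{"a", "t"}, {"c"}, {"g", "t"}]

result = box_query_scan(data, box)
print(result)  # ['acg', 'tcg']
```

The membership test uses only equality between values, which reflects the point made above: an NDDS offers no ordering to exploit, so range-style comparisons used by CDS indexes do not apply. The scan touches every vector, which is the cost that an index is meant to avoid on large databases.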