P Jhansi Rani et al, International Journal of Computer Science and Mobile Computing, Vol.3 Issue.10, October- 2014, pg. 976-981
© 2014, IJCSMC All Rights Reserved 976
Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 3, Issue. 10, October 2014, pg.976 – 981
RESEARCH ARTICLE
An Efficient Indexing Method for Box
Queries in NDDS Spaces using BoND-tree
P Jhansi Rani¹, GK Srikanth²
¹M.TECH, Dept. Of CSE, AVNIET, JNTUH, HYDERABAD, AP
²Associate Professor, Dept. Of CSE, AVNIET, JNTUH, HYDERABAD, AP
Abstract — Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDSs) are becoming increasingly
important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity
searches require robust indexing techniques. Box queries (or window queries) are a type of query that specifies a set of
allowed values in each dimension. Unfortunately, existing indexing methods developed for multidimensional (ordered)
Continuous Data Spaces (CDSs), such as the R-tree, cannot be directly applied to an NDDS. Most existing work in this
field targets similarity queries (range queries and k-NN queries). Other indexing methods based on metric spaces, such as
the M-tree and the Slim-trees, are too general to effectively utilize the special characteristics of NDDSs, resulting in
non-optimized performance. In this paper, we propose a new dynamic, data-partitioning-based indexing technique, called the
BoND-tree, that exploits the exclusive properties of NDDSs; in particular, these properties are used to develop new
node-splitting heuristics. The BoND-tree and the Slim-trees for similarity searches in multidimensional NDDSs. For the
BoND-tree, we also provide a theoretical analysis showing the optimality of the proposed heuristics. Extensive experiments
with synthetic data demonstrate that the proposed scheme is significantly more efficient than existing ones when applied to
support box queries in NDDSs. We also show the effectiveness of the proposed scheme in a real-world application of primer
design for genome sequence databases.
Keywords: NDDS, Box Queries, Indexing Methods
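To make the difference between a box query and a similarity (range) query concrete, the following is a minimal Python sketch of both as linear scans over NDDS vectors. It assumes vectors are strings over a discrete alphabet such as {A, G, T, C}, and it assumes the Hamming distance for the range query, a common choice in NDDSs that this section does not itself specify. All names (`matches_box`, `box_query_scan`, `range_query_scan`) are hypothetical illustrations, not the BoND-tree's API; an index would only need to return the same answers faster.

```python
from typing import List, Sequence, Set

# A vector in an NDDS: one non-ordered discrete value per dimension,
# e.g. a genome fragment over the alphabet {A, G, T, C}.
Vector = Sequence[str]
# A box query: a set of allowed values for every dimension.
BoxQuery = Sequence[Set[str]]


def matches_box(v: Vector, q: BoxQuery) -> bool:
    """True if every component of v is among the allowed values of its dimension."""
    return len(v) == len(q) and all(x in allowed for x, allowed in zip(v, q))


def hamming(u: Vector, v: Vector) -> int:
    """Number of dimensions in which u and v differ (assumed NDDS distance)."""
    return sum(a != b for a, b in zip(u, v))


def box_query_scan(data: List[Vector], q: BoxQuery) -> List[Vector]:
    """Linear-scan baseline for a box query."""
    return [v for v in data if matches_box(v, q)]


def range_query_scan(data: List[Vector], center: Vector, r: int) -> List[Vector]:
    """Similarity (range) query: vectors within Hamming distance r of center."""
    return [v for v in data if hamming(v, center) <= r]
```

For example, with `data = ["AGTC", "AGTA", "CGTC", "TTTT"]`, the box query `[{"A", "C"}, {"G"}, {"T"}, {"C"}]` returns `["AGTC", "CGTC"]`, while a range query of radius 1 around `"AGTC"` returns `["AGTC", "AGTA", "CGTC"]`. The box query constrains *which* values are allowed in each dimension, whereas the range query only bounds *how many* dimensions may differ from a single center, so the two answer sets generally differ.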
I. INTRODUCTION
A box query in an NDDS is an important type of query that is defined by specifying a set of allowed values in each dimension.
Such queries are useful in many diverse applications, such as bioinformatics, biometrics, data mining and E-commerce, and the
same application areas are creating an increasing demand for similarity searches in multidimensional Non-ordered Discrete Data
Spaces (NDDSs). Each data item is a vector whose components are non-ordered discrete values rather than ordered numbers (in this
paper we use the terms "high-dimensional categorical data" and "vectors in NDDS" interchangeably). The main characteristic of
such a data space is that the data values in each dimension are discrete and have no ordering. Examples of non-ordered discrete
values in a dimension of an NDDS include the nucleotide bases (A, G, T, C) of genome sequences and discrete data types such as
gender, complexion, profession and user-defined enumerated types. In general, indexes are used to improve query response time in
large databases, and many indexing schemes exist for large databases in continuous data spaces (CDSs). These schemes are not
suitable for queries in an NDDS because of the fundamental differences between the two spaces: indexing techniques in a CDS rely
on the fact that the indexed values can be ordered in each dimension, which is not the case in an NDDS. At the same time, the
databases that require searching information in an NDDS can be very large (e.g., the well-known genome sequence database contains
over 80 GB of genomic data), so robust indexing techniques are needed to support efficient similarity searches in them. However,
an NDDS has certain value discrimination properties that can be exploited for the efficient implementation of indexes. In this
paper we propose an effective indexing scheme for supporting box queries over large NDDS databases: the BoND-tree, which exploits
these properties of NDDSs to improve the performance of box queries.
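The point that CDS indexes rely on per-dimension ordering can be illustrated with bounding boxes. The following minimal Python sketch is an assumption for illustration, not the BoND-tree's actual node structure: it compares an R-tree-style box of [min, max] intervals with a box that stores a *set* of values per dimension, in the style of the discrete minimum bounding rectangles (DMBRs) used by ND-tree-family indexes. All names (`set_box`, `interval_box`, `interval_covers`, `may_contain_matches`) are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Vector = Sequence[str]
# Hypothetical NDDS bounding box: one *set* of values per dimension.
SetBox = List[Set[str]]
# CDS-style bounding box: one (min, max) interval per dimension.
IntervalBox = List[Tuple[str, str]]


def set_box(vectors: Sequence[Vector]) -> SetBox:
    """Per-dimension value sets covering a non-empty list of equal-length vectors."""
    return [{v[i] for v in vectors} for i in range(len(vectors[0]))]


def interval_box(vectors: Sequence[Vector]) -> IntervalBox:
    """R-tree-style box; it silently imposes an (alphabetical) order on the values."""
    return [(min(v[i] for v in vectors), max(v[i] for v in vectors))
            for i in range(len(vectors[0]))]


def interval_covers(box: IntervalBox, dim: int, value: str) -> bool:
    """True if value lies inside the interval of dimension dim."""
    lo, hi = box[dim]
    return lo <= value <= hi


def may_contain_matches(box: SetBox, query: Sequence[Set[str]]) -> bool:
    """Box-query pruning test: a subtree can hold a match only if, in every
    dimension, its value set shares at least one value with the allowed set."""
    return all(values & allowed for values, allowed in zip(box, query))
```

For the two vectors `"AA"` and `"TT"`, `interval_box` yields `[("A", "T"), ("A", "T")]`, which also "covers" G and C merely because they fall between A and T alphabetically, while `set_box` yields `[{"A", "T"}, {"A", "T"}]` and covers only the values actually stored. With set-based boxes, a box query can skip any subtree for which `may_contain_matches` is false, without any artificial ordering of the values.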