Journal of Basic and Applied Engineering Research
p-ISSN: 2350-0077; e-ISSN: 2350-0255; Volume 2, Issue 20; October-December, 2015, pp. 1756-1760
© Krishi Sanskriti Publications
http://www.krishisanskriti.org/Publication.html
Indexing of Voluminous Data:
Its Needs and Challenges
Minakshi Gogoi¹ and Jayashree Das²
¹Department of Computer Science and Engineering, Girijananda Choudhury Institute of Management and Technology
²M. Tech, 3rd Semester, Department of Computer Science and Engineering, Girijananda Choudhury
Institute of Management and Technology
E-mail: ¹minakshi_cse@gimt-guwahati.ac.in, ²jayashree.kri.das@gmail.com
Abstract—In the present era, due to the technological advancement,
the number of internet users is increasing rapidly, along with the
exploration and expansion of huge volumes of databases.
Simultaneously, the need to access information in the form of
various types of data, such as images, documents and video, has
become a vital part of most people's day-to-day life. Indexing image
data for fast retrieval and pattern search with high efficiency and
accuracy has become a challenging task in the present
information retrieval scenario. Image retrieval has gained importance
owing to the increase in data volume across the internet over the
decades and to the requirements of various applications such as
individual authentication, face recognition, biometrics, pattern
search and remote sensing. Spatial images, which cover spectral and
non-spectral image data, can be indexed based on their content
features such as color, texture, shape and spatial layout. The
techniques used for indexing spectral data are of particular
interest in application fields such as content-based remote
sensing, agriculture, astronomy and biomedical imaging. Spectral
images are images of the same object taken in
different bands of the electromagnetic spectrum. A spectral image
may be a hyperspectral or a multispectral image, and such images are
multidimensional in nature. Other spatial data that are not related
to spectral bands are non-spectral data. A number of techniques are
currently available for indexing and query processing of
both types of data (spectral and non-spectral), such as the pyramid
technique, K-d tree, MapReduce, R-tree, R+-tree and score-based
methods. This paper gives a brief overview of the current techniques
and their implementation in the indexing of spectral
and non-spectral data. Their advantages and limitations are analyzed
and their performance efficiency is compared.
Keywords: spectral data, indexing, feature extraction, k-d tree,
pyramid technique.
1. INTRODUCTION
With the increase in database volume, fast indexing and
retrieval techniques have become a pressing requirement for
enhanced network and multimedia technologies. Earlier,
large spatial databases were managed in geosciences and
computer-aided design (CAD) applications. Later, new
applications in computer vision and robotics, computer
visualization, geographical information processing, automated
mapping and facilities management increased the need
for fast and efficient indexing and retrieval of spatial data.
To meet the need for image indexing, many
techniques have evolved in the last few decades. One is
the traditional image database indexing and retrieval approach,
which is text-based. Here the image data is fully converted
into an electronic presentation [8]. But with the growing
popularity of the internet and advances in multimedia
technologies, this approach has fallen out of favor because of
factors such as low-quality text and high cost. The difficulties
of the traditional indexing approach have raised interest
in investigating and developing techniques for retrieving
images automatically using content features such as color,
shape and texture.
The existing techniques that can be used for indexing
spatial image data include the K-d tree, MapReduce, the R+-tree and
score-based methods. A K-d tree (short for k-dimensional tree) is a
space-partitioning data structure for organizing points in a
k-dimensional space. K-d trees are a useful data structure for
several applications, such as searches involving a
multidimensional search key (e.g. range searches and nearest
neighbor searches). MapReduce is a programming model and
an associated implementation for processing and generating
large data sets with a parallel, distributed algorithm on a
cluster.
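To make the K-d tree description concrete, the following is a minimal Python sketch of the structure and of the two searches named above (range search and nearest neighbor search). It is an illustration under stated assumptions, not the implementation evaluated in this paper: the names `Node`, `build_kdtree`, `range_search` and `nearest` are hypothetical, the tree is built by median split with the splitting axis cycling through the dimensions, and distances are squared Euclidean.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                     # point stored at this node
    axis: int                        # dimension used to split at this node
    left: Optional["Node"] = None    # points with coordinate <= point[axis]
    right: Optional["Node"] = None   # points with coordinate >= point[axis]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by median split, cycling through the axes (assumed scheme)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def range_search(node: Optional[Node], lo: Point, hi: Point,
                 found: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    if found is None:
        found = []
    if node is None:
        return found
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        found.append(node.point)
    a = node.axis
    if lo[a] <= node.point[a]:       # the query box reaches into the left half
        range_search(node.left, lo, hi, found)
    if node.point[a] <= hi[a]:       # the query box reaches into the right half
        range_search(node.right, lo, hi, found)
    return found


def nearest(node: Optional[Node], target: Point,
            best: Optional[Tuple[float, Point]] = None
            ) -> Optional[Tuple[float, Point]]:
    """Return (squared distance, point) of the stored point closest to target."""
    if node is None:
        return best
    d = sum((c - t) ** 2 for c, t in zip(node.point, target))
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    if diff * diff < best[0]:        # the far half may still hold a closer point
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(sorted(range_search(tree, (3, 1), (8, 5))))
    print(nearest(tree, (9, 2)))
```

Both searches skip a subtree whenever the splitting coordinate shows that it cannot contain a qualifying point, which is what lets a K-d tree avoid scanning every point for a multidimensional search key.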
2. NEED OF INDEXING
Information is often best expressed visually
rather than textually. An image or a picture can
provide users with the relevant information they need. Since
the visual content on the World Wide Web (WWW), along
with that of offices, enterprises and industries having their own servers,
is increasing enormously with time, handling this
huge volume of resources has become very important.
Every day the internet is loaded with billions of gigabytes of
data, and this rate is increasing rapidly day by day. The
democratization of images as proper sources of information
and education in various fields such as agriculture, economics,
International Conference on Electronic Devices, Circuits, Applied Electronics and Communication Technology
(EDCAECT 2015) ISBN: 978-93-85822-02-5 312