Journal of Basic and Applied Engineering Research
p-ISSN: 2350-0077; e-ISSN: 2350-0255; Volume 2, Issue 20; October-December, 2015, pp. 1756-1760
© Krishi Sanskriti Publications
http://www.krishisanskriti.org/Publication.html

Indexing of Voluminous Data: Its Needs and Challenges

Minakshi Gogoi 1 and Jayashree Das 2
1 Department of Computer Science and Engineering, Girijananda Choudhury Institute of Management and Technology
2 M. Tech, 3rd Semester, Department of Computer Science and Engineering, Girijananda Choudhury Institute of Management and Technology
E-mail: 1 minakshi_cse@gimt-guwahati.ac.in, 2 jayashree.kri.das@gmail.com

Abstract: Technological advancement has brought rapid growth in the number of internet users, along with the exploration and expansion of very large databases. At the same time, access to information in many forms, such as images, documents, and video, has become a vital part of most people's daily lives. Indexing image data for fast retrieval and pattern search with high efficiency and accuracy has become a challenging task in the present information retrieval scenario. Image retrieval has gained importance because of the increase in data volume across the internet over the decades and because of the requirements of applications such as individual authentication, face recognition, biometrics, pattern search, and remote sensing. Spatial images, which cover spectral and non-spectral image data, can be indexed by content features such as color, texture, shape, and spatial layout. Techniques for indexing spectral data are of particular interest in application fields such as content-based remote sensing, agriculture, astronomy, and biomedical imaging. Spectral images are images of the same object taken in different bands of the electromagnetic spectrum.
A spectral image may be a hyperspectral or a multispectral image; such images are multidimensional in nature. Spatial data that are not related to spectral bands are non-spectral data. A number of techniques are currently available for indexing and query processing of both types of data (spectral and non-spectral), such as the pyramid technique, K-d tree, MapReduce, R-tree, R+-tree, and score-based methods. This paper gives a brief overview of the current techniques and of their use in indexing spectral and non-spectral data. Their advantages and limitations are analyzed and their performance efficiency is compared.

Keywords: spectral data, indexing, feature extraction, k-d tree, pyramid technique.

1. INTRODUCTION

With the increase in database volume, fast indexing and retrieval techniques have become a pressing demand for enhanced network and multimedia technologies. Earlier, large spatial databases were managed mainly in geosciences and computer-aided design (CAD). Later, new applications in computer vision and robotics, computer visualization, geographical information processing, automated mapping, and facilities management increased the need for fast and efficient indexing and retrieval of spatial data. To meet the requirement of indexing images, many techniques have evolved over the last few decades. One is the traditional, text-based approach to image database indexing and retrieval. Here the image data is fully converted into an electronic presentation [8]. However, with the growing popularity of the internet and advances in multimedia technologies, this approach has fallen out of favor because of factors such as lower-quality text and higher cost. The difficulties of the traditional indexing approach have raised interest in investigating and developing techniques that retrieve images automatically using content features such as color, shape, and texture.
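As a rough illustration of content-based indexing, the sketch below reduces each image to a small color feature vector and indexes the vectors with a k-d tree, one of the techniques listed in the abstract. It is a minimal sketch under stated assumptions, not a method taken from this paper: the mean-RGB feature, the names `color_feature`, `build`, and `nearest`, and the three toy images are all hypothetical and chosen only for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Vector = Tuple[float, ...]


def color_feature(pixels: List[Pixel]) -> Vector:
    """Mean R, G, B scaled to [0, 1]; a deliberately tiny, assumed color feature."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / (255.0 * n) for c in range(3))


@dataclass
class Node:
    point: Vector
    label: str
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def build(items: List[Tuple[Vector, str]], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector,
            best: Optional[Tuple[float, str]] = None) -> Optional[Tuple[float, str]]:
    """Return (squared distance, label) of the indexed point closest to query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


# Toy "images": each is just a short list of pixels (hypothetical data).
images = {
    "sunset": [(250, 120, 30), (240, 100, 20)],
    "forest": [(20, 140, 40), (30, 160, 50)],
    "ocean": [(10, 60, 200), (20, 80, 220)],
}
index = build([(color_feature(px), name) for name, px in images.items()])
query = color_feature([(230, 100, 40)])   # a reddish query image
print(nearest(index, query)[1])
```

A real system would use richer features (for example color histograms or texture descriptors) and a library implementation of the tree; the point here is only that retrieval becomes a nearest-neighbor search in feature space.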
The existing techniques that can be used for indexing spatial image data include K-d tree, MapReduce, R+-tree, and score-based methods. A K-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. K-d trees are useful for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

2. NEED OF INDEXING

Information is often best expressed visually rather than textually. An image or a picture can provide users with the relevant information they need. Since the visual content on the World Wide Web (WWW), together with that held by offices, enterprises, and industries on their own servers, is increasing enormously with time, handling this huge volume of resources has become very important. Every day the internet is loaded with billions of gigabytes of data, and this rate is increasing rapidly. The democratization of images as proper sources of information and education in various fields such as agriculture, economics,

International Conference on Electronic Devices, Circuits, Applied Electronics and Communication Technology (EDCAECT 2015) ISBN: 978-93-85822-02-5