International Journal of Computer Applications (0975 – 8887)
Volume 140 – No.12, April 2016

Location Aware Indexing

Yagnesh Kamble, Computer Engineering department, Sardar Patel Institute of Technology, Mumbai, India
Shubham Godshalwar, Computer Engineering department, Sardar Patel Institute of Technology, Mumbai, India
Ashna Bajaj, Computer Engineering department, Sardar Patel Institute of Technology, Mumbai, India
Reeta Koshy, Professor, Computer Engineering department, Sardar Patel Institute of Technology, Mumbai, India

ABSTRACT
This project closely models a framework to process Generic Location-Aware Rank Queries. A restaurant-finder application has been created to demonstrate how a Generic Location-Aware Rank Query (GLRQ) can be processed by using three data structures in conjunction with each other: the synopses tree, the R-tree and inverted files. The synopses tree, created using histograms, handles the numeric attributes. The R-tree filters results based on their location, while the inverted files filter according to specified keywords (e.g., lunch, breakfast, italian, karaoke), if any. Existing methods of processing such queries prune the search space in two stages, first according to location and keyword and then according to the specified predicates (or vice versa), which is usually inefficient. The method used here improves on them by carrying out both kinds of pruning simultaneously. It is faster, especially when working with large datasets, as has been demonstrated experimentally.

General Terms
Query processing, spatial indexing

Keywords
Location-aware, synopses tree, IR-tree

1. INTRODUCTION
The use of location data across the Internet has increased massively, especially on social networking platforms such as Facebook, Twitter and Instagram, which allow users to geo-tag their posts, as discussed in [1].
As a result, it is essential to have an efficient search method that lets the user combine a variety of filters such as location ('Search places near me'), numerical attributes ('Rating > 4', 'No. of comments > 100'), and keywords ('Search posts tagged with Street Food'). This framework was created because existing methods do not optimally process GLRQs, which are queries that combine numerical predicates, keywords and location specifications. Since we have developed a restaurant-finder application, related examples are used to illustrate the inadequacy of the existing methods.

In the naive LKQ (Location-Keyword Query) approach [2], the query is assumed to be location-aware and to contain only keywords. Constraints on numerical attributes are therefore also converted to keywords, as illustrated in the following example. Consider a query that searches for all nearby restaurants having a rating greater than 3.5 (out of 5). Here, the predicate 'rating > 3.5' is converted to rating = 3.6, 3.7, 3.8, 3.9 and so on, where each value of rating is treated as a keyword. This greatly complicates the query.

The queries can also be processed using the LKQ-first or the Predicate-first method. In these methods, the data is filtered first according to location and keyword and then according to numerical attributes (LKQ-first), or in the opposite order (Predicate-first). This is especially inefficient when the second filtering stage retains only a very small subset of the results of the first. In Predicate-first, the numerical attributes are examined first. Suppose they return 250 results, and after location-keyword filtering on those 250, only 3 of them satisfy the constraints. The remaining 247 results were thus fetched unnecessarily.

Our framework, however, performs simultaneous pruning of predicates as well as locations and keywords.
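The two baselines described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of [1] or [2]: the `Restaurant` record, the planar coordinates, the circular search range, the rating granularity of 0.1 and the `SAMPLE` data are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical record layout, assumed for this sketch only.
    name: str
    x: float            # planar coordinates (assumed, e.g. in km)
    y: float
    rating: float       # out of 5, one decimal place (assumed)
    keywords: frozenset


def expand_predicate(lower, upper=5.0, step=0.1):
    """Naive LKQ: rewrite 'rating > lower' as one keyword per admissible value."""
    count = round((upper - lower) / step)
    return [f"rating={lower + i * step:.1f}" for i in range(1, count + 1)]


def predicate_first(data, min_rating, centre, radius, keyword):
    """Two-stage filtering: numeric predicate first, then location and keyword.

    Returns the number of records fetched by stage one and the final results,
    so the amount of wasted work is visible.
    """
    stage_one = [r for r in data if r.rating > min_rating]
    final = [
        r for r in stage_one
        if keyword in r.keywords
        and hypot(r.x - centre[0], r.y - centre[1]) <= radius
    ]
    return len(stage_one), final


# Hypothetical sample data.
SAMPLE = [
    Restaurant("Trattoria", 0.3, 0.4, 4.2, frozenset({"italian", "lunch"})),
    Restaurant("Far Pasta", 8.0, 6.0, 4.5, frozenset({"italian"})),
    Restaurant("Karaoke Bar", 0.1, 0.1, 4.0, frozenset({"karaoke"})),
    Restaurant("Breakfast Hut", 5.0, 5.0, 3.9, frozenset({"breakfast"})),
    Restaurant("Low Rated", 0.2, 0.2, 3.0, frozenset({"italian"})),
]

if __name__ == "__main__":
    print(expand_predicate(3.5))
    fetched, final = predicate_first(SAMPLE, 3.5, (0.0, 0.0), 1.0, "italian")
    print(fetched, [r.name for r in final])
```

With these assumptions, a single predicate 'rating > 3.5' becomes fifteen separate keywords, and the Predicate-first run fetches four records in its first stage only to keep one of them. Pruning on predicates, location and keywords at the same time is meant to avoid exactly this kind of wasted fetch.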
This simultaneous pruning is achieved by using three primary data structures in conjunction with each other: the synopses tree, the R-tree and inverted files.

2. LITERATURE SURVEY
2.1 LINQ - A Framework For Location-Aware Indexing And Query Processing
The crux of the system design is derived from this paper [1]. It gives a method of evaluating generic location-aware rank queries (GLRQs). Evaluating such queries with the usual location-keyword query (LKQ) methods proves very inefficient. The method proposed in this paper makes use of a data structure called the synopses tree, along with the R-tree and inverted file data structures. The synopses tree greatly reduces the cost of pruning, while ensuring that accuracy is preserved. These data structures are used to perform score-based pruning and predicate-based pruning simultaneously, which is much faster than performing them in separate stages. The method has been tested on a number of real and synthetic data sets to demonstrate its efficiency over the conventional approach adopted by an RDBMS.

2.2 Efficient Retrieval Of The Top-K Most Relevant Spatial Web Objects
This paper [2] introduces the concept of IR-trees and shows how they can be used to query objects that have two components: a location and a set of keywords associated with the object. The framework integrates the inverted file for text retrieval and the R-tree for spatial proximity querying to obtain an inverted file R-tree (IR-tree). Each node of the IR-tree records a summary of the location information and textual content of all the objects in the sub-tree rooted at that node. The query-processing algorithm uses the location index information to estimate the spatial distance from a query to the objects in the node's sub-tree, and it uses the text index to estimate the text relevance of those objects.
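The idea of a node that summarises both the locations and the text of its sub-tree can be illustrated with a minimal Python sketch. It is a deliberately simplified assumption-laden version, not the algorithm of [2]: each node keeps only a bounding rectangle and the union of its descendants' keywords (rather than a full inverted file), and the search is a boolean range filter rather than a top-k ranking. The names `SpatialObject`, `Node`, `bounds`, `min_dist` and `search` are hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class SpatialObject:
    name: str
    x: float
    y: float
    keywords: frozenset


@dataclass
class Node:
    """Simplified IR-tree-style node: bounding rectangle plus keyword summary."""
    children: list                            # non-empty list of Node / SpatialObject
    mbr: tuple = field(init=False)            # (min_x, min_y, max_x, max_y)
    keywords: frozenset = field(init=False)   # union of all keywords below this node

    def __post_init__(self):
        boxes = [bounds(c) for c in self.children]
        self.mbr = (
            min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes),
        )
        self.keywords = frozenset().union(*(c.keywords for c in self.children))


def bounds(entry):
    """Bounding rectangle of a node, or a degenerate rectangle for an object."""
    if isinstance(entry, Node):
        return entry.mbr
    return (entry.x, entry.y, entry.x, entry.y)


def min_dist(point, mbr):
    """Lower bound on the distance from point to anything inside mbr."""
    dx = max(mbr[0] - point[0], 0.0, point[0] - mbr[2])
    dy = max(mbr[1] - point[1], 0.0, point[1] - mbr[3])
    return hypot(dx, dy)


def search(node, point, radius, keyword):
    """Collect matching objects, skipping sub-trees whose summary rules them out."""
    if keyword not in node.keywords or min_dist(point, node.mbr) > radius:
        return []   # spatial or textual summary prunes the whole sub-tree
    hits = []
    for child in node.children:
        if isinstance(child, Node):
            hits.extend(search(child, point, radius, keyword))
        elif (keyword in child.keywords
              and hypot(child.x - point[0], child.y - point[1]) <= radius):
            hits.append(child)
    return hits


# Hypothetical sample tree with two leaf nodes under one root.
near = Node([SpatialObject("a", 0.0, 0.0, frozenset({"italian"})),
             SpatialObject("b", 1.0, 1.0, frozenset({"lunch"}))])
far = Node([SpatialObject("c", 50.0, 50.0, frozenset({"italian"}))])
root = Node([near, far])

if __name__ == "__main__":
    print([o.name for o in search(root, (0.0, 0.0), 2.0, "italian")])
```

In this sketch the `far` node is discarded by the distance bound alone, without visiting its objects, and a query for a keyword absent from the root's summary returns immediately. A ranked version would replace the two boolean tests with bounds on a combined distance and text-relevance score.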