Online Incremental Handling of SELECT Queries on Unsorted Dataset

R. C. Anirudha¹, Avinash N. Bukkittu, Remya Kannan and V. S. Ananthanarayana
Department of Information Technology, National Institute of Technology Karnataka, Surathkal.
e-mail: ¹rcanirudha@gmail.com

Abstract. With the advent of Big Data, the size of data repositories is growing exponentially, and queries on such data take an extremely long time to execute. There is a need for an efficient ordering of output that shows the user relevant results within a short period of time. In this paper, we propose an efficient database engine for aggregate and other simple SELECT queries that gives the user more control and renders the output incrementally, ordering it automatically on the basis of that user's own usage history. Analogous to locality of reference, a user's querying trends tend to become domain-centric; we refer to this as Interest Centric Locality of Reference (ICLOR). These querying trends are captured in a priority queue that we call the rank list. The rank list logically partitions the tuples in the relation and enables efficient access to them. We further optimize the engine by generating Instance Equivalent Queries to populate the rank list; this addresses ad hoc queries and accelerates the rank list's learning rate.

Keywords: Incremental rendering, Interest centric locality of reference, Instance equivalent queries, Query by output, Rank list.

1. Introduction

Modern data analytics has become more complex, as it relies on aggregate computation over large data sets such as online transactions and web content. Traditionally, a querying system accepts a query, processes it over a large set of data, and only then produces the output. When users submit a query, they are forced to wait without any feedback on what is happening in the back end.
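The abstract describes the rank list only at a high level. As a hedged illustration of how such a list could drive incremental rendering, in contrast to the blocking model just described, the sketch below assumes that the rank list counts (column, value) filters from the user's past queries and that the relation is logically partitioned on a single column. `RankList`, `build_partitions`, `incremental_select` and the sample data are hypothetical names chosen for this example, not the authors' engine; Instance Equivalent Queries are not modelled.

```python
import heapq
from collections import Counter
from typing import Any, Callable, Dict, Hashable, Iterator, List, Tuple

Row = Dict[str, Any]
Key = Tuple[str, Hashable]  # (column, value) pair seen in a past query


class RankList:
    """Hypothetical rank list: a priority queue of (column, value) keys,
    ordered by how often the user's past queries filtered on them."""

    def __init__(self) -> None:
        self.hits: Counter = Counter()

    def record(self, column: str, value: Hashable) -> None:
        # Each past query with WHERE column = value raises that key's priority.
        self.hits[(column, value)] += 1

    def ranked(self) -> List[Key]:
        # heapq is a min-heap, so counts are negated to pop the most-used key
        # first; the insertion index breaks ties without comparing values.
        heap = [(-n, i, key) for i, (key, n) in enumerate(self.hits.items())]
        heapq.heapify(heap)
        return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def build_partitions(rows: List[Row], column: str) -> Dict[Hashable, List[Row]]:
    # Logical partitioning of the unsorted relation on one column,
    # assumed here to be done once, ahead of query time.
    parts: Dict[Hashable, List[Row]] = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


def incremental_select(parts: Dict[Hashable, List[Row]], column: str,
                       predicate: Callable[[Row], bool],
                       ranks: RankList) -> Iterator[Row]:
    # Visit partitions the user has shown interest in first, then the rest,
    # yielding each matching tuple as soon as it is found.
    preferred = [v for (c, v) in ranks.ranked() if c == column and v in parts]
    rest = [v for v in parts if v not in preferred]
    for value in preferred + rest:
        for row in parts[value]:
            if predicate(row):
                yield row


# Hypothetical usage: a toy relation and a query history that favors "books".
rows = [
    {"id": 1, "category": "toys", "price": 15},
    {"id": 2, "category": "books", "price": 12},
    {"id": 3, "category": "games", "price": 30},
    {"id": 4, "category": "books", "price": 8},
    {"id": 5, "category": "toys", "price": 25},
]
ranks = RankList()
for value in ["books", "books", "games"]:
    ranks.record("category", value)

parts = build_partitions(rows, "category")
# Equivalent of SELECT * FROM R WHERE price > 10, rendered tuple by tuple.
for row in incremental_select(parts, "category", lambda r: r["price"] > 10, ranks):
    print(row["id"])  # prints 2, 3, 1, 5 on successive lines
```

Tuples from the partition the user has queried most often (`books`) reach the user first, while the remaining partitions are still being scanned; a fuller implementation would presumably also update the rank list as each new query arrives.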
The system, meanwhile, is processing millions of records to obtain the output of the user's query. This has proven to be expensive in both computation and time. With the advent of Big Data, data repositories are growing rapidly, yet modern applications expect results to be produced at real-time response rates. Many techniques have been proposed for this purpose, such as online aggregation, which lets users control the progress and execution of queries [1]. A large number of techniques achieve fast processing of huge amounts of data by trading accuracy for speed [2]. Studies on aggregate queries have aimed to improve user interaction with the back end of the system. However, few studies have addressed analytics on all types of SELECT queries. These SELECT queries produce
ICDMW-2014, Editors: K. R. Venugopal, P. Deepa Shenoy and L. M. Patnaik, pp. 27–35.