IJSRST162672 | Received : 03 Dec 2016 | Accepted : 12 Dec 2016 | November-December-2016 [(2)6 : 359-361]
© 2016 IJSRST | Volume 2 | Issue 6 | Print ISSN: 2395-6011 | Online ISSN: 2395-602X
Themed Section: Engineering and Technology
Top-K Dominating Queries on Incomplete Data: A Survey
Jilu Sajeev, Noorjahan V. A.
Department of Computer Science and Engineering, Ilahia College of Engineering & Technology, Muvattupuzha, India
ABSTRACT
Top-k dominating queries return the k objects that dominate the largest number of other objects in a dataset.
Most existing approaches assume that the dataset is complete, but in practice a dataset may be incomplete for
various reasons. This paper surveys methods for finding the dominating objects in an incomplete
dataset.
Keywords : Top-K Query, Dominance Relation, Skyline, Bucketing
I. INTRODUCTION
Top-k dominating queries combine the advantages of
top-k queries and skyline queries. Many existing works
address top-k dominating queries on complete data.
In real-world applications, however, datasets are not
necessarily complete. Incompleteness means that the
values of some dimensions are missing for some objects.
A dataset may be incomplete because of data loss,
privacy preservation, and so on. For example, consider
an object A from a dataset whose dimension values are
(1, 7, -, 4). The object has 4 dimensions, and the
entry "-" indicates a missing value.
With this type of dataset it is difficult to find the
top-k objects, because objects with missing dimension
values cannot be compared directly with the others. It is
therefore important to determine how to find the
dominating objects in an incomplete dataset.
To output the dominating objects from a dataset, we
first need to define the dominance relationship on
incomplete data.
Definition (dominance relationship on incomplete data
[1]). Given two objects o and o' in a dataset S, o
dominates o' (i.e., o ≺ o') if the following conditions
hold: I) for every dimension i, either o.[i] is no larger
than o'.[i] or at least one of them is missing; II) there
is at least one dimension j in which both o.[j] and o'.[j]
are observed and o.[j] is less than o'.[j].
Consider the incomplete dataset given in Figure 1, which
contains 4 objects with 5 dimensions each. In object A1
the value of the third dimension is missing, and each of
the other objects also has some dimension values that are
not available. To check the dominance relationship under
the above definition, we first compare A1 with A2. Over
the dimensions observed in both A1 and A2, A2 dominates
A1, so the score of A2, i.e., the number of objects it
dominates, becomes 1.
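The dominance check defined above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: missing values are encoded as None, smaller values are taken to be better, and condition I is read non-strictly (no larger than), which is the usual form of this definition. The function name dominates and the two sample objects are hypothetical; they are not the data of Figure 1.

```python
from typing import Optional, Sequence

Value = Optional[float]  # assumption: None marks a missing dimension value


def dominates(o: Sequence[Value], p: Sequence[Value]) -> bool:
    """Return True if o dominates p on incomplete data (smaller is better)."""
    if len(o) != len(p):
        raise ValueError("objects must have the same number of dimensions")
    strictly_better = False
    for a, b in zip(o, p):
        if a is None or b is None:
            continue  # condition I holds trivially when a value is missing
        if a > b:
            return False  # condition I is violated on this dimension
        if a < b:
            strictly_better = True  # condition II is witnessed here
    return strictly_better


# Hypothetical objects with 4 dimensions each (not taken from Figure 1).
x = (1, 7, None, 4)
y = (2, None, 5, 4)
print(dominates(x, y))  # x is compared with y on dimensions 1 and 4 only
print(dominates(y, x))
```

Two objects that share no observed dimension never dominate each other in this sketch, because condition II cannot be satisfied.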
Comparing each object with all the others in this way
gives the score of every object in the dataset.
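The pairwise procedure described above can be sketched as a naive baseline. The code below is an assumption-laden illustration, not a method from the surveyed works: missing values are encoded as None, smaller values are better, condition I is read as "no larger than", and the names dominates, scores and top_k_dominating as well as the objects P1 to P4 are hypothetical (they are not the data of Figure 1).

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Value = Optional[float]  # assumption: None marks a missing dimension value


def dominates(o: Sequence[Value], p: Sequence[Value]) -> bool:
    """Dominance on incomplete data, restricted to commonly observed dimensions."""
    common = [(a, b) for a, b in zip(o, p) if a is not None and b is not None]
    return all(a <= b for a, b in common) and any(a < b for a, b in common)


def scores(data: Dict[str, Sequence[Value]]) -> Dict[str, int]:
    """Score of an object = number of other objects it dominates."""
    return {
        name: sum(
            dominates(obj, other)
            for other_name, other in data.items()
            if other_name != name
        )
        for name, obj in data.items()
    }


def top_k_dominating(data: Dict[str, Sequence[Value]], k: int) -> List[Tuple[str, int]]:
    """Return the k (name, score) pairs with the highest scores."""
    return heapq.nlargest(k, scores(data).items(), key=lambda item: item[1])


# Hypothetical incomplete dataset with 3 dimensions per object.
dataset = {
    "P1": (1, None, 3),
    "P2": (2, 4, None),
    "P3": (None, 5, 6),
    "P4": (3, 6, 7),
}
print(scores(dataset))
print(top_k_dominating(dataset, 2))
```

This baseline performs one dominance check for every ordered pair of objects, so its cost grows quadratically with the number of objects.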
Figure 1. A Sample Dataset
For a large dataset, however, comparing every pair of
objects becomes complex and time consuming, so simpler
and faster methods are needed to find the dominating
objects. This paper describes some previous works on
this subject.
II. METHODS AND MATERIAL
Related Works
This section includes some details about previous works
related to Top-K dominating queries on incomplete data.