Querying Databases with Incomplete CP-nets

Paolo Ciaccia
DEIS, University of Bologna, Italy
pciaccia@deis.unibo.it

ABSTRACT
Preference queries aim to retrieve from large databases (DB’s) those objects that better match the user’s requirements. With the aim of supporting modern DB applications, such as context-aware ones, in which conditional preferences are the rule, in this paper we investigate the possibility of adopting conditional preference networks (CP-nets) for DB querying. To this end, we also consider the relevant case in which CP-nets are not completely specified, a likely case for complex DB scenarios. We first show that the ceteris paribus (all else being equal) semantics, commonly associated with CP-nets, can lead to counterintuitive results if the CP-net is incomplete and the DB is incomplete as well. Then, we introduce a new totalitarian (i.e., not ceteris paribus) semantics and, rather surprisingly, prove that our semantics is equivalent to ceteris paribus for complete acyclic CP-nets and that it yields the same optimal results if the DB is complete. Finally, we show that when both the CP-net and the underlying DB are incomplete the totalitarian semantics can lead to more accurate and intuitive results.

1. INTRODUCTION
The trend towards the personalization of information system functionalities requires new models and techniques able to provide users with the “right information” at the “right time” in the “right place”. Context-aware applications are a remarkable step towards achieving this goal, the key idea being that of taking into account context information when processing user requests. In particular, the ranking of a query result should be based on the current user context, rather than on some absolute criterion.
Example 1 Consider the following database of hotels:

Name         Price (Euro)  Stars  Rooms  Internet
Jolly                  40      2     50  Yes
Continental            55      2     30  No
Excelsior              80      3     50  Yes
Rome                   80      5    100  Yes
Holiday                60      4     20  No

When travelling for work, the user does not care about price and number of rooms, and prefers hotels with at least 4 stars and an Internet connection. In this case the (only) best alternative is hotel Rome (5 stars and network-connected). However, if travelling for leisure, the user prefers small hotels (at most 30 rooms) and a price of at most 50 Euro. In this case no hotel satisfies both requirements, yet it can be argued that Continental, Jolly, and Holiday are the best available alternatives, since each of them satisfies one of the two user preferences.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM. VLDB ‘07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.

Frameworks proposed so far in the DB field [5, 10] have paid no specific attention to conditional preferences (see Section 5 for more details). On the other hand, these have been largely investigated by AI researchers, with a particular emphasis on CP-nets (Conditional Preference networks), see [2, 1, 14, 9], a graph-based formalism able to “factorize” the specification of preference statements over a set of attributes.
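The two contexts of Example 1 can be replayed with a minimal Python sketch. The encoding of each context as a set of named predicates, and the rule that a hotel is among the best if no other hotel satisfies a strict superset of its satisfied preferences, are assumptions made here for illustration only; they are not the CP-net semantics studied in this paper. The names `CONTEXT_PREFERENCES`, `satisfied`, and `best_hotels` are hypothetical.

```python
# Hotel relation from Example 1 (prices in Euro).
HOTELS = [
    {"name": "Jolly", "price": 40, "stars": 2, "rooms": 50, "internet": True},
    {"name": "Continental", "price": 55, "stars": 2, "rooms": 30, "internet": False},
    {"name": "Excelsior", "price": 80, "stars": 3, "rooms": 50, "internet": True},
    {"name": "Rome", "price": 80, "stars": 5, "rooms": 100, "internet": True},
    {"name": "Holiday", "price": 60, "stars": 4, "rooms": 20, "internet": False},
]

# Hypothetical encoding: each context maps preference names to predicates.
CONTEXT_PREFERENCES = {
    "work": {
        "stars>=4": lambda h: h["stars"] >= 4,
        "internet": lambda h: h["internet"],
    },
    "leisure": {
        "rooms<=30": lambda h: h["rooms"] <= 30,
        "price<=50": lambda h: h["price"] <= 50,
    },
}


def satisfied(hotel, context):
    """Return the names of the context's preferences that the hotel satisfies."""
    prefs = CONTEXT_PREFERENCES[context]
    return frozenset(name for name, holds in prefs.items() if holds(hotel))


def best_hotels(context):
    """Hotels whose satisfied preferences are not strictly contained in another hotel's."""
    sat = {h["name"]: satisfied(h, context) for h in HOTELS}
    # For frozensets, `other > mine` tests whether `other` is a strict superset.
    return sorted(
        name for name, mine in sat.items()
        if not any(other > mine for other in sat.values())
    )


print(best_hotels("work"))     # ['Rome']
print(best_hotels("leisure"))  # ['Continental', 'Holiday', 'Jolly']
```

Under these assumptions the sketch returns Rome for the work context and Continental, Holiday, and Jolly for the leisure context, in agreement with the discussion in Example 1.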
A CP-net statement like $\phi = p : a_i > a_j$, where $a_i$ and $a_j$ are values of attribute $A$ and $p$ is an assignment of values to other attributes $P$, is given a ceteris paribus interpretation, i.e., “given $p$, prefer $a_i$ to $a_j$ only if the values of the other attributes are equal”.

In order to use CP-nets for the purpose of DB querying, two major issues need to be addressed. First, since CP-nets are defined only for finite attribute domains, an extension to infinite domains, which are common in DB applications, is needed [6]. Second, the case in which the CP-net is not completely specified has to be considered. This is to cover the likely case in which there are many attributes, possibly with large domains, and the user only states a limited set of preferences.

In this paper we concentrate on this second issue and show that the ceteris paribus semantics yields counterintuitive results when the CP-net is incomplete and the DB is incomplete as well, i.e., it does not contain all the possible alternatives for the preference attributes. With the aim of preserving the strong points of CP-nets, in particular their capability of concisely representing conditional preferences, we study an alternative, so-called totalitarian, semantics for CP-nets.¹ Our major formal result shows that, rather surprisingly, the new semantics is equivalent to ceteris paribus for complete acyclic CP-nets. Then we prove that for complete DB’s the two semantics, although leading to different preferences, always yield the same set of optimal results. Next, we consider the case of incomplete DB’s and CP-nets and argue that the totalitarian semantics is better suited to exclude from the result those tuples that are apparently sub-optimal with respect to user preferences. Finally, we discuss possible extensions of the totalitarian semantics.

¹ In the literature the “CP” acronym is sometimes used to stand for “ceteris paribus” rather than for “conditional preference”.
In this paper we adhere to the original interpretation [2], and thus find no contradiction in defining a totalitarian semantics for CP-nets.
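The ceteris paribus reading of a single statement "given p, prefer a_i to a_j" can be made concrete with a small Python sketch. It only checks the preference that one statement induces directly between two tuples (no transitive closure and no full CP-net), and the `Statement` class, the `directly_preferred` function, and the sample statement on hotel attributes are hypothetical names and data introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Statement:
    """One statement p : a_i > a_j over attribute A (illustrative encoding)."""
    parents: Mapping[str, Any]  # p: required values of the parent attributes P
    attribute: str              # A
    better: Any                 # a_i
    worse: Any                  # a_j


def directly_preferred(stmt, t1, t2):
    """True if stmt alone makes t1 preferred to t2 under ceteris paribus.

    Both tuples are assumed to be dicts over the same attributes.
    """
    # Both tuples must match the condition p on the parent attributes.
    if any(t1[k] != v or t2[k] != v for k, v in stmt.parents.items()):
        return False
    # t1 must carry the better value of A, t2 the worse one.
    if t1[stmt.attribute] != stmt.better or t2[stmt.attribute] != stmt.worse:
        return False
    # Ceteris paribus: every attribute other than A must be equal.
    return all(t1[k] == t2[k] for k in t1 if k != stmt.attribute)


# Hypothetical statement: with an Internet connection, prefer 5 stars to 3 stars.
stmt = Statement(parents={"internet": True}, attribute="stars", better=5, worse=3)

rome = {"price": 80, "stars": 5, "rooms": 100, "internet": True}
excelsior = {"price": 80, "stars": 3, "rooms": 50, "internet": True}
# A tuple that is not in the hotel database of Example 1.
missing = {"price": 80, "stars": 3, "rooms": 100, "internet": True}

print(directly_preferred(stmt, rome, missing))    # True
print(directly_preferred(stmt, rome, excelsior))  # False: the tuples also differ on rooms
```

In this sketch the statement orders Rome against a tuple that differs from it only on Stars, but says nothing directly about Rome and Excelsior, which also differ on Rooms. When the DB does not contain the intermediate tuples, such pairs can remain unordered, which gives some intuition for why incompleteness matters under the ceteris paribus reading.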