The Orion Uncertain Data Management System

Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, Rahul Shah
Department of Computer Science, Purdue University
West Lafayette, Indiana, USA
{sarvjeet, cmayfiel, smittal, sunil, seh, rahul}@cs.purdue.edu

* Work done while at Purdue University. Current affiliation: Louisiana State University, Baton Rouge, Louisiana, USA.

International Conference on Management of Data COMAD 2008, Mumbai, India, December 17–19, 2008
© Computer Society of India, 2008

Abstract

Orion is a state-of-the-art uncertain database management system that extends the relational model to include probabilistic uncertain data as first-class data types. This demonstration presents an implementation of this system as an extension of PostgreSQL. The Orion model supports both attribute and tuple uncertainty with arbitrary correlations. Both discrete and continuous pdfs are handled in a natural and accurate manner. The system uses standard SQL with only minimal modifications. The underlying model is consistent with Possible Worlds Semantics. We demonstrate how Orion works and how it simplifies the design and enhances the capabilities of two sample applications: managing sensor data (continuous uncertainty) and inferring missing values (discrete uncertainty).

1 Introduction

Uncertainty is prevalent in numerous application domains, ranging from information extraction and integration to scientific data management and sensor databases. Probabilistic and uncertain data management have recently received much attention in the database community (see [7] for related work). Orion¹ is a general-purpose uncertain DBMS that unifies the modeling of probabilistic data across applications. This, in turn, gives the query engine additional opportunities for indexing and optimization.

¹ See http://orion.cs.purdue.edu/

One motivating example is a data cleaning system that automatically detects and corrects errors. Since conventional database management systems assume data to be certain and precise, the software must either construct its own probabilistic model for the data, or simply pick one of the alternative values to store in the underlying database. Either option is unsatisfactory. The first considerably complicates queries and places a significant burden on the user, who must implement various properties of uncertain data outside the DBMS. The second can lead to a substantial loss of information or accuracy.

Orion provides an alternative solution: built-in support for uncertainty at the database level. By modifying the relational model to handle probabilistic data and extending the query processing engine of PostgreSQL, Orion natively manages uncertain data modeled as arbitrary joint probability distributions. “Orion 2.0” is a complete redesign and rewrite of its predecessor “U-DBMS” [1], and includes the following new contributions:

• An integrated implementation (within PostgreSQL) of the “PDF Attributes” data model, which is consistent with Possible Worlds Semantics (PWS) and supports both continuous and discrete uncertainty [7].

• Efficient access methods for querying uncertain data, including three index structures based on R-trees, signature trees, and inverted indexes [3, 5].

• Improved query optimization, join algorithms, and selectivity estimation by gathering and exploiting additional statistics over probabilistic data types [2, 6].

• Integration with PL/R for graphical visualization of and statistical inference over uncertain data [4].

The fundamental difference between Orion and related projects is its support of attribute-level continuous uncertainty, which enables the system to represent probabilistic data in a natural and efficient manner.

The rest of this paper is organized as follows. Section 2 gives a brief overview of this data model, and Section 3 summarizes some of the implementation issues. We describe the main features of our demonstration in Section 4, and highlight areas of future work in Section 5.
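
The data cleaning motivation above can be made concrete with a small sketch. It is written in Python purely for illustration and is not Orion's interface (the paper describes Orion as using standard SQL with minimal modifications). The class `DiscretePdf`, its methods, and the example ages and probabilities are all hypothetical. The sketch contrasts storing only the most likely alternative with keeping the full discrete distribution when answering a selection predicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DiscretePdf:
    """Hypothetical discrete distribution over alternative attribute values."""
    probs: Dict[int, float]  # alternative value -> probability

    def __post_init__(self) -> None:
        # A valid pdf must sum to 1 (within floating-point tolerance).
        total = sum(self.probs.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities sum to {total}, expected 1.0")

    def prob(self, predicate: Callable[[int], bool]) -> float:
        """Probability mass of the alternatives that satisfy the predicate."""
        return sum(p for v, p in self.probs.items() if predicate(v))

    def most_likely(self) -> int:
        """The single alternative a conventional DBMS would have to store."""
        return max(self.probs, key=self.probs.get)


# Assumed example: a cleaning tool cannot decide between three candidate ages.
age = DiscretePdf({31: 0.5, 37: 0.25, 81: 0.25})


def older_than_35(v: int) -> bool:
    return v > 35


# Option 2 from the text: keep only one alternative, then evaluate the predicate.
stored_value = age.most_likely()
answer_from_single_value = older_than_35(stored_value)

# Keeping the whole distribution: the predicate holds with some probability.
answer_from_pdf = age.prob(older_than_35)

print(stored_value, answer_from_single_value, answer_from_pdf)
```

With the single stored value, the record is simply excluded from the query result, although under the assumed distribution it satisfies the predicate with probability one half. This is the kind of information loss the introduction refers to.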