Selecting optimal instantiations of data models—Theory and validation of an ex ante approach

P.L. Bowen (a), R. Debreceny (b), F.H. Rohde (a,*), J. Basford (a)

(a) University of Queensland, UQ Business School, 4072, Australia
(b) University of Hawaii, College of Business Administration, Honolulu, 96822, USA

Received 20 July 2004; received in revised form 7 October 2005; accepted 13 October 2005
Available online 4 January 2006

Abstract

The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining the appropriate data quickly increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries.

© 2005 Elsevier B.V. All rights reserved.
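The selection rule summarized in the abstract can be illustrated with a minimal sketch. It uses the standard Halstead definitions (length N = N1 + N2, difficulty D = (n1/2)(N2/n2), effort E = D × V with volume V = N log2(n1 + n2)). How operators and operands are counted in a query is not specified in this section, so the sketch takes the counts as input. The names `TokenCounts`, `halstead`, `weighted_average`, and `preferred_instantiation`, as well as the weights and sample counts, are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounts:
    """Operator/operand counts for one query (counting rules are an assumption)."""
    distinct_operators: int  # n1
    distinct_operands: int   # n2
    total_operators: int     # N1
    total_operands: int      # N2


def halstead(c: TokenCounts) -> dict:
    """Return Halstead length, difficulty, and effort for one query."""
    if c.distinct_operands <= 0 or c.distinct_operators <= 0:
        raise ValueError("a query needs at least one operator and one operand")
    length = c.total_operators + c.total_operands
    vocabulary = c.distinct_operators + c.distinct_operands
    volume = length * math.log2(vocabulary)
    difficulty = (c.distinct_operators / 2) * (c.total_operands / c.distinct_operands)
    return {"length": length, "difficulty": difficulty, "effort": difficulty * volume}


def weighted_average(values, weights):
    """Weighted mean; weights might reflect how often each request is issued."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def preferred_instantiation(candidates, weights, metric="length"):
    """Pick the instantiation with the lowest weighted average complexity.

    candidates maps an instantiation name to the TokenCounts of the same
    sample of information requests, expressed against that instantiation.
    """
    scores = {
        name: weighted_average([halstead(q)[metric] for q in queries], weights)
        for name, queries in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Made-up counts for two requests against two hypothetical instantiations.
sample = {
    "schema_a": [TokenCounts(4, 4, 8, 8), TokenCounts(2, 2, 4, 4)],
    "schema_b": [TokenCounts(2, 2, 4, 4), TokenCounts(2, 2, 4, 4)],
}
best, scores = preferred_instantiation(sample, weights=[1, 1], metric="length")
print(best, scores)
```

The sketch treats the weights as given; in practice they would have to come from an estimate of how representative each information request is, which this section does not detail.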
Keywords: Models of data; Data representations; Object-relational databases; Relational databases; Query languages; Query complexity

Decision Support Systems 42 (2006) 1170–1186, www.elsevier.com/locate/dsw
doi:10.1016/j.dss.2005.10.002
* Corresponding author. Tel.: +61 7 3365 6530; fax: +61 7 3365 6788. E-mail address: f.rohde@business.uq.edu.au (F.H. Rohde).

1. Introduction

Over the last decade many organizations have expanded their transaction processing databases, implemented enterprise resource planning (ERP) systems, and built enterprise-wide data warehouses. Maximizing their returns from these investments requires organizations to make these data repositories available directly to knowledge workers for operational, tactical, and strategic decision making [48]. Knowledge workers can access these data repositories via a wide variety of end-user analytical tools including graphical query interfaces, report writers, OLAP cube builders, and data mining tools as well as the more traditional database query languages [44,46].

The value of the analyses made by knowledge workers depends on the quality of the information captured by and retrieved from the enterprise's data repositories. Obviously, the accuracy of the data retrieved is affected by the accuracy of the data stored. One hundred percent accurate stored data does not, however, guarantee that