RETHINKING DATA QUALITY AS AN OUTCOME OF CONCEPTUAL MODELING CHOICES

Research Paper

Roman Lukyanenko
Memorial University of Newfoundland, St. John's, Canada
roman.lukyanenko@mun.ca

Jeffrey Parsons
Memorial University of Newfoundland, St. John's, Canada
jeffreyp@mun.ca

Abstract: With the proliferation of unstructured data sources and the growing role of crowdsourcing, new data quality challenges are emerging. Traditional approaches that investigated quality in the context of structured relational databases viewed users as data consumers and quality as a product of an information system. Yet, as users increasingly become information producers, a reconceptualization of data quality is needed. This paper explores data quality challenges arising in the era of user-supplied information and defines data quality as a function of conceptual modeling choices. The proposed approach can better inform the practice of crowdsourcing and can enable participants to contribute higher quality information with fewer constraints.

Key Words: Data Quality, Information Quality, Database design, Conceptual modeling, Crowdsourcing, Citizen science.

INTRODUCTION

Data quality is an important concern for organizations, individuals and societies [29, 31]. The quality of data has a direct impact on the quality of decisions made based on that data. This paper attempts to account theoretically for the impact of data modeling activity on the quality of data in a database and to introduce additional and potentially significant modeling considerations into data quality research. The motivation for the research comes from the proliferation of participative information systems, which pose unique challenges to traditional conceptual modeling and data quality approaches. Close examination of the nascent participative domains can lead to advances in data modeling and shed new light on the nature of data quality in general.

Much of the existing research on data quality has focused on traditional, corporate use of databases, in which data is typically stored in a highly structured form. Studies have explored such dimensions as accuracy, completeness, consistency, and fitness for use [e.g. 5, 29, 30, 57]. Prior research viewed users as information consumers, and considered data quality to be a product of an information system. Yet, the distinction between information consumers and creators is rapidly disappearing. As users become information creators, and large data sets are increasingly being generated by amateur and inexperienced users (e.g. social networks and crowdsourcing projects), database structures make it difficult to accommodate discretionary and often unstructured information without having to constrain user input. Managing the quality of semi-structured and unstructured information is emerging as a new research challenge [34].

We argue it may be possible to address some of the emerging concerns by changing the way information is collected and stored. This paper presents a conceptual modeling approach to data quality that promises both theoretical and practical advantages. We claim that data quality is, to a large extent, a function of conceptual modeling choices. In particular, the choice to record data in terms of classes has significant data quality implications. Once defined, classes affect the degree to which an information system is able