Computers & Geosciences 31 (2005) 780–785
doi:10.1016/j.cageo.2005.01.010

The man who wasn’t there: The problem of partially missing data

Stephen Henley*
Resources Computing International Ltd, 185 Starkholmes Road, Matlock DE4 5JA, UK
* Tel.: +44 629 581454; fax: +44 1629 581471. E-mail address: stephen.henley@btconnect.com.

Received 8 September 2004; received in revised form 13 January 2005; accepted 13 January 2005

Abstract

Existing commercial database management systems offer little or no functionality to handle the complexity of geoscience data, and other environmental science data, particularly in respect of missing and partially missing (incomplete or imprecise) data items. The emphasis of both the relational theorists (Codd, Date, and others) and the developers of database systems is on commercial applications where only rudimentary treatment of missing data is required, in the form of NULLs, and even these are not handled properly by the SQL language.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Database; RDBMS; Missing data; Null; SQL; Logic; Relational; Fuzzy logic

Yesterday upon the stair,
I met a man who wasn’t there
He wasn’t there again today:
I wish that man would go away.
– Children’s nonsense rhyme

1. Introduction

Although one of the earliest relational database management systems (G-EXEC; Jeffery and Gill, 1976a–c) was developed in the 1970s to support applications in the geosciences, in recent years there has been progressively more reliance on general-purpose relational systems developed for ‘business’ users. This has the unfortunate consequence that little or no thought has been given to the complexities of managing real scientific data, and the resulting mismatch causes problems which have rarely been recognised despite the potentially severe consequences for the integrity of scientific databases.

2. Database management systems and data models

The closest that many geoscientists come (or want to come) to database management systems (DBMS) is the Microsoft Access that comes bundled with the Office suite, or a packaged ODBC-compliant system sitting underneath an applications software product. Yet effective management of their geological data is vital for all exploration and mining projects.

From the 1970s onwards, database management has been and remains an intensely fought-over battlefield. The original protagonists were hierarchical and network DBMSs following international CODASYL standards, and relational systems following (more or less) the principles first articulated by Codd (1970). During the 1980s the relational systems came to dominate the marketplace, largely by default as the older