Preservation of Relational Databases

Significant Properties in the Preservation of Relational Databases

Ricardo André Pereira Freitas
Engineering and Industrial Management Research and Development Centre
CLEGI – Lusíada University
Vila Nova de Famalicão, Portugal
freitas@fam.ulusiada.pt

José Carlos Ramalho
CCTC – Computer Science and Technology Center
Informatics Department – University of Minho
Braga, Portugal
jcr@di.uminho.pt

Abstract: Relational databases are the most common type of database used by organizations worldwide and form the basis of several information systems. As with all digital objects, their significant properties (significant characteristics) must be defined so that the preservation strategies adopted can be evaluated. First, a neutral format, DBML, independent of hardware and software, was adopted to pursue the goal of dematerialization and to provide a standard format for the digital preservation of the data and structure of relational databases. In the current stage of the project, we go further in defining the significant properties by considering the database semantics as an important characteristic that should also be preserved. To represent this higher level of abstraction, we will use an ontology-based approach: we extract the entity-relationship model from the database in order to represent it as an ontology.

Keywords: Digital Preservation; Significant Properties; Significant Characteristics; Relational Databases; Ontology; OAIS; XML; Digital Objects.

I. INTRODUCTION

In the current paradigm of the information society, more than one hundred exabytes of data already support our information systems [16]. As the hardware and software industry evolves, progressively more intellectual and business information is stored on computer platforms. The main issue lies precisely within these platforms.
In the past, no mediators were needed to understand analog artifacts; today, in order to understand digital objects, we depend on mediators (computer platforms). If appropriate mediators become unavailable, who can guarantee the preservation of the digital artifacts? In other words, who has the responsibility to support the continuity of access to digital data [2]? Regardless of where that responsibility lies, and given that there is no generic solution, several researchers and research projects are addressing this problem.

Although digital information can be preserved exactly in its original form simply by copying (preserving) the bits, the problem arises from the very rapid evolution of the platforms (hardware and software) on which those bits are transformed into something humanly intelligible [10]. Digital archives and digital libraries are complex structures: without the software and hardware on which they depend, human beings, or others, will certainly be unable to experience or understand them [9].

Our work addresses this issue of digital preservation and focuses on a specific class of digital objects: relational databases [10]. Relational databases are a very important piece in the global context of digital information, and it is therefore fundamental not to compromise their longevity (life cycle) or their integrity, reliability and authenticity [19]. Archives of this kind are especially important to organizations because they can justify the organization's activities and characterize the organization itself. Current studies claim that 90% of the information produced on a daily basis is stored in a relational database. In this project, we currently aim to establish the significant properties of the relational database family of digital objects.
In the following section, we discuss in general terms the significant properties for the preservation of digital objects and note the controversy over terminology, since different authors use the terms significant properties and significant characteristics in different ways [1]. In Section 3 we analyze the relational database class of objects; this type of digital object must be characterized completely so that one can choose which aspects are important, valid or necessary for preservation. Section 4 establishes the significant properties for the digital preservation of relational databases. We define a methodology that leads to the identification of the properties necessary to ensure the preservation of these objects over time. The significant properties are addressed, individually and globally, at different levels of abstraction. Finally, we draw some conclusions, outline future work and enumerate some questions that emerge from the research.