Proceedings: Cultural Heritage on line. Empowering users: an active role for user communities
© 2009 Fondazione Rinascimento Digitale
Digital Long-Term Preservation using a Layered Semantic
Metadata Schema of PREMIS 2.0
Sam Coppens, Erik Mannens, Tom Evens, Laurence Hauttekeete & Rik Van de Walle
Ghent University (Multimedia Lab / MICT) – IBBT
sam.coppens@ugent.be, erik.mannens@ugent.be, tom.evens@ugent.be,
laurence.hauttekeete@ugent.be & rik.vandewalle@ugent.be
Abstract
In Belgium, many institutions hold large amounts of information on analogue carriers. This
information is likely to be lost unless a digitized copy is stored for the long
term. Long-term preservation is subject to many risks. Overcoming those risks starts with
describing the data thoroughly. The metadata needed for long-term preservation are
descriptive metadata to search and manage the whole archive, binary metadata to describe the
bitstreams, technical metadata describing the files, structural metadata for the representation
information, preservation metadata for keeping track of the provenance of the data, and rights
metadata. Therefore, we developed a layered semantic metadata schema. The top layer holds
the descriptive metadata, while the bottom layer holds all the information necessary for long-term
preservation. The top layer consists of an OWL representation of Dublin Core; for the
bottom layer we developed an OWL representation of the preservation standard PREMIS 2.0,
extended with a vocabulary defining the legal roles of a person, organization, or software.
This way, our model offers all the necessary metadata for long-term preservation.
Keywords: digital preservation, PREMIS 2.0, ontology, semantic web
1 Introduction
In Belgium, broadcasters, cultural organizations, private persons, and government
institutions possess thousands of hours of speech and image material that is stored on
analogue carriers. This material is among the most important cultural heritage in Flanders.
At present, the analogue carriers are degrading and continuously losing quality,
which will eventually make the data inaccessible. Disseminating and storing the content
digitally overcomes this problem only temporarily: the digital content itself also has to
remain intact and accessible over time, e.g., 20 or 50 years or longer. Digital long-term
preservation is the solution to this issue. The BOM-Vl project (Preservation and Disclosure of Multimedia Data
in Flanders, [1]) initiates the digital long-term preservation of the cultural heritage in Flanders
and researches the problems encountered with digital long-term preservation.
In this paper, we present our layered semantic metadata model. First, in Section 2, we
introduce the different kinds of metadata that are needed to overcome the risks involved in
long-term preservation and show how our proposed layered semantic metadata model
relates to those risks. The semantic model consists of two layers: the top layer delivers the
descriptive metadata, and the bottom layer is responsible for the binary metadata, the
technical metadata, the structural metadata, the preservation metadata (provenance metadata,
fixity metadata, and context metadata), and the rights metadata. This way, the layered
semantic metadata model covers all the metadata needed to describe the content for the
long term. For the top layer, we use a Web Ontology Language (OWL, [2]) representation of