Data Description and Archives for Scientific Research in the Future

Imaging & Media Lab, University of Basel
Simon Margulies, Ivan Subotic, Dr. Lukas Rosenthaler
[simon.margulies, ivan.subotic, lukas.rosenthaler]@unibas.ch
Bernoullistrasse 32, CH-4056 Basel/Switzerland
Phone +41 61 267 04 88, Fax +41 61 267 04 85

1. Introduction

At the last IS&T Archiving Conference, in 2005, the case was made that historical research would change in the future owing to the growing availability of online digital cultural heritage resources.¹ Scientific research depends heavily on a controlled tradition of its sources. Archives guarantee this tradition by maintaining information objects and making them available for future scientific research. Modern information technologies have interconnected data collections and thereby made universal access possible, independent of time and place. This simplified access allows enquiries over far more source material, which has changed and will continue to change the processes of scientific research.

¹ Clifford Lynch. Archiving, Stewardship, Curation: From the Personal to the Global Sphere.

Archives furnish their primary data with several layers of metadata to guarantee that digital information objects can be found, read and scientifically interpreted. This kind of data description, mostly written in XML, structures information objects in a form that both humans and machines can read. However, the underlying semantics of the structured description, and thus the context connecting different information objects, remain hidden from the machine. They can only be interpreted and used for further research by humans conducting painstaking enquiries.

This paper points out the connections between data structure, data semantics, archiving and scientific research. It presents techniques that open up new possibilities for archives and can help scientific research to handle the growing amount of source material.

2. Research in digital data collections

An archive is defined as an institution that administers and preserves a body of documents (Archivgut) important for documenting the history of its sponsoring body or a particular theme of the institution. For the future, a growing interconnectedness between archives that provide online access to digital databases is assumed.²

² E.g. by distributed archiving systems [1].

Scientific research, especially historical research, depends strongly on the contexts of and links between different source materials, which may also be held in different archives, and it tries to derive and prove these contexts and links. The archivist structures such contexts with a metadata schema and describes them with the aid of a thesaurus. The schema and its contents vary between times, cultures, archives and people, depending on their contexts and the classifications they have undertaken. Future schemas will vary even more as the temporal and cultural distance between editors grows; even today, agreement on a single specific standard is unthinkable.³

³ In this regard, Dublin Core [2] is an example of the greatest possible common denominator. It is unquestionably too small for an adequate data description for the preservation of digital source material of all kinds.

If the data description is composed in XML, as is common practice, only keyword searching within the schema-specific digital data collection can be supported. Contexts among source materials that have different data descriptions and reside in different data collections can be discovered only by the human researcher, not by a machine or a software agent: XML gives the data a high degree of structure, but the underlying semantic entities of the individual parts, and especially their context in relation to other information objects, remain hidden from software agents. If a continuously growing amount of accessible source material is presumed, scientific research will become more