Proceedings ELPUB 2008 Conference on Electronic Publishing - Toronto, Canada - June 2008

The MPEG Query Format, a New Standard For Querying Digital Content. Usage in Scholarly Literature Search and Retrieval

Ruben Tous¹ and Jaime Delgado²
Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya (UPC)
Mòdul D6, Campus Nord
C/ Jordi Girona, 1-3, E-08034 Barcelona, Spain
e-mail: ¹rtous@ac.upc.edu; ²jaime.delgado@ac.upc.edu

Abstract
The standardization initiative for the MPEG Query Format (MPQF) has renewed research into the definition of a unified query language for digital content. The goal is to provide a standardized interface to multimedia document repositories, including but not limited to multimedia databases, document databases, digital libraries, spatio-temporal databases and geographical information systems. The initiative is being led by MPEG (i.e. ISO/IEC JTC1/SC29/WG11). This paper presents MPQF, a new approach for retrieving multimedia document instances from very large document databases, and discusses its application to scholarly literature search and retrieval. The paper also explores how MPQF can be used in combination with the Open Archives Initiative (OAI) to deploy advanced distributed search and retrieval services. Finally, the issue of rights preservation is discussed.

Keywords: scholarly literature; search; framework; query format; MPQF; Open Archives Initiative; MPEG

1. Introduction
In recent years, technologies that enable the search and retrieval of multimedia digital content have gained importance due to the large amount of digitally stored multimedia documents. In response, members of the MPEG standardization committee (i.e.
ISO/IEC JTC1/SC29/WG11) have developed a new standard, the MPEG Query Format (MPQF) [1, 2, 3], which provides a standardized interface to multimedia document repositories, including but not limited to multimedia databases, document databases, digital libraries, spatio-temporal databases and geographical information systems.

The MPEG Query Format offers a powerful new alternative to the traditional scholarly communication model. MPQF lets scholarly repositories extend access to their metadata and content through a standard query interface, as Z39.50 [4] does, but using the latest XML querying tools (based on XPath 2.0 [5] and XQuery 1.0 [6]) combined with a set of advanced multimedia information retrieval capabilities defined within MPEG. This would allow, for example, querying for journal papers by specifying constraints over their associated XML metadata (which is not restricted to a particular format) combined with similarity search, relevance feedback, query-by-keywords, query-by-example media (e.g. using an example image to retrieve papers containing similar images), etc.

MPQF has been designed to unify the way digital materials are searched and retrieved. This will have important implications in the near future, when scholarly users’ information needs become more complex and involve searches that combine, in both input and output, documents of different types (e-prints, still images, audio transcripts, video files, etc.). Currently, several forums, such as [7], are trying to identify the steps that could be taken to improve