PRESERVATION DIGITIZATION OF DAVID EDELBERG’S HANDEL LP COLLECTION: A PILOT PROJECT

Catherine Lai (lai@music.mcgill.ca), Beinan Li (beinan.li@mail.mcgill.ca), Ichiro Fujinaga (ich@music.mcgill.ca)
Music Technology, Faculty of Music, McGill University, Montreal, Canada

ABSTRACT

This paper describes the digitization process for building an online collection of LPs and the procedure for creating the ground-truth data essential for developing an automated metadata and content capturing system.

Keywords: Digitization, Preservation, Analogue Sound Recordings, Use and Access, Digital Library Collections.

1 INTRODUCTION

Long-playing phonograph records (LPs) were one of the major analogue recording formats distributed commercially throughout most of the twentieth century. Although most of these historic sound recordings have long shelf lives, compelling reasons have led to a shift toward digital preservation. To ensure preventive preservation and facilitate new forms of access to this very important cultural heritage, a large digitization effort is required. An efficient and economical workflow management system is essential to carry out the steps in the digitization process.

The digitization process is time-consuming and expensive because many of the steps involved in digital conversion, such as metadata extraction, require considerable human intervention and a high level of musical and bibliographic knowledge. Minimizing human intervention is therefore essential to reducing the cost of digitizing very large numbers of LPs. One way of achieving this is to integrate sophisticated pattern recognition systems that automatically generate text and metadata from the captured images. Another task that is time-consuming when performed by a dedicated human digitization operator is separating the music tracks on each side of an audio disc. A plausible approach to automating track separation is to use digital signal classification techniques.
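As a rough illustration of what automated track separation might involve, the sketch below labels short frames of a digitized side as "quiet" or "music" by their RMS level and reports sufficiently long quiet runs as candidate track gaps. This is not the system described in this paper; it is a much simpler energy-threshold stand-in for a signal classifier. The function names and the parameters `frame_sec`, `threshold`, and `min_gap_sec` are hypothetical, and the threshold in particular would need tuning against LP surface noise.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square level of each non-overlapping frame of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_track_gaps(samples, sample_rate, frame_sec=0.05,
                    threshold=0.01, min_gap_sec=1.0):
    """Return (start_sec, end_sec) for each quiet run lasting at least min_gap_sec.

    samples: mono audio as floats, assumed to lie roughly in [-1.0, 1.0].
    threshold: RMS level below which a frame counts as quiet (an assumed value).
    """
    frame_len = max(1, round(sample_rate * frame_sec))
    min_frames = max(1, round(min_gap_sec * sample_rate) // frame_len)
    levels = frame_rms(samples, frame_len)

    gaps = []
    run_start = None
    # A loud sentinel frame at the end closes any quiet run still open.
    for idx, level in enumerate(levels + [float("inf")]):
        if level < threshold:
            if run_start is None:
                run_start = idx
        elif run_start is not None:
            if idx - run_start >= min_frames:
                gaps.append((run_start * frame_len / sample_rate,
                             idx * frame_len / sample_rate))
            run_start = None
    return gaps
```

The returned gap boundaries could serve as cut points proposed to an operator for confirmation. A practical system would likely need richer features than frame energy, since quiet musical passages and noisy lead-in grooves can both defeat a fixed threshold.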
Approximately thirty LPs from David Edelberg’s Handel collection were digitized as a pilot study. The collection, housed in McGill University’s Marvin Duchow Music Library, is one of the largest collections of analogue recordings of Handel’s music. Much of the effort at this initial stage of the project was devoted to digital benchmarking for conversion and access and to creating ground-truth data that can be used to train and test content analysis systems, with the aim of automating the digitization process.

2 BACKGROUND

Digital library projects focusing on audio preservation are still in the development stage. The Loeb Music Library Audio Preservation Studio of Harvard University is currently examining the methodologies and technologies needed to access sound recordings and other digital objects [1]. The Digital Audio-Visual Preservation Prototyping Project of the Library of Congress (LC) is investigating approaches for reformatting recorded sound and moving image collections, with a focus on metadata [2]. The University of California at Santa Barbara is conducting a pilot project on cylinder preservation and digitization [3]. Other related research projects on sound recordings include Indiana’s Variations2 project [4], the digitization of 78 rpm recordings at the Frontera Archives [5], and the Digital Audio project at the National Library of Canada [6].

The digital preservation of the Edelberg Handel collection is unique for several reasons. It deals with a large collection of LPs, involves digitization of both audio and visual components (album covers and liner notes), and involves benchmarking for conversion and access. It implements an integrated database with searchable full text, images of album covers and record labels, and audio files of LPs. Furthermore, it develops automated content capture systems to reduce the cost of digital conversion wherever possible.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2005 Queen Mary, University of London

3 PREPARATION OF THE QUALITY CONTROL ENVIRONMENT

Quality control (QC) is an essential and integral component in various stages of digitization. The quality of digital reproduction rests to a significant degree on the QC instruments and software [7]. The Handel digitization project uses state-of-the-art digitization equipment and software tools to reformat and reproduce analogue sound recordings. The multimedia digitization workstation consists of professional models of a record cleaning machine, turntable, and large-format flatbed scanner; a phono-preamplifier and A/D audio