10th International Society for Music Information Retrieval Conference (ISMIR 2009)

OPTICAL AUDIO RECONSTRUCTION FOR STEREO PHONOGRAPH RECORDS USING WHITE LIGHT INTERFEROMETRY

Beinan Li
Music Technology Area, Schulich School of Music, McGill University
beinan.li@mail.mcgill.ca

Jordan B. L. Smith
Music Technology Area, Schulich School of Music, McGill University
jordan.smith2@mail.mcgill.ca

Ichiro Fujinaga
Music Technology Area, Schulich School of Music, McGill University
ich@music.mcgill.ca

ABSTRACT

Our work focuses on optically reconstructing the stereo audio signal of a 33 rpm long-playing (LP) record using an approach based on white-light interferometry. Previously, a theoretical framework was presented, alongside a preliminary reconstruction of a few cycles of a stereo sinusoidal test signal. Reconstructing an audible duration of a longer stereo signal requires tackling new problems, such as disc warping, image alignment, and the effects of noise and broken grooves. This paper proposes solutions to these problems and presents the complete workflow of our Optical Audio Reconstruction (OAR) system.

1. INTRODUCTION

OAR has proven to be an effective contactless approach to digitizing monophonic phonograph records [1] [2] [3] [4]. Furthermore, it offers a solution for restoring broken records. Li et al. previously presented a theoretical framework for optically reconstructing audio with a white-light interferometry (WLI) microscope and image processing [5]. A few cycles of a stereo sinusoidal signal, extracted from a small number of images, illustrated that their approach is capable of extracting stereo signals from LPs. To reconstruct a few seconds of audio, however, the scanning region must be scaled up to a much larger disc area, resulting in thousands of images.
A sophisticated image acquisition and post-capture processing workflow is thus needed to tackle the challenges that emerge from large-scale scanning, e.g., disc surface warping, image alignment errors, groove damage, and the unwrapping of the grooves into a one-dimensional audio signal.

In Section 2, we review previous OAR systems. Our system for acquiring record groove images is introduced in Section 3, followed in Section 4 by our image processing procedures for extracting audio from the scanned images. The reconstructed result is illustrated and discussed in Section 5.

2. EXISTING OAR APPROACHES

In this section, four previous OAR approaches are described. Although they operate on recordings of different formats, most OAR frameworks follow the same high-level three-step procedure for reconstructing an audio recording: first, the grooves are scanned; second, the groove undulations are isolated and extracted; third, these undulations are converted into audio.

Approaches vary significantly in the hardware used: some use a general-purpose commercial product such as a confocal microscope, while others use a custom installation. The hardware, in turn, affects how the grooves are scanned and thus how groove undulations must be extracted. By contrast, the audio conversion step (which may include post-processing, such as equalization) depends solely on the record production procedures that were used for the particular item being scanned. This step almost always includes filtering the signal to undo the RIAA equalization used in production and obtain the audio.

The systems developed by Iwai et al. and Nakamura et al. use a ray-tracing method to obtain the groove contour of a phonograph record [6] [7] [8]. The groove is illuminated with a laser beam, and the groove undulations are measured by detecting the angle at which the beam is reflected.
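The angle measurement just described can be illustrated with the law of reflection: tilting a reflecting surface by an angle t turns the reflected beam by 2t, so a measured beam deflection gives the local slope of the groove wall. The sketch below is a one-dimensional idealization under that assumption only. It is not the optics or calibration of the cited systems, which are not detailed here, and the function names and the example radius are hypothetical.

```python
import math

def tilt_from_deflection(deflection_rad: float) -> float:
    """Surface tilt implied by a beam deflection (law of reflection).

    Tilting a mirror by t rotates the reflected beam by 2t.
    """
    return deflection_rad / 2.0

def groove_speed(radius_m: float, rpm: float = 100.0 / 3.0) -> float:
    """Linear speed (m/s) of the groove passing a fixed beam at a given radius."""
    return 2.0 * math.pi * radius_m * rpm / 60.0

def undulation_velocity(deflection_rad: float, radius_m: float,
                        rpm: float = 100.0 / 3.0) -> float:
    """Velocity (m/s) of the groove undulation in this idealized model.

    The undulation slope (displacement per unit length along the groove),
    multiplied by the groove's linear speed, gives displacement per unit time.
    """
    slope = math.tan(tilt_from_deflection(deflection_rad))
    return slope * groove_speed(radius_m, rpm)

if __name__ == "__main__":
    # Hypothetical values: 20 mrad deflection measured at a 10 cm radius.
    print(undulation_velocity(0.02, 0.10))
```

Because the output is proportional to velocity rather than displacement, a reading of this kind behaves like the signal from a velocity-sensitive pickup. A real stereo groove has two walls, each carrying one channel, so this single-slope model covers only one wall at a time.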
In this way the laser functions as a simulated stylus, a replacement for the mechanical stylus, and can output an analog audio signal directly. Unfortunately, since such systems must trace out the grooves, they are unable to handle broken records. In addition, two types of errors limit this approach: errors caused by the finite laser beam width, which lead to echoes and high- and low-frequency noise in the extracted audio signals, and tracking errors that may occur when the beam misses the groove entirely.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2009 International Society for Music Information Retrieval

Fadeyev and Haber built an OAR system for 78-rpm records based on confocal laser scanning microscopy [1]. With the help of a low-mass probe, they built another one for wax cylinders [2]. Their system is capable of scanning the record in 3D with a vertical accuracy of around 4.0 microns. However, in their work on 78-rpm records only 2D imaging is emphasized, at a resolution of 0.26 × 0.29 microns per pixel. It takes their system 50 minutes to scan about 1 second of recorded audio, corresponding to 0.5–5 GB of image data. The groove bottom is obtained using 2D edge detection on the pixel illumination data,