IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. V, NO. N, MONTH YYYY

Musical Instrument Sound Morphing Guided by Perceptually Motivated Features

Marcelo Caetano, Member, IEEE, Xavier Rodet

Abstract—Sound morphing is a transformation that gradually blurs the distinction between the source and target sounds. For musical instrument sounds, the morph must operate across timbre dimensions to create the auditory illusion of hybrid musical instruments. The ultimate goal of sound morphing is to perform perceptually linear transitions, which requires an appropriate model to represent the sounds being morphed and an interpolation function to obtain intermediate sounds. Typically, morphing techniques directly interpolate the parameters of the sound model without considering the perceptual impact or evaluating the results. Perceptual evaluations are cumbersome and not always conclusive. In this work, we seek parameters of a sound model that favor linear variation of perceptually motivated temporal and spectral features, which are used to guide the morph towards more perceptually linear results. The requirement of linear variation of feature values gives rise to objective evaluation criteria for sound morphing. We investigate several spectral envelope morphing techniques to determine which spectral representation renders the most linear transformation in the spectral shape feature domain. We find that interpolation of line spectral frequencies gives the most linear spectral envelope morphs. Analogously, we study temporal envelope morphing techniques and conclude that interpolation of cepstral coefficients results in the most linear temporal envelope morph.

Index Terms—musical instrument sounds, sound morphing, source-filter model.

I. INTRODUCTION

SOUND morphing figures prominently among the sound transformation techniques studied in the literature due to its great creative potential and myriad possible outcomes.
Sound morphing has been used in music compositions [1]–[3], sound synthesizers [4], and even in psychoacoustic experiments, notably to study timbre spaces [5]. However, there seems to be no consensus in the literature on which transformations fall into the category of sound morphing and there certainly is no widely accepted definition of the morphing process for sounds. Most authors seem to agree that sound morphing involves the hybridization of two (or more) sounds by blending auditory features. One frequent requirement is that the result should fuse into a single percept, ruling out simply mixing or crossfading the sounds [4], [6] because the ear is capable of distinguishing them due to a number of cues and auditory processes. Still, many different sound transformations are described as morphing, such as interpolated timbres [4], smooth or seamless transitions between sounds [7] or cyclostationary morphs [8].

Manuscript received August 21, 2012; revised October 10, 2012. This work was performed at the Analysis/Synthesis team, IRCAM, supported by a Brazilian governmental CAPES grant (process 4082-05-2).
M. Caetano is currently with the Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH-ICS), Heraklion, Crete, Greece (e-mail: caetano@ics.forth.gr).
X. Rodet is Emeritus Researcher with the Analysis/Synthesis team, IRCAM.
Copyright (c) 2013 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Fig. 1: Depiction of image morphing to exemplify the aim of sound morphing. Found online at http://tinypic.com/images/404.gif, currently publicly available at http://paulbakaus.com/wp-content/uploads/2009/10/bush-obama-morphing.jpg.
In a previous work [9], we thoroughly reviewed the different types of sound transformation that are usually termed morphing and evaluated how the temporal nature of the morphing transformation (stationary, dynamic, etc.) directly affects the requirements of the process.

When morphing musical instrument sounds, we usually want to transform across timbre dimensions to create the auditory illusion of hybrid musical instruments, gradually blurring the categorical distinction between the source and target sounds. Fig. 1 illustrates this effect for images. A challenging aspect of such transformations is to control the morph on the algorithmic and perceptual levels with a single coefficient α, called the morphing or interpolation factor [9]. Ideally, we would like to obtain a morphed sound perceptually halfway between source and target when α = 1/2, and be able to recursively repeat the process for α = 1/2^n. Equivalently, linear variation of α should lead to a perceptually linear transformation. The concept of perceptual linearity in sound morphing lies at the core of this work, where we use perceptually motivated features to guide the transformation and evaluate linearity in the feature domain. We assume that linear variation in the feature domain indicates perceptual linearity when the features capture perceptually relevant information.

Most morphing techniques proposed in the literature directly apply the interpolation principle without taking perceptual aspects into consideration [4], [6], [7], [10]–[15]. In this

Fig. 2: Depiction of the classic morphing scheme using the interpolation principle, which assumes that perceptually intermediate representations possess intermediate parameter values.
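The interpolation principle and the idea of checking linearity in a feature domain can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the toy exponential envelopes, the choice of the spectral centroid as the feature, and the helper names `interpolate`, `spectral_centroid`, and `linearity_error` are hypothetical and are not the sound models, features, or evaluation measures used in this work.

```python
import numpy as np


def interpolate(source, target, alpha):
    """Interpolation principle: convex combination of two parameter vectors."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    return (1.0 - alpha) * source + alpha * target


def spectral_centroid(freqs, magnitudes):
    """Amplitude-weighted mean frequency, one possible spectral shape feature."""
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))


def linearity_error(values):
    """Largest deviation of a feature trajectory from the straight line joining
    its endpoints, normalized by the distance between the endpoints."""
    values = np.asarray(values, dtype=float)
    line = np.linspace(values[0], values[-1], len(values))
    return float(np.max(np.abs(values - line)) / abs(values[-1] - values[0]))


# Toy spectral envelopes (illustrative only, not data from the paper).
freqs = np.linspace(0.0, 8000.0, 256)
source_env = np.exp(-freqs / 1000.0)  # energy concentrated at low frequencies
target_env = np.exp(-freqs / 4000.0)  # energy spread towards high frequencies

alphas = np.linspace(0.0, 1.0, 11)

# Morph A: interpolate the linear amplitudes directly.
centroids_amp = [
    spectral_centroid(freqs, interpolate(source_env, target_env, a))
    for a in alphas
]

# Morph B: interpolate the log amplitudes, then map back to amplitudes.
centroids_log = [
    spectral_centroid(
        freqs, np.exp(interpolate(np.log(source_env), np.log(target_env), a))
    )
    for a in alphas
]

err_amp = linearity_error(centroids_amp)
err_log = linearity_error(centroids_log)

# At the parameter level, halving recursively is exact: the midpoint between
# the source and the alpha = 1/2 morph equals the alpha = 1/4 morph.
quarter = interpolate(source_env, target_env, 0.25)
half_of_half = interpolate(
    source_env, interpolate(source_env, target_env, 0.5), 0.5
)

print("linearity error, amplitude interpolation:", err_amp)
print("linearity error, log-amplitude interpolation:", err_log)
print("recursive halving holds for parameters:", np.allclose(quarter, half_of_half))
```

A smaller error means that the feature trajectory stays closer to a straight line as α varies linearly. The recursive property holds exactly for the interpolated parameters, but nothing guarantees that it carries over to the feature values, which is why the representation being interpolated matters.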