Towards User-friendly Audio Creation

Cécile Picard
Participating in the numediart Research Program on Digital Art Technologies, Belgium
ccl.picard@gmail.com

Christian Frisson and Jean Vanderdonckt
Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium
<first.lastname>@uclouvain.be

Damien Tardieu and Thierry Dutoit
Université de Mons, TCTS Lab, Mons, Belgium
<first.lastname>@umons.ac.be

ABSTRACT
This paper presents a new approach to sound composition for soundtrack composers and sound designers. We propose a tool for usable sound manipulation and composition that targets sound variety and expressive rendering of the composition. We first automatically segment audio recordings into atomic grains, which are displayed on our navigation tool according to signal properties. To perform the synthesis, the user selects one recording as a model for rhythmic pattern and timbre evolution, together with a set of audio grains. Our synthesis system then processes the chosen sound material to create new sound sequences, based on onset detection on the recording model and similarity measurements between the model and the selected grains. With our method, we can create a large variety of sound events, such as those encountered in virtual environments or other training simulations, as well as sound sequences that can be integrated into a music composition. We present a usability-minded interface that allows sound sequences to be manipulated and tuned in a way appropriate for sound design.

Categories and Subject Descriptors
I.6 [Information Interfaces and Presentation]: Sound and Music Computing

General Terms
Algorithms, Design

Keywords
Interactive Sound Composing, Audio Analysis & Synthesis, Content-based Audio Similarity, Multi-fidelity Prototyping

1. INTRODUCTION
Soundtrack composers and sound designers aim at creating auditory experiences [2]. In order to produce soundtracks for movies or video games, Foley artists mainly rely on prerecorded sound material, or record it themselves.
While the use of prerecordings is easy to implement, the number of samples in a database is often limited by memory constraints. Another possibility for generating such sounds is sound synthesis. A large variety of synthesis methods exists, but each is usually suited to a limited range of sounds.

A very common technique for texture synthesis is data-driven concatenative synthesis, also referred to as mosaicing [11]. Concatenative synthesis approaches aim at generating a meaningful macroscopic waveform structure from a large number of shorter waveforms. They typically use databases of sound snippets, or grains, to create a given target phrase. Unlike granular synthesis, where no analysis is performed on the audio units and where the unit size is defined arbitrarily [10], concatenative synthesis selects the audio units according to a set of audio descriptors. Physical modeling can be introduced to further refine granular synthesis [5, 1].

An important issue for applications of granular synthesis to sound design is the control of the synthesis process. Vocem, introduced by Lopez et al. [7], is one of the first graphical interfaces for real-time granular synthesis, with high-quality audio output and very short latencies. Parameters allow the user to easily control the creation and the distribution of the grains. With MoSievius, Lazier et al. [6] made a first attempt to apply unit selection to real-time, performance-oriented synthesis with direct and intuitive controls based on descriptor values such as energy, spectral flux, or spectral centroid, as well as voicing and instrument name. In a more musical context, Misra et al. [8] focus on a single framework that starts from recordings and provides a flexible environment for sonic sculpting in general. Another class of control methods relies on a careful visualization of the grains database in order to adequately select them.
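To make the contrast with granular synthesis concrete, descriptor-based unit selection can be sketched as a nearest-neighbour search in descriptor space. The sketch below is a minimal illustration, not the selection scheme of any particular system: the two-dimensional descriptors (e.g. normalized energy and spectral centroid) and the Euclidean distance are assumptions for the example.

```python
import numpy as np

def select_units(target_descriptors, corpus_descriptors):
    """For each target frame, pick the corpus grain whose descriptor
    vector is closest in Euclidean distance -- the core idea of
    descriptor-based unit selection (a simplified sketch)."""
    # pairwise distances, shape (n_targets, n_grains)
    diff = target_descriptors[:, None, :] - corpus_descriptors[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return np.argmin(dist, axis=1)  # best-matching grain index per frame

# toy example: 2-D descriptors (energy, centroid), normalized to [0, 1]
corpus = np.array([[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]])
target = np.array([[0.48, 0.52], [0.95, 0.75]])
print(select_units(target, corpus))  # -> [2 1]
```

Real systems typically weight the descriptors and add concatenation costs between successive grains; the nearest-neighbour search above is only the selection step.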
In CataRT, Schwarz proposes to display the grains in a two-dimensional space according to descriptor values or to the output of dimension-reduction techniques such as multidimensional scaling or principal component analysis [11].

Following these ideas, we propose an approach that combines hypermedia navigation and a synthesis process into an adequate multimodal user interface for sound composition and design. Our specific contributions are:

- a method for the automatic analysis of audio recordings, and the extraction and classification of meaningful audio grains as a new database;
- a technique for the automatic synthesis of coherent soundtracks based on the arrangement of audio grains in time.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AM'10, September 15–17, 2010, Piteå, Sweden.
Copyright © 2010 ACM 978-1-4503-0046-9/10/09…$10.00.
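The onset detection that drives the arrangement of grains in time (the second contribution above) can be illustrated with a crude, energy-based detector. This is a stand-in sketch only: the frame size and threshold are illustrative choices, and the actual detector used by the system is not specified at this point in the paper.

```python
import numpy as np

def detect_onsets(signal, frame=512, threshold=2.0):
    """Crude energy-based onset detector: flag frames whose short-time
    energy exceeds `threshold` times the previous frame's energy.
    Illustrative only; production detectors use spectral flux etc."""
    n = len(signal) // frame
    energy = np.array([np.sum(signal[i*frame:(i+1)*frame]**2)
                       for i in range(n)])
    return [i for i in range(1, n)
            if energy[i] > threshold * (energy[i-1] + 1e-12)]

# toy signal: silence, then a burst in frame 4
sig = np.zeros(512 * 8)
sig[512*4:512*5] = 0.5
print(detect_onsets(sig))  # -> [4]
```

The detected onset times of the model recording would then serve as slots at which selected grains are placed, yielding the rhythmic pattern of the synthesized sequence.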