Towards User-friendly Audio Creation
Cécile Picard
Participant in the numediart
Research Program on Digital
Art Technologies, Belgium
ccl.picard@gmail.com
Christian Frisson
and Jean Vanderdonckt
Université catholique de
Louvain (UCL)
Louvain-la-Neuve, Belgium
<first.lastname>@uclouvain.be
Damien Tardieu
and Thierry Dutoit
Université de Mons, TCTS Lab
Mons, Belgium
<first.lastname>@umons.ac.be
ABSTRACT
This paper presents a new approach to sound composition
for soundtrack composers and sound designers. We propose
a usable tool for sound manipulation and composition that
targets sound variety and expressive rendering. We first
automatically segment audio recordings into atomic grains,
which are displayed on our navigation tool according to
signal properties. To perform the synthesis, the user selects
one recording as a model for rhythmic pattern and timbre
evolution, together with a set of audio grains. Our synthesis
system then processes the chosen sound material to create
new sound sequences, based on onset detection on the model
recording and similarity measurements between the model
and the selected grains. With our method, we can create a
large variety of sound events, such as those encountered in
virtual environments or other training simulations, as well
as sound sequences that can be integrated into a music
composition. We present a usability-minded interface that
allows the user to manipulate and tune sound sequences in
a way appropriate for sound design.
Categories and Subject Descriptors
H.5.5 [Information Interfaces and Presentation]:
Sound and Music Computing
General Terms
Algorithms, Design
Keywords
Interactive Sound Composing, Audio Analysis & Synthesis,
Content-based Audio Similarity, Multi-fidelity Prototyping
1. INTRODUCTION
Soundtrack composers and sound designers aim at cre-
ating auditory experiences [2]. In order to produce sound-
tracks for movies or video games, Foley artists mainly rely on
prerecorded sound material, or record it themselves. While
the use of prerecordings is easy to implement, the number
of samples in a database is often limited due to memory
constraints. Another way to generate such sounds is through
sound synthesis.
A large variety of synthesis methods exist, but each is
usually suited to a limited range of sounds. A very common
technique for texture synthesis is data-driven concatenative
synthesis, also referred to as mosaicing [11]. Concatenative
synthesis approaches aim at generating a meaningful
macroscopic waveform structure from a large number of
shorter waveforms. They typically use
databases of sound snippets, or grains, to create a given
target phrase. Unlike granular synthesis where no analysis
is performed on the audio units and where the unit size is
defined arbitrarily [10], concatenative synthesis selects the
audio units according to a set of audio descriptors. Phys-
ical modeling can be introduced to further refine granular
synthesis [5, 1]. A very important issue for applications
of granular synthesis to sound design is the control of the
synthesis process. Vocem, introduced by Lopez et al. [7],
is one of the first graphical interfaces for real-time granu-
lar synthesis, with high-quality audio output and very short
latencies. Parameters allow the user to easily control the
creation and the distribution of the grains. With MoSievius,
Lazier et al. [6] make a first attempt to apply unit selection
to real-time performance-oriented synthesis with direct and
intuitive controls based on descriptor values such as energy,
spectral flux or spectral centroid, as well as voicing and in-
strument name. For a more musical context, Misra et al. [8]
focus on a single framework that starts with recordings and
proposes a flexible environment for sonic sculpting in
general. Another class of control methods relies on a careful
visualization of the grain database in order to select grains
adequately. In CataRT, Schwarz proposes to display the grains in a
two-dimensional space according to descriptor values or out-
put of dimension reduction techniques such as multidimen-
sional scaling analysis or principal component analysis [11].
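Such a descriptor-based 2-D grain map can be sketched in a few lines of numerical code. The following is a minimal illustration, not the interface described in this paper: the descriptor choices (RMS energy and spectral centroid) and function names are illustrative, and the projection uses principal component analysis computed via SVD.

```python
import numpy as np

def grain_descriptors(grain, sr=44100):
    """Toy per-grain descriptors: RMS energy and spectral centroid (Hz)."""
    rms = np.sqrt(np.mean(grain ** 2))
    spectrum = np.abs(np.fft.rfft(grain))
    freqs = np.fft.rfftfreq(len(grain), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, centroid])

def project_2d(descriptors):
    """Project an (n_grains, n_descriptors) matrix onto its first two
    principal components, giving coordinates for a 2-D grain display."""
    X = descriptors - descriptors.mean(axis=0)  # centre the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # rows of vt = PCs
    return X @ vt[:2].T
```

With richer descriptor sets (spectral flux, voicing, etc.), the same projection step maps each grain to a point the user can navigate, in the spirit of CataRT.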
Following these ideas, we propose an approach that combines
hypermedia navigation and a synthesis process into a
multimodal user interface suited to sound composition
and design. Our specific contributions are:
- a method for the automatic analysis of audio recordings,
and the extraction and classification of meaningful audio
grains into a new database.
- a technique for the automatic synthesis of coherent sound-
tracks based on the arrangement of audio grains in
time.
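As a rough illustration of the pipeline these contributions describe, the sketch below detects onsets on a model recording and places, at each onset, the grain whose descriptors best match the model's local frame. This is a simplified stand-in, not the authors' actual algorithm: the energy-jump onset detector, the Euclidean descriptor matching, and all function names are assumptions for illustration only.

```python
import numpy as np

def descriptor(frame, sr=44100):
    """Toy descriptor vector: [RMS energy, spectral centroid]."""
    rms = np.sqrt(np.mean(frame ** 2))
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return np.array([rms, centroid])

def detect_onsets(signal, frame=512, ratio=2.0):
    """Simplistic onset detector: flag frames whose short-time energy
    exceeds `ratio` times the previous frame's energy."""
    n = len(signal) // frame
    e = np.array([np.sum(signal[i*frame:(i+1)*frame] ** 2) for i in range(n)])
    return [i * frame for i in range(1, n) if e[i] > ratio * (e[i-1] + 1e-9)]

def resynthesize(model, grains, frame=512):
    """At every onset of the model, insert the grain whose descriptors
    are closest (Euclidean distance) to the model's local frame."""
    feats = np.stack([descriptor(g[:frame]) for g in grains])
    out = np.zeros_like(model)
    for onset in detect_onsets(model, frame):
        target = descriptor(model[onset:onset + frame])
        g = grains[int(np.argmin(np.linalg.norm(feats - target, axis=1)))]
        end = min(onset + len(g), len(out))
        out[onset:end] += g[:end - onset]
    return out
```

The model recording thus supplies the rhythmic pattern, while the grain database supplies the timbre, which is the division of labour the contributions above describe.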
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
AM’10, September 15–17, 2010, Piteå, Sweden.
Copyright © 2010 ACM 978-1-4503-0046-9/10/09…$10.00.