J Multimodal User Interfaces
DOI 10.1007/s12193-016-0223-x
ORIGINAL PAPER
Prioritizing foreground selection of natural chirp sounds by tempo
and spectral centroid
Francesco Tordini¹ · Albert S. Bregman² · Jeremy R. Cooperstock¹
Received: 11 November 2015 / Accepted: 17 May 2016
© SIP 2016
Abstract Salience shapes the involuntary perception of a
sound scene into foreground and background. Auditory inter-
faces, such as those used in continuous process monitoring,
rely on the prominence of those sounds that are perceived as
foreground. We propose to distinguish between the salience
of sound events and that of streams, and introduce a paradigm
to study the latter using repetitive patterns of natural chirps.
Since streams are the sound objects populating the auditory
scene, we suggest the use of global descriptors of percep-
tual dimensions to predict their salience, and hence, the
organization of the objects into foreground and background.
However, there are many possible independent features that
can be used to describe sounds. Based on the results of two
experiments, we suggest a parsimonious interpretation of the
rules guiding foreground formation: after loudness, tempo
and brightness are the dimensions with the highest priority.
Our data show that, under equal-loudness conditions, patterns
with faster tempo and lower brightness tend to emerge and that
the interaction between tempo and brightness in foreground
selection seems to increase with task difficulty. We propose
to use the relations we uncovered as the underpinnings for
a computational model of foreground selection, and also, as
design guidelines for stream-based sonification applications.
This research has been generously supported by the Networks of
Centres of Excellence: Graphics, Animation, and New Media
(GRAND), and by the Natural Sciences and Engineering Research
Council (NSERC) of Canada, Grant #203568-06.
✉ Francesco Tordini
tord@cim.mcgill.ca
¹ Centre for Intelligent Machines, McGill University, 3480 University Street, Montreal, QC H3A 0E9, Canada
² Department of Psychology, McGill University, 1205 Docteur Penfield Avenue, Montreal, QC H3A 1B1, Canada
Keywords Auditory scene analysis · Salience · Feature
extraction · Sonification · Foreground selection · Natural
sounds
1 Introduction
The design of auditory displays, such as warning systems
and mobile assistive technologies, must deal with sonic
information design, management of attention, and salience.
Our long-term objective is to create a tool that assists in
sound scene design by predicting the perceived auditory
foreground. The salience of a sound can be defined as its
prominence relative to other sounds or, more generally, with
respect to a background. Even though the distinction between
salience and attention is debated, it is well accepted that
salience represents “bottom up” processes while attention
deals with “top down”, task-driven ones. Bottom-up mech-
anisms, including salience, shape the listener’s involuntary
organization of the sounds generating the scene [1, 2]. There-
fore, salience likely plays an important role in the design of
effective sonification strategies, that is, the use of non-speech
audio to present and represent information [3, 4]. To guide
such sonification strategies, it may be valuable to employ a
computational model that maps a set of acoustical features
to the perceived salience of a sound in a scene.
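As a concrete illustration of one such acoustical feature, the spectral centroid (the magnitude-weighted mean frequency of the spectrum, a standard correlate of perceived brightness) can be computed from a signal frame as sketched below. This NumPy-based function is illustrative only and is not part of the model described in the paper:

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Magnitude-weighted mean frequency of the spectrum, in Hz.

    A common acoustic correlate of perceived brightness: higher
    centroid values correspond to 'brighter' sounds.
    """
    # Magnitude spectrum of the (real-valued) signal.
    mag = np.abs(np.fft.rfft(signal))
    # Frequency, in Hz, of each spectral bin.
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    # Weighted mean of bin frequencies, weighted by magnitude.
    return np.sum(freqs * mag) / np.sum(mag)
```

For a pure tone, the centroid coincides with the tone's frequency; for natural chirps it summarizes where the spectral energy is concentrated, which is one of the dimensions the experiments below manipulate.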
There are two important challenges to achieve such a
model. First, the lack of an adequate operational definition of
salience hinders the collection of perceptual data as ground
truth. Second, despite a possibly infinite set of acoustic and
perceptual features from which we might choose for use
in salience prediction, the literature does not offer concrete
guidance as to their relevance, apart from the obvious feature
of loudness.
In the present article, we make an initial effort to address
both of these challenges. We first introduce a distinction