J Multimodal User Interfaces
DOI 10.1007/s12193-016-0223-x

ORIGINAL PAPER

Prioritizing foreground selection of natural chirp sounds by tempo and spectral centroid

Francesco Tordini 1 · Albert S. Bregman 2 · Jeremy R. Cooperstock 1

Received: 11 November 2015 / Accepted: 17 May 2016
© SIP 2016

Abstract  Salience shapes the involuntary perception of a sound scene into foreground and background. Auditory interfaces, such as those used in continuous process monitoring, rely on the prominence of those sounds that are perceived as foreground. We propose to distinguish between the salience of sound events and that of streams, and introduce a paradigm to study the latter using repetitive patterns of natural chirps. Since streams are the sound objects populating the auditory scene, we suggest the use of global descriptors of perceptual dimensions to predict their salience, and hence, the organization of the objects into foreground and background. However, there are many possible independent features that can be used to describe sounds. Based on the results of two experiments, we suggest a parsimonious interpretation of the rules guiding foreground formation: after loudness, tempo and brightness are the dimensions that have higher priority. Our data show that, under equal-loudness conditions, patterns with fast tempo and lower brightness tend to emerge and that the interaction between tempo and brightness in foreground selection seems to increase with task difficulty. We propose to use the relations we uncovered as the underpinnings for a computational model of foreground selection, and also, as design guidelines for stream-based sonification applications.

This research has been generously supported by the Networks of Centres of Excellence: Graphics, Animation, and New Media (GRAND), and by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Grant #203568-06.
✉ Francesco Tordini
tord@cim.mcgill.ca

1 Centre for Intelligent Machines, McGill University, 3480 University Street, Montreal, QC H3A 0E9, Canada

2 Department of Psychology, McGill University, 1205 Docteur Penfield Avenue, Montreal, QC H3A 1B1, Canada

Keywords  Auditory scene analysis · Salience · Feature extraction · Sonification · Foreground selection · Natural sounds

1 Introduction

The design of auditory displays, such as warning systems and mobile assistive technologies, must deal with sonic information design, management of attention, and salience. Our long-term objective is to create a tool that assists in sound scene design by predicting the perceived auditory foreground. The salience of a sound can be defined as its prominence relative to other sounds or, more generally, with respect to a background. Even though the distinction between salience and attention is debated, it is well accepted that salience represents "bottom up" processes while attention deals with "top down", task-driven ones. Bottom-up mechanisms, including salience, shape the listener's involuntary organization of the sounds generating the scene [1, 2]. Therefore, salience likely plays an important role in the design of effective sonification strategies, that is, the use of non-speech audio to present and represent information [3, 4]. To guide such sonification strategies, it may be valuable to employ a computational model that maps a set of acoustical features to the perceived salience of a sound in a scene. There are two important challenges to achieving such a model. First, the lack of an adequate operational definition of salience hinders the collection of perceptual data as ground truth. Second, despite a possibly infinite set of acoustic and perceptual features from which we might choose for use in salience prediction, the literature does not offer concrete guidance as to their relevance, apart from the obvious feature of loudness.
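To make one such candidate feature concrete: the spectral centroid named in the title is the magnitude-weighted mean frequency of a signal's spectrum and is a standard acoustic correlate of perceived brightness. The sketch below, which is illustrative and not taken from the paper (the function name and NumPy-based pipeline are our assumptions), shows how it can be computed from a waveform.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Spectral centroid in Hz: the magnitude-weighted mean frequency
    of the spectrum, a common acoustic correlate of brightness.
    Illustrative sketch; not the feature extractor used in the paper."""
    magnitudes = np.abs(np.fft.rfft(signal))          # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0                                    # silent input: centroid undefined
    return float((freqs * magnitudes).sum() / total)

# Sanity check: a pure 1 kHz tone should have its centroid at ~1 kHz.
sr = 16000
t = np.arange(sr) / sr                                # exactly one second of audio
tone = np.sin(2 * np.pi * 1000 * t)
print(round(spectral_centroid(tone, sr)))             # prints 1000
```

A model of the kind the authors envision would map descriptors such as this, together with tempo and loudness, to a predicted salience ranking.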
In the present article, we make an initial effort to address both of these challenges. We first introduce a distinction