Biologically-inspired neural coding of sound onset for a musical sound classification task

Michael J. Newton and Leslie S. Smith

Abstract: A biologically-inspired neural coding scheme for the early auditory system is outlined. The cochlear response is simulated with a passive gammatone filterbank. The output of each bandpass filter is spike-encoded using a zero-crossing based method over a range of sensitivity levels. The scheme is inspired by the highly parallelised nature of the auditory nerve innervation within the cochlea. A key aspect of early auditory processing, namely onset detection, is simulated using leaky integrate-and-fire neuron models. Finally, a time-domain neural network (the echo state network) is used to tackle the ‘what’ task of auditory perception using the output of the onset detection neurons alone.

I. INTRODUCTION

The mammalian auditory system performs a diverse range of signal processing tasks in near real time. Presented with a raw sound field, it carries out analysis to extract meaningful features, which may or may not be buried among contributions from other sound sources. Such features include the direction from which a particular sound arrived (the ‘where’ task)[1], [2], the nature of an individual sound (the ‘what’ task)[3], the meaning of the sound (as in speech perception)[4], and the decomposition of a many-source sound field into separable audio streams[5], [6]. In many cases several of these tasks must be performed at the same time.

The processing of sound within the auditory system is highly integrated, involving neural processes at all levels, from the cochlea to the cortex. The system is two-way, with information passed both upwards to the cortex[7] and back downwards towards the sensory units through the efferent system[8], [9]. A key feature is that certain kinds of processing occur early on, even in advance of the brain stem[10].
In this work a biologically-inspired scheme for sound onset representation within the auditory system is investigated. There is strong evidence to suggest that mammalian auditory systems are particularly attuned to the detection of sound onsets, even from the earliest stages of the auditory processing chain[11], [12]. The auditory nerve itself is known to respond more strongly to the start of a stimulus, and there are neurons within the cochlear nucleus which spike strongly at stimulus onset[4], [13], [14]. Sound onsets may be important for sound source location[1], sound identification[15], [16], and are thought to play a role in the segregation of auditory streams[5], [6], [17].

Michael J. Newton and Leslie S. Smith are with the Institute of Computing Science and Mathematics, University of Stirling, UK (email: {lss, mjn}@cs.stir.ac.uk). This work was supported by EPSRC (UK) Grant EP/G062609/1.

From an ecological perspective the sound onset is potentially useful because its location at the start of a sound may aid in priming a response. The initial onset also tends to be relatively untainted by reverberation, as it usually arrives at the listener via a direct path from the source. For most tasks later reflections are ignored in favour of the initial onset[18].

Every sound begins with an onset. However, the precise definition of what constitutes the ‘sound onset’ is less clear[19]. It is possible to analyse a sound onset based on the physics of the sound production mechanism. In the case of a trumpet blowing a pitched note, for example, there is a short period of time at the beginning of the note when the vibrating lips of the player are not influenced by the acoustics of the instrument. At some later time a coupled interaction begins, which leads to the steady-state pitched note. It may be argued that the onset portion of the note occurs before full coupling between instrument and player, and the steady-state portion follows coupling.
However, such a physical process is not necessarily perceived in the same clear order by the auditory system. A number of further factors, such as reverberant reflections, may contribute to the final waveform which reaches the ear. The precise meaning of ‘onset’ in the context of perception can thus only be properly explored by studying the response of the auditory system to real sounds. What is clear is that the temporal fine-structure and frequency evolution of sound onsets vary widely, both in terms of perception[13] and from a generative standpoint. A drum hit, for example, clearly involves a different kind of physical onset than a slowly bowed violin string, and would be expected to produce a different sensation of ‘onset’ in a listener.

We henceforth refer to the perceptual onset simply as the onset and, in seeking to explore it with an auditory model, define it as a sudden and rapid rise in signal energy as seen by the sound receptor (in this case the cochlea). This may be a rise from a zero level, or a pronounced increase from one level to a higher level.

In this work the perceptual onset is simulated using a spiking time-domain auditory model, based on the gammatone filterbank[20]. Section II provides an overview of the model and the coding scheme. In section III a method is outlined which uses the simulated spiking onset response as a descriptor for a musical sound classification task. Musical samples are sourced from the McGill dataset[21]. The classification is performed using a time-domain reservoir neural network known as the echo state network[22], which is outlined in section IV. Section V provides an overview of some initial classification results.
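The working definition of onset given in this section (a sudden rise in signal energy, either from a zero level or from one level to a higher one) can be illustrated with a minimal sketch. This is not the spiking model described in this paper: it is a plain frame-energy comparison, and the function names, the frame length, the energy ratio threshold and the silence floor are all hypothetical choices made only for illustration.

```python
import math


def frame_energies(signal, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_onsets(signal, frame_len=80, ratio=4.0, floor=1e-6):
    """Return the start sample of each frame whose energy rises sharply.

    A frame is flagged when its energy is at least `ratio` times that of
    the previous frame. `floor` (an assumed value) stands in for the
    energy of silence, so that a rise from zero is also detected.
    """
    energies = frame_energies(signal, frame_len)
    onsets = []
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)
        if energies[i] > floor and energies[i] >= ratio * previous:
            onsets.append(i * frame_len)
    return onsets


def make_two_step_tone(rate=8000, freq=500.0):
    """Silence, then a quiet tone, then a louder tone (0.1 s each)."""
    signal = []
    for n in range(3 * rate // 10):
        if n < rate // 10:
            amplitude = 0.0   # zero level
        elif n < 2 * rate // 10:
            amplitude = 0.1   # rise from zero
        else:
            amplitude = 0.5   # rise from one level to a higher level
        signal.append(amplitude * math.sin(2.0 * math.pi * freq * n / rate))
    return signal


if __name__ == "__main__":
    print(detect_onsets(make_two_step_tone()))
```

The synthetic signal contains both kinds of onset named in the definition, a rise from silence and a step from a quiet level to a louder one, and the sketch flags the frame at which each occurs. A single broadband energy measure like this ignores the per-channel, multi-sensitivity coding that the auditory model outlined in Section II is built around.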