Information Rate for Fast Time-Domain Instrument Classification

Jordan Ubbens and David Gerhard
University of Regina, Department of Computer Science

Abstract. In this paper, we propose a novel feature set for instrument classification which is based on the information rate of the signal in the time domain. The feature is extracted by calculating the Shannon entropy over a sliding short-time energy frame and binning statistical features into a unique feature vector. Experimental results are presented, including a comparison to frequency-domain feature sets. The proposed entropy features are shown to be faster than popular frequency-domain methods while maintaining comparable accuracy in an instrument classification task.

Keywords: Audio Classification, Audio Features, Audio Signal Processing, Time-Domain Methods

1 Introduction

Audio classification is a wide and diverse field which incorporates many subdomains, including speech recognition and speaker verification, musical signal and musical instrument classification, digital foley, and others. Most audio classification systems require a number of preliminary steps before classification can take place, among them auditory stream segregation and domain selection. Stream segregation is the process of separating a single auditory event from a collection or cluster of events that happen at the same time. This is a complicated and demanding area of research that is beyond the scope of this paper. For the purposes of this paper, we assume that all auditory events happen in isolation and without noise. A second requirement of most audio classification systems is some restriction of the relevant subdomain. In an ideal world, a single classification system would handle all forms of sound, but in practice, systems designed for classification of a limited domain of sounds perform better than generalized systems.
For the purposes of this paper, we restrict our focus to the classification of sounds generated by musical instruments. Classification systems tend to proceed by first preprocessing the audio signal; then extracting features from the audio signal; training a classification model using this set of training instances; and verifying this trained system against previously unseen signals. This paper considers the feature extraction phase, and makes the assumption that the features presented herein can be applied to any standard classification process pipeline.
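The feature extraction described in the abstract, computing Shannon entropy over sliding short-time frames and summarizing the result statistically, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, number of histogram bins, and the choice of summary statistics (mean, standard deviation, minimum, maximum) are all assumptions, since the paper does not fix them at this point.

```python
import numpy as np

def frame_entropy(frame, n_bins=32):
    # Shannon entropy (in bits) of the sample amplitude
    # distribution within a single short-time frame.
    hist, _ = np.histogram(frame, bins=n_bins)
    p = hist[hist > 0] / len(frame)
    return -np.sum(p * np.log2(p))

def entropy_feature_vector(signal, frame_len=1024, hop=512, n_bins=32):
    # Slide a short-time frame across the signal, compute the
    # per-frame entropy, then bin the per-frame values into a
    # small feature vector of summary statistics (the specific
    # statistics here are an assumption for illustration).
    n_frames = max(1, 1 + (len(signal) - frame_len) // hop)
    ents = np.array([
        frame_entropy(signal[i * hop : i * hop + frame_len], n_bins)
        for i in range(n_frames)
    ])
    return np.array([ents.mean(), ents.std(), ents.min(), ents.max()])
```

Because the whole computation is a histogram and a dot product per frame, with no transform to the frequency domain, it is cheap relative to FFT-based feature sets, which is the speed advantage the abstract refers to.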