Information Rate for Fast Time-Domain Instrument Classification

Jordan Ubbens and David Gerhard
University of Regina, Department of Computer Science

Abstract. In this paper, we propose a novel feature set for instrument classification which is based on the information rate of the signal in the time domain. The feature is extracted by calculating the Shannon entropy over a sliding short-time energy frame and binning statistical features into a unique feature vector. Experimental results are presented, including a comparison to frequency-domain feature sets. The proposed entropy features are shown to be faster than popular frequency-domain methods while maintaining comparable accuracy in an instrument classification task.

Keywords: Audio Classification, Audio Features, Audio Signal Processing, Time-Domain Methods

1 Introduction

Audio classification is a wide and diverse field which incorporates many subdomains, including speech recognition and speaker verification, musical signal and musical instrument classification, digital foley, and others. Most audio classification systems require a number of preliminary steps before classification can take place, among them auditory stream segregation and domain selection. Stream segregation is the process of separating a single auditory event from a collection or cluster of events that happen at the same time. This is a complicated and demanding area of research that is beyond the scope of this paper. For the purposes of this paper, we assume that all auditory events happen in isolation and without noise. A second requirement of most audio classification systems is some restriction of the relevant subdomain. In an ideal world, a single classification system would handle all forms of sound, but in practice, systems designed for classification of a limited domain of sounds perform better than generalized systems.
For the purposes of this paper, we restrict our focus to the classification of sounds generated by musical instruments. Classification systems tend to proceed by first preprocessing the audio signal; then extracting features from the audio signal; training a classification model using this set of training instances; and verifying this trained system against previously unseen signals. This paper considers the feature extraction phase, and makes the assumption that the features presented herein can be applied to any standard classification process pipeline.
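The feature extraction described in the abstract, computing Shannon entropy over sliding short-time frames and summarizing the result statistically, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, number of histogram bins, and the choice of summary statistics (mean, standard deviation, minimum, maximum) are all assumptions, since the paper does not fix them at this point.

```python
import numpy as np

def frame_entropy(frame, n_bins=32):
    # Shannon entropy (in bits) of the sample amplitude
    # distribution within a single short-time frame.
    hist, _ = np.histogram(frame, bins=n_bins)
    p = hist[hist > 0] / len(frame)
    return -np.sum(p * np.log2(p))

def entropy_feature_vector(signal, frame_len=1024, hop=512, n_bins=32):
    # Slide a short-time frame across the signal, compute the
    # per-frame entropy, then bin the per-frame values into a
    # small feature vector of summary statistics (the specific
    # statistics here are an assumption for illustration).
    n_frames = max(1, 1 + (len(signal) - frame_len) // hop)
    ents = np.array([
        frame_entropy(signal[i * hop : i * hop + frame_len], n_bins)
        for i in range(n_frames)
    ])
    return np.array([ents.mean(), ents.std(), ents.min(), ents.max()])
```

Because the whole computation is a histogram and a dot product per frame, with no transform to the frequency domain, it is cheap relative to FFT-based feature sets, which is the speed advantage the abstract refers to.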