Acoustic Event Classification Using Ensemble of
One-Class Classifiers for Monitoring Application
Achyut Mani Tripathi
Department of Computer Science
& Engineering
Indian Institute of Technology Guwahati
Guwahati-781039, Assam, India
Email: t.achyut@iitg.ernet.in
Diganta Baruah
Department of Information Technology
Sikkim Manipal Institute of Technology
Majitar-737136, Sikkim, India
Email: diganta.b@smit.smu.edu.in
Rashmi Dutta Baruah
Department of Computer Science
& Engineering
Indian Institute of Technology Guwahati
Guwahati-781039, Assam, India
Email: r.duttabaruah@iitg.ernet.in
Abstract—In this paper we investigate the application of an
ensemble of one-class classifiers to the problem of acoustic event
classification. We present initial results based on acoustic
signals emitted by different litter-causing materials when
contacted by humans. When a person interacts with an object
made of a specific material, a characteristic sound is produced
as a result of the interaction. We consider such interactions
or activities as atomic events. We propose the application of an
ensemble of one-class fuzzy rule-based classifiers to the problem
of identifying activities that can cause litter in public places.
The experimental results show that the classifier gives
satisfactory results while maintaining a low false alarm rate.
The results are comparable to those of the widely used one-class
SVM. Moreover, the method is adaptive and suitable for
incremental learning.
I. INTRODUCTION
Litter is a growing threat globally; its control, prevention,
and monitoring are major challenges faced by most countries.
In an attempt to address these issues, we are working on an
automated system that would enable us to detect activities that
can possibly cause litter in public places, such as bus stops
and parks. Upon detection of such an activity, the system would
generate a voice message (through a speaker) so that people who
perform littering activities (habitually, deliberately, or
accidentally) can be reminded to appropriately bin their trash.
The system would also help authorities to take preventive
measures and strategic decisions, for example, placing
sufficient litter receptacles in locations where frequent
littering activities are detected, or deploying more manpower
to keep the area clean.
In this paper we present initial results based on acoustic
signals emitted by different litter-causing materials when
contacted by humans. Here we consider two common sources of
litter: polymer packets, mostly used for packaging snacks such
as potato chips, and paper cups. When a person interacts with
such an object made of a specific material, a characteristic
sound is produced as a result of the interaction. For example,
opening a packet of chips produces a specific sound. The aim is
to recognize such sound-producing events, or acoustic events,
in a continuous audio stream. We chose acoustic sensing over
image or video sensing for the monitoring task because it has
certain distinctive characteristics. First, acoustic sensing is
omnidirectional: it can capture information from all directions
and is relatively insensitive to the position and orientation of
the sensor. Second, it allows for non-intrusive sensing without
invading the subject's privacy. Third, processing acoustic data
is relatively faster than processing images or video. Finally, a
system based on acoustic sensors costs less.
The area of acoustic event detection and classification has
recently gained attention due to its relevance to many real-
world applications such as surveillance and monitoring [1],
ambient assisted living [2], [3], [4], [5], [6], audio indexing
and retrieval [7], [8], [9], and human-robot interaction [10].
While the task of acoustic event classification (AEC) involves
determining the type of events that have already been extracted
from an audio stream, acoustic event detection (AED) deals with
both identifying the type of events and locating those events
in time. One of the vital steps in acoustic event classification
and detection is audio signal feature extraction. The problem of
feature extraction has been addressed by many existing works.
Some of the features that have been successfully applied to the
AEC task are perceptual features (short-time energy, zero-
crossing rate, sub-band energy, spectral centroid, spectral
roll-off, pitch) [11] and conventional automatic speech
recognition features (Mel-Frequency Cepstral Coefficients,
MFCC [12]; Linear Predictive Cepstral Coefficients, LPCC [12]).
The most commonly used approaches for classification are
Bayesian classifiers [13], Gaussian Mixture Models (GMM) [14],
[15], Hidden Markov Models (HMM) [16], Support Vector Machines
(SVM) [17], artificial neural networks, decision trees, random
forests, and fuzzy rule-based classifiers [4].
Even though they were designed for the task of automatic speech
recognition, MFCC features have been shown to work for non-
speech environmental sound recognition [2]. This motivated us
to prefer MFCC for feature extraction. For the classification
task, we considered one-class classifiers, which are suitable
when all available data belong to one class (often referred to
as the target class). In our problem it is not possible to
collect and label data for all human activities other than
those that may cause litter, i.e., the available data has only
one class, representing activities that can cause litter. We
consider four
2015 IEEE Symposium Series on Computational Intelligence
978-1-4799-7560-0/15 $31.00 © 2015 IEEE
DOI 10.1109/SSCI.2015.236