Acoustic Event Classification Using Ensemble of
One-Class Classifiers for Monitoring Application
Achyut Mani Tripathi
Department of Computer Science
& Engineering
Indian Institute of Technology Guwahati
Guwahati-781039, Assam, India
Email: t.achyut@iitg.ernet.in
Diganta Baruah
Department of Information Technology
Sikkim Manipal Institute of Technology
Majitar-737136, Sikkim, India
Email: diganta.b@smit.smu.edu.in
Rashmi Dutta Baruah
Department of Computer Science
& Engineering
Indian Institute of Technology Guwahati
Guwahati-781039, Assam, India
Email: r.duttabaruah@iitg.ernet.in
Abstract—In this paper we investigate the application of an
ensemble of one-class classifiers to the problem of acoustic event
classification. We present initial results based on acoustic
signals emitted by different litter-causing materials when
contacted by humans. When a person interacts with an object
made of a specific material, a characteristic sound is produced
as a result of the interaction. We consider such interactions
or activities as atomic events. We propose the application of an
ensemble of one-class fuzzy rule-based classifiers to the problem
of identifying activities that can cause litter in public places.
The experimental results show that the classifier gives
satisfactory results while maintaining a low false alarm rate.
The results are comparable to those of the widely used one-class
SVM. Moreover, the method is adaptive and suitable for
incremental learning.
I. INTRODUCTION
Litter is a growing threat globally; its control, prevention,
and monitoring are major challenges faced by most countries.
In an attempt to address these issues, we are working on an
automated system that would enable us to detect activities that
can possibly cause litter in public places, such as bus stops
and parks. Upon detection of such an activity, the system would
generate a voice message (through a speaker) so that people who
perform littering activities (habitually, deliberately, or
accidentally) can be reminded to appropriately bin their trash.
The system would also help authorities to take preventive
measures and strategic decisions, for example, placing
sufficient litter receptacles in locations where frequent
littering activities are detected, or deploying more manpower
to keep the area clean.
In this paper we present initial results based on acoustic
signals emitted by different litter-causing materials when
contacted by humans. Here we consider two common sources of
litter: polymer packets, mostly used for packaging snacks such
as potato chips, and paper cups. When a person interacts with
such an object made of a specific material, a characteristic
sound is produced as a result of the interaction. For example,
opening a packet of chips produces a specific sound. The aim is
to recognize such sound-producing events, or acoustic events,
in a continuous audio stream. We chose acoustic sensing over
image or video sensing for the monitoring task because it has
certain distinctive characteristics. First, acoustic sensing is
omnidirectional: it can capture information from all directions
and is relatively insensitive to the position and orientation of
the sensor. Second, it allows for non-intrusive sensing without
invading the subject's privacy. Third, processing acoustic data
is relatively faster than processing images or video. Finally, a
system based on acoustic sensors costs less.
The area of acoustic event detection and classification has
recently gained attention due to its relevance to many real-
world applications such as surveillance and monitoring [1],
ambient assisted living [2], [3], [4], [5], [6], audio indexing
and retrieval [7], [8], [9], and human-robot interaction [10].
While the task of acoustic event classification (AEC) involves
determining the type of events that have already been extracted
from an audio stream, acoustic event detection (AED) deals with
both identifying the type of events and locating those events
in time. One of the vital steps in acoustic event classification
and detection is audio signal feature extraction. The problem of
feature extraction has been addressed by many existing works.
Some of the features that have been successfully applied to the
AEC task are perceptual features (short-time energy, zero-
crossing rate, sub-band energy, spectral centroid, spectral
roll-off, pitch) [11] and conventional automatic speech
recognition features (Mel-Frequency Cepstral Coefficients,
MFCC [12]; Linear Predictive Cepstral Coefficients, LPCC [12]).
The most commonly used approaches for classification are
Bayesian classifiers [13], Gaussian Mixture Models (GMM) [14],
[15], Hidden Markov Models (HMM) [16], Support Vector Machines
(SVM) [17], artificial neural networks, decision trees, random
forests, and fuzzy rule-based classifiers [4].
Even though they were designed for the task of automatic speech
recognition, MFCC features have been shown to work for non-
speech environmental sound recognition [2]. This motivated us
to prefer MFCC for feature extraction. For the classification
task, we considered one-class classifiers, which are suitable
when all available data belong to one class (often referred to
as the target class). In our problem it is not possible to
collect and label data for all human activities other than
those that may cause litter, i.e., the available data has only
one class, representing activities that can cause litter. We
consider four
2015 IEEE Symposium Series on Computational Intelligence
978-1-4799-7560-0/15 $31.00 © 2015 IEEE
DOI 10.1109/SSCI.2015.236