International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 1059
ISSN 2229-5518
IJSER © 2014
http://www.ijser.org
Audio Classification and Retrieval by Using Vector Quantization
Shruti Vaidya, Dr. Kamal Shah
Abstract—In today’s world, information and its processing have become critical to the functioning of almost everything. In the early days, information was generally obtained and processed in the form of text. Today, information is available in many forms, such as text, music, and graphics, which are easily understood and represent information accurately. Information is first captured; the captured information is then retrieved and analyzed as required. In this paper, the information under consideration is in audio form. We have studied feature vector extraction methods and similarity measurement techniques, and have also measured the performance parameters. It has been observed that the use of multiple feature vectors provides better and more accurate classification and retrieval of audio from a large database.
Index Terms— Audio, Audio Retrieval, Audio Vector Quantization, Data Compression, k-Nearest Neighbor, Precision Recall, Vector
Quantization
1 INTRODUCTION
Vector Quantization (VQ) is a simple and efficient approach to data compression. Because it is easy to implement, it is widely used in applications such as pattern recognition, face detection, image segmentation, speech data compression, content-based image retrieval, and tumor detection. Vector quantization is a lossy compression technique. It involves three major procedures: codebook generation, encoding, and decoding. In the codebook generation process, the audio is divided into a number of k-dimensional training vectors, and a representative codebook is generated from these training vectors by a clustering technique. In the encoding procedure, the original audio is divided into k-dimensional vectors, and each vector is encoded as the index of its best-matching codeword in the codebook, which serves as a lookup table. The encoded result is called an index table.
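The codebook generation and encoding steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes k-means (the generalized Lloyd algorithm) as the clustering technique, non-overlapping k-dimensional vectors, and that trailing samples which do not fill a whole vector are dropped. All function names are hypothetical.

```python
import random


def split_into_vectors(samples, k):
    """Split a 1-D sequence of audio samples into consecutive, non-overlapping
    k-dimensional vectors. Trailing samples that do not fill a whole vector are
    dropped (an assumption of this sketch)."""
    return [list(samples[i:i + k]) for i in range(0, len(samples) - k + 1, k)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, codebook):
    """Full search: return the index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda j: squared_distance(vector, codebook[j]))


def train_codebook(training_vectors, size, iterations=20, seed=0):
    """Build a codebook of `size` codewords with k-means (generalized Lloyd)
    clustering, one common choice of clustering technique for VQ."""
    rng = random.Random(seed)
    # Initialise the codewords with randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(training_vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in training_vectors:
            clusters[nearest_index(v, codebook)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode(samples, codebook, k):
    """Encode audio as an index table: one codeword index per k-dimensional vector."""
    return [nearest_index(v, codebook) for v in split_into_vectors(samples, k)]


# Toy example: alternating "quiet" and "loud" pairs of samples.
samples = [0.0, 0.1, 0.9, 1.0, 0.05, 0.0, 1.0, 0.95]
k = 2
codebook = train_codebook(split_into_vectors(samples, k), size=2)
index_table = encode(samples, codebook, k)
print(index_table)
```

In this toy signal the two "quiet" vectors share one index and the two "loud" vectors share the other; which numeric index each group receives depends on the random initialization.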
In the decoding procedure, the receiver uses the same codebook to translate each index back into its corresponding codeword and thereby rebuild the audio. A key goal of vector quantization is to generate a good codebook, one for which the distortion between the original and the reconstructed audio is minimal. To find the best-matched codeword in the encoder, various full-search codebook algorithms can be used [1].
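The full-search encoding, lookup-table decoding, and distortion measurement described above can be sketched as follows. This is a hedged example under stated assumptions: the codebook is a hand-made toy rather than a trained one, distortion is measured as mean squared error per sample (one common choice), and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def full_search_encode(vectors, codebook):
    """Compare every input vector with every codeword and keep the index of the
    best match (the smallest squared distance)."""
    return [min(range(len(codebook)), key=lambda j: squared_distance(v, codebook[j]))
            for v in vectors]


def decode(index_table, codebook):
    """Receiver side: replace each index with its codeword from the shared codebook
    and concatenate the codewords to rebuild the sample sequence."""
    reconstructed = []
    for index in index_table:
        reconstructed.extend(codebook[index])
    return reconstructed


def mean_squared_distortion(original, reconstructed):
    """Average squared error per sample between original and reconstructed audio."""
    n = len(reconstructed)  # only samples that were actually encoded are compared
    return sum((a - b) ** 2 for a, b in zip(original[:n], reconstructed)) / n


codebook = [[0.0, 0.0], [1.0, 1.0]]  # toy 2-dimensional codebook (assumed)
original = [0.1, 0.0, 0.9, 1.0]
vectors = [original[0:2], original[2:4]]
index_table = full_search_encode(vectors, codebook)   # [0, 1]
reconstructed = decode(index_table, codebook)          # [0.0, 0.0, 1.0, 1.0]
distortion = mean_squared_distortion(original, reconstructed)
print(index_table, reconstructed, distortion)
```

A better codebook (more codewords, or codewords closer to the data) lowers the printed distortion, which is the quantity the codebook design aims to minimize.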
2 OVERVIEW OF SYSTEM
To date, research in audio classification has tended to focus on assigning test sounds to a limited number of predefined categories such as music, applause, and speech. The approach taken here instead describes each sound by its feature vectors.
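As an illustration of describing a sound by a feature vector, the sketch below summarizes a clip with three simple descriptors computed over fixed-length frames: mean short-time energy, mean zero-crossing rate, and the variance of the frame energies. The choice of these features, the frame length, and the function names are assumptions for illustration only; this section does not specify which features the system uses.

```python
def frame_signal(samples, frame_len):
    """Split samples into consecutive, non-overlapping frames (trailing samples dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame):
    """Mean of squared sample values within one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive).
    Requires a frame of at least two samples."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def feature_vector(samples, frame_len=4):
    """Describe a whole clip by [mean energy, mean zero-crossing rate, energy variance]."""
    frames = frame_signal(samples, frame_len)
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    mean_energy = sum(energies) / len(energies)
    mean_zcr = sum(zcrs) / len(zcrs)
    energy_var = sum((e - mean_energy) ** 2 for e in energies) / len(energies)
    return [mean_energy, mean_zcr, energy_var]


noisy = [1, -1, 1, -1, 1, -1, 1, -1]  # rapidly alternating signal
steady = [0.5] * 8                   # constant signal
print(feature_vector(noisy))   # [1.0, 1.0, 0.0]
print(feature_vector(steady))  # [0.25, 0.0, 0.0]
```

Two sounds can then be compared by the distance between their feature vectors rather than by matching them against a fixed list of categories.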
Furthermore, the proposed system allows intelligent interpretation of unseen examples, e.g. describing a door closing based on its similarity to previously seen events. A new signal can easily be classified, and related sounds can be retrieved based on their similarity to it, as shown in Fig. 1. For instance, given an input sound of a door closing, the system would return the label “background sound” and retrieve from the database the samples most similar to it.
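The classify-then-retrieve behaviour described above can be sketched with a k-nearest-neighbour classifier that ranks database entries by Euclidean distance between feature vectors, together with the precision and recall of the retrieved list. The index terms name k-Nearest Neighbor and precision/recall, but the distance measure, the value of k, the database entries, and all function names here are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, n):
    """Return the n database entries whose feature vectors are closest to the query."""
    return sorted(database, key=lambda entry: euclidean(query, entry[2]))[:n]


def knn_classify(query, database, k=3):
    """Label the query by majority vote among its k nearest neighbours."""
    votes = Counter(label for _, label, _ in retrieve(query, database, k))
    return votes.most_common(1)[0][0]


def precision_recall(retrieved, database, relevant_label):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / all relevant."""
    hits = sum(1 for _, label, _ in retrieved if label == relevant_label)
    total_relevant = sum(1 for _, label, _ in database if label == relevant_label)
    return hits / len(retrieved), hits / total_relevant


# Hypothetical database of (name, label, feature vector) entries.
database = [
    ("door_slam_1", "background sound", [0.80, 0.10]),
    ("door_slam_2", "background sound", [0.70, 0.15]),
    ("footsteps_1", "background sound", [0.60, 0.20]),
    ("speech_1", "speech", [0.20, 0.60]),
    ("speech_2", "speech", [0.25, 0.55]),
    ("piano_1", "music", [0.40, 0.05]),
]
query = [0.75, 0.12]  # features of a new, unseen door-closing sound

label = knn_classify(query, database, k=3)
top = retrieve(query, database, n=2)
print(label, [name for name, _, _ in top])
print(precision_recall(top, database, label))
```

Here the query is labelled "background sound", the two door-slam clips are retrieved, precision is 1.0, and recall is 2/3 because one relevant clip (footsteps_1) falls outside the top two.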
Fig.1: System Overview
————————————————
• Shruti Vaidya is currently pursuing a Master of Engineering degree in Information Technology at TCET, Mumbai University, India.
E-mail: shrutiv01@gmail.com
• Dr. Kamal Shah is currently a professor in the Master of Engineering (Information Technology) department, TCET, Mumbai University, India.
E-mail: kamal.shah@thakureducation.org