International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1888
A Review of Video Classification Techniques
Mittal C. Darji¹, Dipti Mathpal²
¹ Assistant Professor, Information Technology Department, G.H. Patel College of Engineering & Technology, Gujarat, India
² Trainee Assistant Professor, Information Technology Department, G.H. Patel College of Engineering & Technology, Gujarat, India
---------------------------------------------------------------------------***---------------------------------------------------------------------------
Abstract - This paper reviews the video classification
literature and summarizes the techniques it reports. Any
classification process requires features by which the
categories can be distinguished. For video, these features are
mainly taken from the text, audio or visual content.
Accordingly, there are three main classification techniques,
each discussed here. The choice of method and features
depends on the application. The pros and cons of each method
are given in this paper along with suitable applications.
Keywords - Video classification, Text based classification,
Audio based classification, Video based classification,
features
1. INTRODUCTION
The size of the video archives available to us is
increasing tremendously day by day. The internet and the
latest technologies make it easy to share videos, which
also leads to a lot of duplication. Finding the type of
video one wants to watch is therefore a very difficult task.
Such a time consuming and tedious job should be automated.
Researchers call this automation task video classification.
Video classification has been used to classify videos into
categories such as sports, comedy, news, dance and horror.
Some researchers have also divided a single video into
parts belonging to different categories. All these
classifications require characteristics that differ from one
category to another. These characteristics are called features.
Features can be extracted from any of three
components: text, audio and video [1]. Researchers have
used all three in various ways to fulfil their
classification purposes. This paper summarizes the
methods and features used over time.
The rest of the paper is organized as follows. Section 2
describes the text based method. Section 3 covers the
audio based approach. Section 4 presents the video based
methods. Section 5 compares all these methods, and
Section 6 concludes the paper.
2. TEXT BASED CLASSIFICATION
In this method, text is produced from the video and analyzed
for classification. The text can be 1) visible text on screen or
2) text extracted from speech [2]. In the first category, the
text visible on screen is extracted, for example the scoreboard
of a game, the number on a player's jersey, or captions written
on the screen. Such text can be extracted using Optical
Character Recognition (OCR). In the second category, the text is
extracted from speech using speech recognition. This
method is mainly used to provide subtitles or closed
captions. Closed captions mostly also convey other
types of sound, such as an animal sound or music. Subtitles
are placed on screen to aid understanding in a familiar
language.
This text based research can also be applied to document
text classification and to areas such as handwritten text to
digital document conversion, signature verification and
handwriting matching. However, the extracted text is large in
volume and hence difficult to deal with. OCR also has a
high error rate: text extracted by OCR typically contains
many spelling mistakes and omissions. A common way of
working with text is to represent it as a feature vector using
the bag-of-words model. This model records the number of
occurrences of each word, but it retains no information
about the order of the words in the document.
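The bag-of-words representation described above can be illustrated with a minimal sketch. The two transcripts, the tokenizer and the function names below are hypothetical and chosen only for illustration; they are not taken from the reviewed literature. The sketch shows that two texts with the same words in a different order map to the same feature vector.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text, vocabulary):
    """Return word counts over a fixed vocabulary; word order is discarded."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical transcripts: same words, different order.
doc_a = "goal scored and the team wins the match"
doc_b = "the team wins the match and scored goal"

# The vocabulary is the sorted set of all words seen in both texts.
vocabulary = sorted(set(tokenize(doc_a)) | set(tokenize(doc_b)))

vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_a == vec_b)  # identical vectors: the ordering information is lost
```

In practice the vocabulary would be built from a whole training collection, and noisy OCR output would likely need spelling correction before counting.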
3. AUDIO BASED CLASSIFICATION
This approach is used more often in research than the text
based one, because audio processing requires fewer
computational resources and less time. Storing audio and its
features also requires less space than video and text. To
process audio, the signal is sampled at a particular rate, and
certain features are extracted from each window of samples.
These sampling windows may overlap in some cases.
Suitable features are extracted from the sampled signal based
on the application requirement. Audio features can be
broadly classified as either physical features or perceptual
features [3].
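The windowing step described above can be sketched as follows. This is only an illustration under stated assumptions: the toy signal, the window length, the hop size and the choice of short-time energy as the per-window feature are all hypothetical and are not prescribed by the reviewed work.

```python
def frame_signal(samples, frame_length, hop_length):
    """Split a sampled signal into windows; a hop smaller than the
    window length makes consecutive windows overlap."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    frames = []
    # Only full windows are kept; trailing samples that do not fill one are dropped.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one window (an assumed example feature)."""
    return sum(x * x for x in frame) / len(frame)


# Hypothetical toy signal of 10 samples.
signal = [0, 1, 0, -1, 0, 1, 0, -1, 0, 1]

# Windows of 4 samples with a hop of 2 samples, i.e. 50% overlap.
frames = frame_signal(signal, frame_length=4, hop_length=2)
energies = [short_time_energy(frame) for frame in frames]

print(len(frames))
print(energies)
```

Each window yields one feature value, so a clip is reduced to a short sequence of numbers, which is consistent with the lower storage cost noted above.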
3.1 Physical Features
These are also called time domain features, as they are
measured directly from frequency values of the signal [6].
They are also referred to as low level features of the signal.