International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 06 | June-2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1888

A Review of Video Classification Techniques

Mittal C. Darji 1, Dipti Mathpal 2

1 Assistant Professor, Information Technology Department, G.H. Patel College of Engineering & Technology, Gujarat, India
2 Trainee Assistant Professor, Information Technology Department, G.H. Patel College of Engineering & Technology, Gujarat, India

Abstract - This paper reviews the video classification literature and summarizes the techniques it describes. Classification in general requires features that distinguish one category from another. For video, these features are mainly taken from the text, audio, or visual content. Accordingly, three main families of classification techniques are discussed here. The choice of method and features depends on the application. The pros and cons of each method are given in this paper, along with suitable applications.

Keywords - Video classification, Text based classification, Audio based classification, Video based classification, features

1. INTRODUCTION

The number of video archives available to us is increasing tremendously day by day. The internet and recent technologies make it easy to share videos, which also leads to a great deal of duplication. Finding the type of video one wants to watch is therefore a very difficult task. Such a time-consuming and tedious job should be automated. Researchers call this automation task video classification. Video classification has been used to sort videos into categories such as sports, comedy, news, dance, and horror.
Some researchers have also divided a single video into parts belonging to different categories. All of these classifications rely on characteristics that differ from category to category; these characteristics are called features. Features can be extracted from any of three components: text, audio, and video [1]. Researchers have used all three in various ways to meet their classification goals. This paper summarizes the methods and features used over time.

The rest of the paper is organized as follows. Section 2 describes the text based method. Section 3 covers the audio based approach. Section 4 contains the video based methods. Section 5 compares all these methods. Section 6 concludes the paper.

2. TEXT BASED CLASSIFICATION

In this method, text is produced from the video and analyzed for classification. The text can be: 1) text visible on screen, or 2) text extracted from the speech [2].

In the first category, the text visible on screen is extracted, for example the scoreboard of a game, the number on a player's jersey, or captions written on the screen. Such text can be extracted using Optical Character Recognition (OCR).

In the second category, the text is extracted from speech using speech recognition. This method is mainly used to provide subtitles or closed captions. Closed captions are mostly used to convey other types of sound as well, such as animal sounds or music. Subtitles are placed on screen to present the speech in a language familiar to the viewer.

This text based research can also be applied to document text classification and to areas such as converting handwritten text to digital documents, signature verification, and handwriting matching.

However, the volume of such text is large, which makes it difficult to handle. OCR also has a high error rate: text extracted with OCR usually contains many spelling mistakes and omissions.
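
Whatever its source, extracted text has to be turned into numeric features before it can be classified. A minimal sketch of a word-count (bag-of-words) vector follows. It assumes lowercase, whitespace-separated tokens, and the helper names and the two sample sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of distinct lowercase words across all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def bag_of_words(text, vocabulary):
    """Return the count of each vocabulary word in `text`, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two hypothetical caption strings with the same words in a different order.
docs = ["goal scored by player ten", "player ten scored goal by"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
```

Because only counts are stored, the two sentences above map to the same vector even though their word order differs.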
A commonly used way of working with text is to represent it as a feature vector using the bag-of-words model. This model records the number of occurrences of each word, but it does not retain any information about the order of the words in the document.

3. AUDIO BASED CLASSIFICATION

This approach is used more often in research than the text based one because audio processing requires fewer computational resources and less time. Storing audio and its features also requires less space than video and text. To process audio, the signal is sampled at a particular rate, and certain features are extracted from each window of samples for analysis. These sampling windows may overlap in some cases. Suitable features are extracted from the sampled signal according to the requirements of the application. Audio features can be broadly classified as either physical features or perceptual features [3].

3.1 Physical Features

These are also called time domain features, as they are directly measured from frequency values of the signal [6]. They are also called low level features of the signal.
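
The windowing described in Section 3 can be sketched as follows. This is a minimal illustration, not the method of any cited work: the synthetic 440 Hz tone, the frame and hop sizes, the function names, and the two per-window measurements (short-time energy and zero-crossing rate) are all assumptions, chosen only because they are simple examples of low level features computed directly from the samples.

```python
import math


def frame_signal(samples, frame_size, hop_size):
    """Split samples into fixed-length windows; hop_size < frame_size gives overlap."""
    last_start = len(samples) - frame_size
    return [samples[start:start + frame_size]
            for start in range(0, last_start + 1, hop_size)]


def short_time_energy(frame):
    """Mean of the squared sample values in one window."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Assumed test signal: one second of a 440 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# 400-sample windows with a 200-sample hop, i.e. 50% overlap (assumed values).
frames = frame_signal(tone, frame_size=400, hop_size=200)
features = [(short_time_energy(f), zero_crossing_rate(f)) for f in frames]
```

Each entry of `features` is one small feature vector per window; a classifier would consume these vectors, or statistics computed over them, rather than the raw samples.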