HMM Based Classification of Sports Videos Using Color Feature

Josh Hanna∗, Fatma Patlar†, Akhan Akbulut†, Engin Mendi∗ and Coskun Bayrak∗†
∗Computer Science Department, University of Arkansas at Little Rock, AR, USA
{jjhanna | esmendi | cxbayrak}@ualr.edu
†Computer Engineering Department, Istanbul Kultur University, Istanbul, Turkey
{f.patlar | a.akbulut | c.bayrak}@iku.edu.tr

Abstract—Video content classification is an important element for efficient access to and retrieval of video in any media content management system. Categorizing video segments makes it easier to access relevant video content without scanning it sequentially. In this paper, we present a Hidden Markov Model (HMM) based classification technique for sports videos. The speed of color change is computed for each video frame, and the resulting sequences are used as HMM observation sequences for classification. Experiments on more than 1 hour of sports video (18 training and 18 testing videos) from 3 predefined genres (golf, hockey and football) give very satisfactory classification accuracy.

I. INTRODUCTION

Multimedia content classification refers to the computerized understanding of the semantic meaning of a multimedia file or document. With the growth of digital video content, efficient techniques for classifying videos according to their content have become more important. Applications such as digital libraries, e-learning, video-on-demand, digital video broadcasting and interactive TV generate and use large collections of video data. To use these data effectively, all digital content must be classified into categories. There is a growing demand for content-based automatic video classification in web multimedia administration, and consequently much research is being done on such systems. Several content-based classification systems for organizing and managing video databases have recently been proposed.
Classifying videos into predefined genres is the most preferred approach. The basic working principle of this type of application is the classical pattern classification algorithm [1]. First, features such as color, sound or video text are extracted from the videos; they then pass through a feature reduction step to prepare them for classification. In [2], nearest neighbor clustering is used for video classification. A more complex approach is a fully automatic and computationally efficient framework for the analysis and summarization of soccer videos, which uses cinematic and object-based features for semantic analysis of sports videos [3]. Extracted features are commonly classified with HMMs to segment video content. Boreczky and Wilcox [4] used three types of features for video segmentation: the standard histogram difference, an audio distance measure, and an estimate of object motion between two adjacent frames. Other implementations used object color and texture features to generate highlights for soccer videos [5]. Zhu [6] classified news stories using features obtained from closed captions, an example of video classification using text features only. Liu [7], [8], [9] used audio features such as non-silence ratio, volume standard deviation, volume dynamic range, pitch standard deviation, voice/music ratio, noise/unvoice ratio, frequency centroid and frequency bandwidth. These features are extracted from segments of the sampled audio signal and fed to a one-class-one-network structure for classification.

In this paper, we present an HMM-based approach to video content classification using a color feature. Our aim is to assign an input video to one of the predefined groups: golf, hockey and football. The rest of this paper is organized as follows. Section 2 presents the concept of applying HMMs to video classification and the details of our feature extraction.
Experimental results and conclusions are given in Section 3 and Section 4, respectively.

II. HMM FOR VIDEO CLASSIFICATION

HMM is a popular technique widely used in signal processing. HMMs provide a formal foundation for building probabilistic models of linear sequence "labeling" problems [10], and they are especially known for their applications in temporal pattern recognition such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics. They are mostly used to classify sequential data because they capture the temporal relationships among the extracted features. In our research, we extend them to video analysis and classification.

A. Definition of HMM

An HMM has a finite number of states, and each state is associated with transition probabilities to the other states. At any time, the HMM is in exactly one state. The state at time t is directly influenced by the state at time t − 1. After each transition from one state to another, an output observation is generated according to an observation probability distribution associated with the current state [10]. Formally, an HMM is defined as:

HMM = {N, B, Π}

where N is the set of states, B is the number of observation symbols and Π is the set of state transition probabilities.
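To make the definition concrete, the Python sketch below scores a discrete observation sequence against one HMM per genre with the scaled forward algorithm and picks the genre with the highest log-likelihood. Choosing the maximum-likelihood model is a common way to use HMMs for classification, but here it is an assumption, not necessarily the authors' exact procedure. The sketch uses the conventional textbook arrays (initial distribution `start`, transition matrix `trans`, emission matrix `emit`) rather than the {N, B, Π} tuple above. The helper names `quantize`, `log_likelihood` and `classify`, the speed thresholds in `SPEED_BINS`, and every parameter in `MODELS` are hypothetical and made up for illustration; none of them are trained on the paper's data.

```python
import numpy as np

# Hypothetical thresholds splitting a per-frame "speed of color change"
# value into three discrete symbols: 0 = slow, 1 = medium, 2 = fast.
SPEED_BINS = [0.1, 0.3]


def quantize(speeds, bins=SPEED_BINS):
    """Map continuous per-frame speeds to integer observation symbols."""
    return np.digitize(np.asarray(speeds, dtype=float), bins)


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i, j] : probability of moving from state i to state j
    emit[i, k]  : probability of emitting symbol k while in state i
    """
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # The state at time t depends only on the state at time t - 1.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale  # rescale so long sequences do not underflow
    return log_p


def classify(obs, models):
    """Return the genre whose HMM gives the highest log-likelihood."""
    scores = {genre: log_likelihood(obs, *params) for genre, params in models.items()}
    return max(scores, key=scores.get), scores


# Toy two-state models with made-up parameters (not trained on real videos).
MODELS = {
    "golf": (np.array([0.8, 0.2]),
             np.array([[0.9, 0.1], [0.2, 0.8]]),
             np.array([[0.8, 0.15, 0.05], [0.5, 0.4, 0.1]])),
    "hockey": (np.array([0.5, 0.5]),
               np.array([[0.7, 0.3], [0.3, 0.7]]),
               np.array([[0.1, 0.3, 0.6], [0.05, 0.25, 0.7]])),
    "football": (np.array([0.6, 0.4]),
                 np.array([[0.8, 0.2], [0.3, 0.7]]),
                 np.array([[0.3, 0.5, 0.2], [0.2, 0.5, 0.3]])),
}

if __name__ == "__main__":
    speeds = [0.02, 0.05, 0.04, 0.15, 0.03, 0.06]  # mostly slow color change
    genre, scores = classify(quantize(speeds), MODELS)
    print(genre)  # golf
```

In practice each genre's parameters would be estimated from that genre's training sequences (for example with the Baum-Welch algorithm), a step this sketch leaves out.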