Using Discrete Cosine Transform Based Features for Human Action Recognition

Tasweer Ahmad and Junaid Rafique
Electrical Engineering Department, Government College University, Lahore, Pakistan
Email: tasveer.ahmad@gcu.edu.pk, junaidumtee70@gmail.com

Hassam Muazzam
Electrical Engineering Department, University of Punjab, Lahore, Pakistan
Email: hassammuazzam@hotmail.com

Tahir Rizvi
Dipartimento di Automatica e Informatica, Politecnico di Torino, Turin, Italy
Email: Syed.rizvi@polito.it

Abstract—Recognizing human actions in complex video sequences has always been challenging for researchers because of articulated movements, occlusion, background clutter, and illumination variation. Human action recognition has a wide range of applications in surveillance, human computer interaction, video indexing, and video annotation. In this paper, discrete cosine transform based features are exploited for action recognition. First, a motion history image is computed for a sequence of images; then a block-based truncated discrete cosine transform of the motion history image is computed. Finally, a K-Nearest Neighbor (K-NN) classifier is used for classification. This technique exhibits promising results on the KTH and Weizmann datasets. Moreover, the proposed model appears to be computationally efficient and immune to illumination variations; however, it is sensitive to viewpoint variations.

Index Terms—motion history image, discrete cosine transform, K-nearest neighbor, human computer interaction, video indexing, video annotation

I. INTRODUCTION

The task of human action recognition has been both challenging and fascinating for computer vision researchers over the last two decades. Human action recognition has found numerous applications in video surveillance, motion tracking, scene modelling, and behavior understanding [1].
Intelligent and effective human action recognition has received a lot of attention and funding because of rapidly increasing security concerns and the need for effective surveillance of public places such as airports, bus stations, railway stations, and shopping malls [1]. Human action recognition systems can also be deployed at health-care centers, day-care centers, and old-age homes for monitoring and fall detection. Human Computer Interaction (HCI) based on action recognition finds ample applications in interactive and gaming environments [2].

Manuscript received February 3, 2015; revised August 25, 2015.

R. T. Collins et al. [3] suggested in 2000 that video surveillance can be broadly categorized into human detection and tracking, human motion analysis, and activity recognition. At that time, they further suggested that “...activity analysis will be the most important area of future research in video surveillance.” This projection now seems accurate, as a large number of research articles have been published in this domain over the last decade. Although surveillance cameras and monitoring systems are quite prevalent and affordable, it is still very challenging to devise a robust surveillance system because of human factors such as fatigue and boredom. It is highly desirable to devise an intelligent system that can recognize common human actions with high accuracy, multi-scale resolution, and minimal computational complexity. Computer vision researchers have made considerable efforts to overcome these challenges. The survey in [4] highlights the importance and applications of Intelligent Video Systems and Analytics (IVA), covering both system analytics and theoretical analytics. Video system hardware is developing at a fast rate thanks to digital signal processors and VLSI design, but hardware-oriented issues of system scalability, compatibility, and real-time performance remain unresolved [5].
Theoretical analytics deals with more robust and computationally efficient algorithms. Another breakthrough in human action recognition came with the introduction of multiple cameras to provide multi-view videos for pose estimation and activity recognition. The performance of such systems improved drastically when videos were acquired from multiple cameras [6]. The price paid for multi-channel video was computational complexity; there must certainly be a compromise between the performance and the complexity of the system.

Journal of Image and Graphics, Vol. 3, No. 2, December 2015. ©2015 Journal of Image and Graphics. doi: 10.18178/joig.3.2.96-101

Nowadays, Infra-Red (IR) sensor based monocular cameras are widely used for video gaming and human