International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 4, pp. 1831-1835, April 2015. Available @ http://www.ijritcc.org

Action Recognition using High-Level Action Units

Paul T. Jaba, M.E., Department of Computer Science, St. Joseph College of Engineering, Sriperumbudur, India. write2jaba@gmail.com
Ms. Jackulin Asha G. S., Department of CSE, St. Joseph College of Engineering, Sriperumbudur, India

Abstract—Vision-based human action recognition is the process of labeling image sequences with action labels. In this work, a model is developed for human activity recognition that uses high-level action units to represent human activity. The training phase learns the model for the action units and the action classifiers; the testing phase uses the learned model for action prediction. Three components are used to classify activities: a new spatial-temporal descriptor, statistics of the context-aware descriptors, and suppression of noise in the action units. Human activities are represented by a set of intermediary concepts called action units, which are automatically learned from the training data. At the low level, we present a locally weighted word context descriptor to improve the traditional interest-point-based representation; the proposed descriptor incorporates neighborhood details effectively. At the high level, we introduce GNMF-based action units to bridge the semantic gap in activity representation. Moreover, we propose a new joint l2,1-norm based sparse model for selecting action units in a discriminative manner. Extensive experiments have been carried out to validate our claims and have confirmed our intuition that the action-unit-based representation is critical for modeling complex activities from videos.
Keywords—Action unit, sparse representation, nonnegative matrix factorization, action recognition

I. INTRODUCTION

Human activity recognition has a wide range of applications, such as video content analysis, activity surveillance, and human-computer interaction [2]. As one of the most active topics in computer vision, much work on human activity recognition has been reported. In most traditional approaches, activity models are constructed from patterns of low-level features such as appearance patterns [4], optical flow, space-time templates, 2D shape matching, trajectory-based representations, and bag-of-visual-words (BoVW). However, these features can hardly capture the rich semantic structure of activities. Inspired by recent progress in object categorization, we introduce a high-level concept named "action unit" to describe human actions. For example, the "golf-swinging" activity contains some representative motions, such as "arm swing" and "torso twist", which are hard to describe using the low-level features mentioned above. Alternatively, some connected space-time interest points, when combined, can characterize a representative action. In addition, a key frame is essential to describe an activity, and a key frame may be characterized by the co-occurrence of space-time interest points extracted from that frame. The representative actions and key frames both reflect action units, which can then be used to represent action classes. Motivated by these observations, we propose using high-level action units for human activity representation. Typically, hundreds of interest points are first extracted from an input human activity video and then agglomerated into tens of action units, which compactly represent the video. Such a representation is more discriminative than the traditional BoVW model.
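To make the idea of agglomerating low-level features into action units concrete, the following is an illustrative sketch only. The paper learns action units with graph-regularized NMF (GNMF); this sketch uses plain NMF with multiplicative updates on a matrix of per-video BoVW histograms, and all dimensions and variable names below are hypothetical, not taken from the paper.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (d x n) into W (d x k) and H (k x n)
    via multiplicative updates. Columns of W act as 'action unit' bases;
    columns of H give each video's activation over those units.
    NOTE: simplified stand-in for the paper's graph-regularized GNMF."""
    rng = np.random.default_rng(seed)
    d, n = V.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update bases
    return W, H

# Toy example: 10 videos, each a 50-bin BoVW histogram (columns of V).
rng = np.random.default_rng(1)
V = rng.random((50, 10))
W, H = nmf(V, k=5)  # 5 learned "action units"
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape)
```

After factorization, each video is represented by its k-dimensional column of H rather than the raw d-bin histogram, which is the sense in which hundreds of interest points are compressed into tens of action units.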
To utilize it for activity recognition, we address the following three major issues.

Fig. 1: System Architecture

1. Selecting low-level features for generating the action units. Some of the aforementioned features need reliable tracking or body pose estimation, which is hard to attain in practice. The interest-point-based representation avoids such requirements while remaining robust to noise, occlusion, and geometric variation. However, conventional bag-of-visual-words (BoVW) models use only features from single interest points and ignore spatial-temporal context details. To address this issue, we propose a new context-aware descriptor that incorporates context details from neighboring interest points. This way, the new descriptor is more discriminative and robust than the traditional BoVW.
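One simple way to realize a context-aware descriptor, sketched below under stated assumptions (the paper's exact locally weighted word context descriptor is not reproduced; the neighborhood size k, Gaussian weighting, and the concatenation scheme are all hypothetical choices for illustration), is to augment each interest point's base descriptor with a distance-weighted average of its nearest neighbors' descriptors:

```python
import numpy as np

def context_aware(points, descs, k=4, sigma=1.0):
    """Augment each spatio-temporal interest point's descriptor with a
    locally weighted combination of its k nearest neighbors' descriptors.
    points: (n, 3) coordinates (x, y, t) of interest points
    descs:  (n, d) base descriptors (e.g. HOG/HOF around each point)
    Returns (n, 2d): [own descriptor | weighted neighbor context]."""
    n, d = descs.shape
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # pairwise distances
    out = np.empty((n, 2 * d))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]       # k nearest, excluding self
        w = np.exp(-dist[i, nbrs] ** 2 / (2 * sigma ** 2))
        w /= w.sum() + 1e-12                      # normalize weights
        ctx = (w[:, None] * descs[nbrs]).sum(axis=0)
        out[i] = np.concatenate([descs[i], ctx])
    return out

rng = np.random.default_rng(0)
pts = rng.random((20, 3))   # 20 interest points in (x, y, t)
dsc = rng.random((20, 8))   # toy 8-dimensional base descriptors
aug = context_aware(pts, dsc)
print(aug.shape)
```

Because the weighting decays with spatio-temporal distance, nearby interest points contribute more to the context half of the descriptor, which is what makes the augmented feature more discriminative than a single-point BoVW word.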