International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 8 (2020), pp. 1874-1879 © International Research Publication House. https://dx.doi.org/10.37624/IJERT/13.8.2020.1874-1879

Real Time Action Recognition in Surveillance Video Using Machine Learning

Abdulrahman S. Alturki 1* and Anwar H. Ibrahim 2

1,2 Department of Electrical Engineering, College of Engineering, Qassim University, Qassim, Saudi Arabia.
*Corresponding Author

Abstract

Human gesture identification plays a key role in the surveillance and security domains. The technique is in high demand today for identifying culprits or specific individuals in surveillance camera footage. In the proposed method, action identification is aided by machine learning. First, the frames containing the subject in focus are segmented and modeled using Gaussian modeling. Feature extraction and quantization are then carried out based on the characteristics of the area of interest. The proposed method has two phases, training and testing, and uses three datasets for validation and a KNN classifier for classification. Analysis of the output shows that the proposed algorithm identifies gestures correctly with an accuracy of 95.568%.

Keywords: KNN Classifier, datasets, Gesture identification, Prewitt filter, GMM modeling, machine learning, feature extraction, multiple view points

I. INTRODUCTION

For the past two decades, fields such as machine learning and computer vision have pursued the challenging goal of recognizing human actions automatically. The approach now has extensive applications in many emerging fields, such as deep learning, the medical industry, security systems, and intelligent surveillance. It therefore draws considerable attention from researchers at present [1]-[3].
Computer-based identification of human actions is widely used in real-world applications such as storing large files [4], action identification [5], indexing [6], and securing videos [7]. The critical part of this technology is the interaction between the machine and the human. Visual signals play a crucial role in recognizing human actions and in communication between the computer and the human. Recently developed techniques involve manually sampling the signal before it is digitized in the computer [9]. In practice, setting the start and end points of the sequence samples is difficult and sometimes impossible. Hence, an algorithm was developed to automate the division of actions in an image sequence.

The actions of the subjects in an image sequence can vary with pattern, pose, mobility, and similar factors. These parameters remain challenging issues, as they affect image properties such as luminance, chrominance, and the background sequence. The most important issue is viewpoint dissimilarity, because human action recognition (HAR) approaches rely on visual signals captured from a single view. The same camera is used to capture the views throughout the process, from training to testing. Real-time applications, however, cannot guarantee the same setup, which causes a sharp drop in accuracy for different views. Single-view techniques also fail when part of the subject is hidden by unavoidable obstructions.

To overcome these limitations and obtain the complete image, multiple cameras are used to capture the scene. This gave rise to the term multi-view action identification. In the multi-view approach, images are captured in either 2D or 3D. In 3D capture, the object in focus is segmented into multiple views [9], and a motion representation is constructed to identify the movements.
Model construction in this approach uses geometrical shapes as patterns. The 3D approach is generally used in real-time applications such as histogram shaping, 3D optical patterns, storage of action history, 3D skeleton representation [10], and action patterns in the spatial domain. The 3D representation offers better accuracy than 2D capture but has the drawback of high cost, so it is not widely preferred in real-time applications. Nevertheless, 3D capture has been shown to give fair construction quality, because its feature extraction depends on multi-view capture [11]. Errors in this approach arise from details that are lost during segmentation in 3D modeling. In general, the best 3D model is constructed from overlapping views [12][13]. A good-quality 3D representation therefore requires an ample number of views. With advances in image capturing techniques, 3D cameras are readily available to capture a 3D view of the object in focus. Out of many devices 3D Microsoft