International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 13, Number 8 (2020), pp. 1874-1879
© International Research Publication House. https://dx.doi.org/10.37624/IJERT/13.8.2020.1874-1879
Real Time Action Recognition in Surveillance Video Using Machine
Learning
Abdulrahman S. Alturki¹* and Anwar H. Ibrahim²
¹,² Department of Electrical Engineering, College of Engineering, Qassim University, Qassim, Saudi Arabia.
*Corresponding Author
Abstract
Human gesture identification plays a crucial role in
the surveillance and security domains. The technique is in high
demand today for identifying culprits or specific
people in surveillance camera footage. In the proposed method,
action identification is aided by machine learning.
Initially, the frames of the subject under focus are
segmented and modeled using Gaussian modeling. Feature
extraction and quantization are then performed
considering the characteristics of the area of interest. The
proposed method has two phases, training and testing,
and uses three datasets for validation and a KNN classifier for
classification. The output of the proposed algorithm is
analyzed and shown to give correct gesture
identification with an accuracy rate of 95.568%.
Keywords: KNN Classifier, datasets, Gesture identification,
Prewitt filter, GMM modeling, machine learning, feature
extraction, multiple view points
I. INTRODUCTION
Over the past two decades, domains such as machine learning
and computer vision have set the challenging goal of recognizing
human actions autonomously. The approach now has
extensive applications in many emerging fields, such as
deep learning, the medical industry, security systems,
and intelligent surveillance. It is therefore in the limelight and
is drawing increasing attention from researchers [1]-[3].
The identification of human actions through computer
technology is widely implemented in real-world
applications such as storing big files [4], action identification [5],
indexing [6], and securing videos [7]. The critical
part of this technology is the interaction between the
machine and the human. Visual signals play a crucial role in
the recognition of human actions and in communication
between the computer and the human. Recently developed
techniques involve manual sampling of the signal before
digitizing it in the computer [9]. In practice, it is difficult,
and sometimes impossible, to set the start and end
points of the sample sequence. Hence, an algorithm was
developed to automate the division of actions in an image
sequence. The actions of the subjects in an image sequence can
vary with pattern, pose, mobility, etc. These parameters
remain challenging issues that affect image properties
such as luminance, chrominance, and background. The
most important issue is viewpoint dissimilarity, since
HAR approaches rely on visual signals that are
recognized from a single capturing view. The same camera is
used to capture the views throughout the entire process, from
training to testing. Real-time applications, however,
cannot maintain the same setup, which
causes a sharp drop in accuracy across different views.
Single-view techniques also fail when a portion of the
subject is hidden by unavoidable obstructions. To overcome
these limitations and obtain the complete image, multiple
cameras are used to capture the scene. This led to the
notion of multi-view action identification.
In the multi-view approach, images are captured in either
2D or 3D form. In three-dimensional capturing, the object
under focus is segmented into multiple views [9], and a
motion representation is built to identify the
movements. Model construction in this approach involves
the use of geometrical shapes as patterns. The 3D approach is
generally used in real-time applications such as histogram
shaping, 3D optical patterns, storage of action history,
3D skeleton representation [10], and action patterns in the
spatial domain.
The 3D representation offers a higher accuracy
rate than two-dimensional capturing, but with the drawback
of higher cost. Hence, it is not widely preferred in real-time
applications. Nevertheless, three-dimensional capturing
has proved to give fair construction quality, since feature
extraction in this approach depends on multi-view
capturing [11]. Errors in this approach arise from
details that are lost during segmentation for 3D
modeling. In general, the best three-dimensional model is
constructed from overlapping views [12][13]. Thus, a
good-quality 3D representation requires an ample number of
views.
With advances in image-capturing techniques,
three-dimensional cameras are readily available to capture a 3D view
of the object in focus. Out of many devices 3D Microsoft