Improved Semantic-based Human Interaction Understanding Using Context-based knowledge

Kamrad Khoshhal Roudposhti*, Jorge Dias*,**
* Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra, Portugal.
** Khalifa University, UAE
{kamrad,jorge}@isr.uc.pt

Abstract: This paper proposes a descriptive approach to context-based human activity analysis, using a hierarchical framework in a scene understanding application. A person's movement with respect to their own body, to other people, and to the scene gives rise to different layers of human activity analysis, which are usually investigated separately depending on the application. Human behaviour cannot be analysed properly unless all of these layers of information are considered. This study presents the effect of combining the different layers of information on the accuracy of the analysis. The contributions are (i) the use of different information layers, such as human body part movement and human-object interaction in 3D space, to improve human activity analysis, and (ii) a probabilistic and descriptive model based on a well-known human movement descriptor and a Bayesian Network (BN) approach. Built on this framework, the model is generalizable and flexible, both of which are necessary for an applicable system. The capability of the proposed approach is demonstrated in the experimental section.

Index Terms: Scene understanding, hierarchical framework, human interaction analysis, Bayesian approach, human movement analysis, descriptive model.

I. INTRODUCTION

This paper proposes a flexible scene understanding model that describes human activity using a well-known descriptor and deals with uncertainty using probabilistic models. Human activity analysis can be categorized as context-free or context-based. In context-free approaches, the model is independent of scene parameters and relies only on features belonging to the person.
In reality, however, context-based features play a very important role in analysing human activities. For instance, when a person is about to reach a chair, we infer that the person is probably going to sit on it, not to sleep. As Delaitre et al. describe in [6], since object detection is a widely studied topic in computer vision, analysing the relation between human movements and the surrounding objects can produce valuable information about daily human activities. People have learned the most probable activities of a person who is reaching for a chair; in other words, they hold a set of probabilities over activities that depends on the objects in the scene.

The problem is what level of human movement information is useful, and how a general framework can be defined for analysing any possible human-object interaction. For this purpose, information ranging from low-level body part motions to higher-level human interactions can be useful. Dealing with these different kinds of information leads to a complex model. Thus, a hierarchical framework was used to reduce the complexity of the model [1] and to provide different levels of human activity analysis [11].

The probability distributions relating human motions to human-object information can be modelled given the possible activities and the objects of interest in a scene. The Laban Movement Analysis (LMA) system, which consists of several components, is used to define suitable variables for human motion (Effort, Shape) [13], [12] and for human-scene relations (Relationship) [16], [10].

Gupta et al. [9] tackled the problem using 2D images. They therefore focused more on the computer vision problems of these applications, and used only the person's hand trajectory to analyse human-object interactions (reaching and manipulation).
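As a rough illustration of how objects in the scene can shape the probability of each activity, the following minimal sketch combines one Relationship-like observation (the nearest object) and one Effort-like observation (a Time quality) in a naive Bayes structure. It is not the model proposed in this paper: the activity set, the variable names (`PRIOR`, `P_OBJECT`, `P_EFFORT`, `posterior`), the conditional independence assumption, and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical activities with illustrative priors (not taken from the paper).
PRIOR: Dict[str, float] = {"sit": 0.4, "sleep": 0.2, "walk": 0.4}

# P(nearest object | activity): stands in for an LMA Relationship variable.
# Values are made up for illustration.
P_OBJECT: Dict[str, Dict[str, float]] = {
    "sit":   {"chair": 0.8, "bed": 0.1, "none": 0.1},
    "sleep": {"chair": 0.1, "bed": 0.8, "none": 0.1},
    "walk":  {"chair": 0.2, "bed": 0.1, "none": 0.7},
}

# P(motion quality | activity): stands in for an LMA Effort (Time) variable.
# Values are made up for illustration.
P_EFFORT: Dict[str, Dict[str, float]] = {
    "sit":   {"sudden": 0.3, "sustained": 0.7},
    "sleep": {"sudden": 0.1, "sustained": 0.9},
    "walk":  {"sudden": 0.5, "sustained": 0.5},
}


def posterior(obj: str, effort: str) -> Dict[str, float]:
    """Return P(activity | object, effort).

    Assumes the two observations are conditionally independent
    given the activity (a naive Bayes simplification).
    """
    # Unnormalized joint probability of each activity with the evidence.
    joint = {
        a: PRIOR[a] * P_OBJECT[a][obj] * P_EFFORT[a][effort]
        for a in PRIOR
    }
    total = sum(joint.values())
    # Normalize so that the posterior sums to one.
    return {a: p / total for a, p in joint.items()}


if __name__ == "__main__":
    for activity, prob in posterior("chair", "sustained").items():
        print(f"{activity}: {prob:.3f}")
```

Under these illustrative numbers, observing a chair with a sustained motion ranks sitting as the most probable activity, while observing a bed instead shifts the ranking towards sleeping, which mirrors the chair example given earlier in this section.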
The Bayesian model of Gupta et al. [9] cannot easily be extended to other activities. We therefore propose a hierarchical model to deal with this problem. To avoid the limitations of 2D-based analysis, we used a motion tracker suit (MVN®) with several inertial sensors attached to different body parts, which provides the 3D pose of the body parts at up to 120 frames per second. Several works use 3D-based human movement analysis with high accuracy [14], [4], including in 3D virtual applications [7], but they focus only on classifying simple human movements.

This paper is organized as follows. Sec. II presents the feature extraction methods; based on these, the hierarchy-based human activity modelling is presented in Sec. III. Experimental results are presented and discussed in Sec. IV, and Sec. V closes the paper with a conclusion and an outlook on future work.

II. FEATURE CATEGORIZATION AND EXTRACTION USING LMA

Body part trajectories during human activities and the relationship between the person and the objects of interest in the scene are the input data of this study. A motion tracker suit is used to obtain the 3D human body parts positions

2013 IEEE International Conference on Systems, Man, and Cybernetics. 978-1-4799-0652-9/13 $31.00 © 2013 IEEE. DOI 10.1109/SMC.2013.494