Recognizing Hand Gestures using Dynamic Bayesian Network

Heung-Il Suk, Bong-Kee Sin, and Seong-Whan Lee

Department of Computer Science and Engineering, Korea University, Korea
{hisuk, swlee}@image.korea.ac.kr
Department of Computer Engineering, Pukyong National University, Korea
bkshin@pknu.ac.kr

Abstract

In this paper, we describe a dynamic Bayesian network (DBN) based approach to recognizing both two-hand and one-hand gestures. Unlike wired glove-based approaches, the success of camera-based methods depends greatly on the results of image processing and feature extraction. The proposed DBN-based inference is therefore preceded by fail-safe motion tracking steps. We then propose a new gesture recognition model for a set of both one-hand and two-hand gestures, based on the dynamic Bayesian network framework, which makes it easy to represent the relationships among features and to incorporate new information into the model. In an experiment with ten isolated gestures, we obtained a recognition rate upwards of 99.59% with cross validation. The proposed model is believed to have strong potential for successful application to other related problems such as sign languages.

1. Introduction

A hand gesture can be described by a locus of hand motion recorded in a sequence of frames. To model this kind of sequential input, hidden Markov models (HMMs) have been widely used in fields such as speech recognition and computer vision. Recently, there has been increasing interest in a more general class of probabilistic models, called dynamic Bayesian networks (DBNs), which includes the HMM and the Kalman filter as special cases. The DBN is a generalized version of the Bayesian network (BN), extended to the temporal dimension.

Du et al. defined five actions that could happen between two persons and developed a DBN-based model which took local features such as contour, moment, and height, and global features such as velocity, orientation, and distance, as observations [6].
Park et al. employed a hierarchical Bayesian network to analyze the evolution of the poses of multiple body parts and recognized the interactions between two persons [11]. Avilés-Arriaga et al. extracted the area and the center of a hand as input features and used a naïve DBN to recognize ten hand gestures [4]. Earlier, Pavlovic proposed a DBN for gesture recognition that can be seen as a combination of an HMM and a dynamic linear system [12]. León et al., on the other hand, used a sliding window of 15 frames and represented the motion between contiguous frames by a random variable [8]. Last but not least, Nefian et al. compared several methods of audio-visual speech recognition and suggested the use of coupled HMMs and factorial HMMs, showing that coupled HMMs outperformed all the other models in recognition performance [10].

In this paper, we describe a dynamic Bayesian network-based hand gesture recognition method that can be used to control a media player or PowerPoint™. First, given a video sequence, it tracks the moving hands using the method proposed by Argyros et al. [3] with some modification. With the proposed modification, hand blobs moving at varying velocities can be tracked robustly even when a face and the hands overlap. Taking the motion of each hand, the relative positions between the two hands, and the relative positions between the face and the two hands as observations, we propose a new gesture model within a DBN framework. In an experiment with ten isolated gestures, the model achieved a recognition rate of 99.59% with cross validation.

In the rest of the paper, we begin by describing the hand tracking methods in Section 2. Section 3 defines ten hand gestures and discusses which features to use and how to extract them from an input video.
The proposed DBN model and the inference algorithm are explained in Section 4, and the experimental results are presented and analyzed in Section 5. Finally, Section 6 concludes the paper.

2. Hands Tracking

Successful dynamic hand gesture recognition requires accurately locating the hands and face and tracking them in space-time. The result of this step greatly influences the performance of the target system.

978-1-4244-2154-1/08/$25.00 © 2008 IEEE