ISSN (Print): 2320-9798 | ISSN (Online): 2320-9801
International Journal of Innovative Research in Computer and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol. 1, Issue 9, November 2013
Copyright to IJIRCCE, www.ijircce.com

LIP MOTION SYNTHESIS USING PRINCIPAL COMPONENT ANALYSIS

Disha George 1, Yogesh Rathore 2
P.G. Student, Department of Computer Science & Engineering, RITEE, Raipur, Chhattisgarh, India 1
Sr. Lecturer, Department of Computer Science & Engineering, RITEE, Raipur, Chhattisgarh, India 2

Abstract: Current studies state that not only audio but also video signals carry information useful for speech recognition. This visual cue can serve as a supplement in the fields of animation and lip reading to enhance speech recognition, and it has gained wide attention in audio-visual speech recognition (AVSR) owing to its potential applications. This research is divided into two phases: (i) capturing frames and extracting features, which are stored in a database as the reference set; and (ii) feeding test image samples to a trained neural network to recognize which alphabet the person has spoken. A lip reading system has been developed using Principal Component Analysis on input images, and 60% success has been achieved in the test phase on alphabets with similar lip movements (such as u, o, q, b, e, i, l, n).

Keywords: Lip reading, Eigenvalues and Eigenvectors, Principal Component Analysis, Neural Network.

I.
INTRODUCTION

Lip-reading has been practised for many years to teach deaf and mute pupils to recognize what the person in front of them has spoken and to communicate effectively with other people [1]. Extracting the audio bits of information from a speaker is itself a tedious task and brings complexity, whereas visual signs also play a major supplementary role in conveying what the speaker says [2]. Lip-reading is generally examined in two contexts: speech recognition and the analysis of visual signs [8]. Systems that rely on audio waves alone are not preferable, since the signals are drastically affected by different kinds of noise. It is found that systems using both audio and video information are the best option for robust communication [2]. Visual information constitutes about one third of the conveyed message [5, 6, 7]. Some audio can be mixed with environmental noise, yet the sounds can still be differentiated in the visual space [4].

In this research, videos of different subjects have been taken, and frames have been selected from them for processing. The video corpus is converted into images of different alphabets, over which further work is performed. These images are selected manually by visualizing the changes in the frames during transitions [1]. According to this work, the lip movements differ when pronouncing different alphabets or letters of English, even for similar-sounding letters. The features include parameters such as the height between the inner lips and between the outer lips, and similarly the width between the inner lips [2]. The vertical and horizontal distances between the lips vary, even considering the close approximations between similar-sounding letters [1]. Based on this, a database of commonly used alphabets is created, and the neural network can be trained to find the best match between the input images and the test sample images, identifying the letter with the closest proximity.
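The geometric lip parameters mentioned above (vertical and horizontal distances between the inner and outer lip contours) can be sketched as follows. This is an illustrative example only, not the paper's code; the landmark layout and coordinates are assumptions made for the sketch.

```python
# Hypothetical sketch: computing lip-geometry parameters from (x, y)
# landmark points on the outer and inner lip contours. The point layout
# is an assumption for illustration, not the authors' actual extraction.

def lip_geometry_features(outer, inner):
    """Return (outer_width, outer_height, inner_width, inner_height)
    as the horizontal/vertical extents of each contour's landmarks."""
    def extent(points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        return max(xs) - min(xs), max(ys) - min(ys)

    outer_w, outer_h = extent(outer)
    inner_w, inner_h = extent(inner)
    return outer_w, outer_h, inner_w, inner_h

# Made-up landmarks for a single frame (left/right corners, top/bottom):
outer = [(0, 5), (20, 5), (10, 0), (10, 12)]
inner = [(4, 6), (16, 6), (10, 4), (10, 9)]
print(lip_geometry_features(outer, inner))  # → (20, 12, 12, 5)
```

Tracking how these four distances change across frames is one simple way to capture the transitions between letters that the manual frame selection relies on.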
Feature extraction is performed on the lip shape using Principal Component Analysis (PCA) to form a feature vector, and the resulting feature points for various speaking subjects are stored in a database. This database is compared with the test images to find the closest match among them, so that what the person has spoken can be identified from the image. A Radial Basis Function Network (RBFN) is used in the training stage to obtain the lip shape that best matches the spoken alphabet to be identified [10].

II. LIP READING SYSTEM

A lip reading system performs an operation that helps to understand what the speaker has spoken without requiring audio information. It comprises complex computational processes. The proposed system is subdivided into sub-modules, and each sub-module is processed and analysed separately so that every bit of information is examined in depth.
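The PCA feature-extraction and matching pipeline described above can be sketched as follows. This is a minimal illustration under assumed conditions (flattened lip-region images as row vectors, toy random data), not the authors' implementation; the final nearest-neighbour lookup is a simple Euclidean stand-in for the RBFN matching stage.

```python
import numpy as np

# Minimal PCA sketch (assumed workflow, not the paper's exact code):
# each lip-region frame is flattened into a pixel vector; the leading
# eigenvectors of the training set's covariance matrix span the feature
# space, and each image is represented by its projection coefficients.

def pca_fit(images, n_components):
    """images: (n_samples, n_pixels) array. Returns (mean, components)."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the covariance eigenvectors (rows of Vt).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_project(image, mean, components):
    """Project one flattened image onto the principal components."""
    return components @ (np.asarray(image, dtype=float) - mean)

# Toy data: 6 "frames" of 8 pixels each.
rng = np.random.default_rng(0)
train = rng.random((6, 8))
mean, comps = pca_fit(train, n_components=3)

# Nearest-match lookup in PCA space (Euclidean stand-in for the RBFN):
train_feats = np.array([pca_project(img, mean, comps) for img in train])
test_feat = pca_project(train[2], mean, comps)
best = int(np.argmin(np.linalg.norm(train_feats - test_feat, axis=1)))
print(best)  # → 2 (the test frame matches its own training sample)
```

In the actual system, each stored feature vector would be labelled with the alphabet being spoken, and the RBFN would learn the mapping from feature vectors to alphabet labels rather than using raw nearest-neighbour distance.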