Finite Element Modeling of Facial Deformation in Videos for Computing Strain Pattern

Vasant Manohar, Matthew Shreve, Dmitry Goldgof, and Sudeep Sarkar
Computer Science & Engineering, University of South Florida
{vmanohar, mshreve, goldgof, sarkar}@cse.usf.edu

Abstract

We present a finite element modeling based approach to compute strain patterns caused by facial deformation during expressions in videos. A sparse motion field computed through a robust optical flow method drives the FE model. While the geometry of the model is generic, the material constants associated with an individual's facial skin are learned at a coarse level sufficient for accurate strain map computation. Experimental results using the computational strategy presented in this paper emphasize the uniqueness and stability of strain maps across adverse data conditions (shadow lighting and face camouflage), making them a promising feature for image analysis tasks that can benefit from such auxiliary information.

1. Introduction

Deformable modeling of facial soft tissues has found use in application domains such as human-machine interaction for facial expression recognition [6]. More recently, such modeling techniques have been used for tasks like age estimation [9] and person identification [10, 11, 15]. Existing modeling approaches can be divided into two major groups. Models based on solving continuum mechanics problems under consideration of material properties and other physical constraints are called physical models. All other modeling techniques, even if they are related to mathematical physics, are known as non-physical models. Though physical models provide a highly accurate and robust solution strategy, the major problems with such approaches are that (i) the observed physical phenomena can be very complex and (ii) solving the underlying partial differential equations (PDEs) requires substantial computational cost.
Addressing these difficulties requires (i) finding an adequate simplified model of the given problem that covers the essential observations and (ii) applying efficient numerical techniques for solving the PDEs. In this work, we use the strain pattern extracted from non-rigid facial motion as a simplified yet adequate way to characterize the underlying material properties of facial soft tissues. The proposed method has several unique features: (i) strain is related to the biomechanical properties of facial tissues, which are unique to each individual; (ii) the strain pattern of the face is less sensitive to illumination differences (between registered and query sequences) and face camouflage because it remains stable as long as reliable facial deformations are captured; (iii) a finite element modeling based method enforces regularization, which mitigates issues related to automatic motion estimation, so the computational strategy is accurate and robust; (iv) images or videos of facial deformations can be acquired with a regular video camera; no special imaging equipment is needed.

Existing work on face animation and recognition using highly accurate models takes into account anatomical details of a face, such as bones, musculature, and skin tissues [12, 13, 16]. However, a major challenge of using a sophisticated anatomy-based model is the high computational complexity involved. An alternative is to extract biomechanical information (which might be adequate for certain tasks) from images and videos without building a full-scale model. Essa and Pentland [6] developed a finite element model to estimate visual muscle activations and to generate motion-energy templates for expression analysis. However, automatic identification of the action units that estimate the muscle activations remains an open research topic.
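To make the central quantity concrete: given a dense facial displacement field, strain is obtained from the spatial gradients of the displacements. The sketch below (a minimal NumPy illustration, not the paper's FE implementation) computes a per-pixel magnitude of the infinitesimal strain tensor from hypothetical displacement arrays `u` and `v`:

```python
import numpy as np

def strain_map(u, v, spacing=1.0):
    """Per-pixel magnitude of the infinitesimal (small-deformation) strain tensor.

    u, v : 2-D arrays of x- and y-displacements sampled on a regular pixel grid.
    """
    du_dy, du_dx = np.gradient(u, spacing)   # np.gradient: axis 0 is y, axis 1 is x
    dv_dy, dv_dx = np.gradient(v, spacing)
    e_xx = du_dx                             # normal strain along x
    e_yy = dv_dy                             # normal strain along y
    e_xy = 0.5 * (du_dy + dv_dx)             # symmetric shear component
    # Frobenius norm of the 2x2 strain tensor (off-diagonal term counted twice)
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)

# toy displacement field: uniform 10% stretch along x, no motion along y
y, x = np.mgrid[0:32, 0:32].astype(float)
mag = strain_map(0.1 * x, np.zeros_like(x))
# a uniform stretch yields a constant strain magnitude of 0.1 everywhere
```

In the paper's setting the displacement field is not given analytically but interpolated by the finite element model from sparse optical flow, which also regularizes the noisy per-pixel gradients this naive sketch would amplify.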
In our approach, which is also based on biomechanics, we go a step further by quantifying the soft tissue properties through their elasticity and effectively representing them by means of strain maps.

The study of facial strain requires high-quality motion data generated by robust tracking methods, an extensively investigated subject in computer vision. The trend is to integrate various image cues and prior knowledge into a face model [2, 5]. Such methods rely on a certain degree of user intervention, for either model initialization or tracking guidance. On the other hand, methods that avoid the use of hand-labeled features and manual correspondence [1, 14] require an extensive collection of training samples, which makes them less scalable. Therefore, in this study, we adopt an algorithm in its basic form: a robust optical flow method. Thus, the focus of this paper is on developing a robust

978-1-4244-2175-6/08/$25.00 ©2008 IEEE
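For readers unfamiliar with the motion-estimation step, the principle behind optical flow can be shown in a few lines. The sketch below is a deliberately simplified single-window Lucas-Kanade least-squares solve on synthetic frames (not the robust optical flow estimator the paper adopts, which additionally handles outliers and large motions); it recovers a known one-pixel translation along x:

```python
import numpy as np

def lucas_kanade_global(frame1, frame2):
    """Estimate one global (u, v) flow vector by least squares.

    Solves the brightness-constancy normal equations over the whole image,
    i.e. Lucas-Kanade with a single window covering every pixel.
    """
    grad_y, grad_x = np.gradient(frame1)     # spatial image gradients
    grad_t = frame2 - frame1                 # temporal derivative
    A = np.array([[np.sum(grad_x * grad_x), np.sum(grad_x * grad_y)],
                  [np.sum(grad_x * grad_y), np.sum(grad_y * grad_y)]])
    b = -np.array([np.sum(grad_x * grad_t), np.sum(grad_y * grad_t)])
    return np.linalg.solve(A, b)             # (u, v) in pixels

# synthetic smooth pattern translated by exactly 1 pixel along x
y, x = np.mgrid[0:64, 0:64].astype(float)
frame1 = np.sin(0.2 * x) + np.cos(0.3 * y)
frame2 = np.sin(0.2 * (x - 1.0)) + np.cos(0.3 * y)
u, v = lucas_kanade_global(frame1, frame2)   # u close to 1.0, v close to 0.0
```

A per-pixel flow field, as needed to drive the FE model, comes from solving the same small system in a local window around each pixel; robust variants further downweight pixels that violate brightness constancy.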