Tongue Appearance Modeling and Tracking in Ultrasound Images

Anastasios Roussos, Athanassios Katsamanis and Petros Maragos
School of ECE, National Technical University of Athens, Greece
E-mails: {troussos,nkatsam,maragos}@cs.ntua.gr

The shape and dynamics of the human tongue during speech are crucial in the analysis and modeling of the speech production system. Currently, ultrasound (US) imaging is one of the most convenient ways to acquire such information. Since even a few minutes of recorded speech correspond to tens of thousands of US frames, automatic extraction of the tongue contour at every time instant can be significantly helpful. This is a difficult problem: US images contain severe speckle noise, some parts of the tongue contour are not visible at all, and the remaining parts are only weakly visible. We are working on a novel approach to the automatic tongue tracking problem, building on Active Appearance Models.

Few methods addressing this problem are reported in the literature. Li et al. [3] developed EdgeTrak, a publicly available semi-automatic system for tongue tracking in US videos. It is based on a Snake model designed for this application, which incorporates information on edge gradient, intensity and contour orientation. It works quite well in frame subsequences where the same part of the tongue remains visible, but when a part disappears, the tracked contour becomes erroneous and the method cannot subsequently recover. The system therefore often needs manual refinements. More recently, Aron et al. [1] introduced various improvements. Their method is also based on Snakes, but it preprocesses the US frames to enhance tongue visibility and imposes boundary constraints on the snake to prevent it from shrinking.
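As a point of reference for the Snake-based systems discussed above, the sketch below illustrates the generic active-contour idea: an open contour evolves under an internal smoothing force plus an external force pulling it towards the maxima of an image-derived potential. This is a minimal toy illustration in Python/NumPy under our own simplifying assumptions (explicit gradient descent, a synthetic potential), not the actual energy terms or numerics of EdgeTrak or Constrained Snakes.

```python
import numpy as np

def evolve_snake(pts, potential, alpha=0.2, step=1.0, iters=500):
    """Explicit gradient-descent snake for an open contour.

    pts:       (N, 2) array of (row, col) control points.
    potential: 2-D array; the snake is attracted towards its maxima.
    alpha weights the internal (smoothness) force."""
    grow, gcol = np.gradient(potential)       # external force field
    pts = pts.astype(float).copy()
    for _ in range(iters):
        # Internal force: discrete second derivative (curvature penalty);
        # the endpoints feel no smoothing force (open contour).
        smooth = np.zeros_like(pts)
        smooth[1:-1] = pts[:-2] - 2.0 * pts[1:-1] + pts[2:]
        # External force: potential gradient sampled at the nearest pixel.
        r = np.clip(np.rint(pts[:, 0]).astype(int), 0, potential.shape[0] - 1)
        c = np.clip(np.rint(pts[:, 1]).astype(int), 0, potential.shape[1] - 1)
        ext = np.stack([grow[r, c], gcol[r, c]], axis=1)
        pts += step * (alpha * smooth + ext)
    return pts

# Toy demo: a smooth potential peaking along row 30; snake starts at row 20.
rows = np.arange(64, dtype=float)[:, None]
potential = np.exp(-0.02 * (rows - 30.0) ** 2) * np.ones((1, 64))
init = np.stack([np.full(32, 20.0), np.linspace(5.0, 58.0, 32)], axis=1)
final = evolve_snake(init, potential)         # drifts towards row 30
```

In a real tracker the potential would be derived from the (preprocessed) US frame, e.g. an edge-gradient map, and the boundary constraints mentioned above would additionally pin or guide the endpoints.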
In addition, the contour is initialized in every frame using the optical flow between consecutive frames and two electromagnetic (EM) sensors glued to the tongue. This method, which we refer to as Constrained Snakes, has been reported to be more accurate than EdgeTrak. However, it still needs manual refinements, although less often than EdgeTrak.

We present a novel tracking method that incorporates prior information about the shape variation of the tongue contour. This method is robust even in cases of poor tongue visibility. Further, it not only extracts the visible tongue contour parts in every frame, but also extrapolates the contour in the non-visible parts, thanks to the model of shape variation. The methodology of

Abstract submitted to Ultrafest V, 19 – 21 March 2010, New Haven, Connecticut, U.S.A.
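The extrapolation of the contour into non-visible parts rests on a learned model of shape variation. The following toy sketch (our own illustration in Python/NumPy, not the authors' actual model; all names and data are hypothetical) shows the underlying idea: learn a PCA shape model from training contours, then fit its mode coefficients to the visible part of a new contour by least squares and read off the reconstructed hidden part. For brevity it models only the vertical profile of the contour.

```python
import numpy as np

def train_shape_model(shapes, n_modes=2):
    """PCA shape model from training contours: shapes is (K, N),
    one flattened contour per row (here: N tongue heights)."""
    mean = shapes.mean(axis=0)
    _, _, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, Vt[:n_modes]                 # (N,) mean, (n_modes, N) basis

def complete_contour(mean, P, vis_idx, vis_vals):
    """Fit the mode coefficients to the visible points (least squares),
    then reconstruct the whole contour, hidden parts included."""
    b, *_ = np.linalg.lstsq(P[:, vis_idx].T, vis_vals - mean[vis_idx],
                            rcond=None)
    return mean + P.T @ b

# Toy training set: heights built from two shape modes with random weights.
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 40)
coeffs = rng.normal(size=(100, 2))
train = 5.0 + coeffs[:, :1] * np.sin(t) + coeffs[:, 1:] * np.sin(2.0 * t)
mean, P = train_shape_model(train)

# A new contour whose right half is "invisible": recover it from the left.
truth = 5.0 + 1.3 * np.sin(t) - 0.7 * np.sin(2.0 * t)
vis_idx = np.arange(20)
recovered = complete_contour(mean, P, vis_idx, truth[vis_idx])
```

Because the toy test contour lies exactly in the span of the learned modes, the hidden half is recovered essentially exactly; with real tongue data the reconstruction is only as good as the shape model and the visible evidence.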