Linguistically Valid Movement Behavior Measured Non-Invasively

Adriano V. Barbosa 1, Hani C. Yehia 2, Eric Vatikiotis-Bateson 1

1 Department of Linguistics, University of British Columbia, Vancouver, Canada
2 Department of Electronics, Federal University of Minas Gerais, Belo Horizonte, Brazil

adriano.vilela@gmail.com, hani@cefala.org, evb@interchange.ubc.ca

Abstract

We use optical flow to extract reliable kinematics from video for motions of the head, face, torso, and hands during speech and musical performance. Unlike dot- and marker-based measures, these markerless measures are non-invasive and require no a priori specification of measurement locations. Reliability is compared with marker tracking data, and the method's utility is demonstrated for data from Plains Cree, English, and Shona.

Index Terms: optical flow, kinematics, non-invasive measures.

1. Overview

Since the mid-1990s [1], we have been keen to develop video-based tools for measuring spoken communication that would be computationally tractable, reliable, non-invasive, and not restricted to laboratory recording equipment and conditions. At that time, digital image processing was cumbersome and expensive, and everyone thought that video images had to be of the highest resolution possible in order to withstand fine-grained analysis (e.g., [2]). The technology has improved dramatically, and we now know that the visible attributes of spoken communication tend to be ubiquitous, simple (e.g., linear), and accessible to perceivers at surprisingly low temporal and spatial resolutions [3, 4, 5]. Given these technical and conceptual advances, the time is ripe for video-based motion analysis tools that can be applied to inexpensively acquired video data.

In this paper, we describe our method for measuring motion from optical flow, test its reliability against 2D flesh-point measures [6], and provide sample applications to linguistic performances.
The tool, which is part of a larger Matlab toolbox for multimodal data analysis [7], is freely available to the research community and has already proved useful in the analysis of the coordination between speech acoustics, head and face motion, and manual gestures [8].

The text is organized as follows. Section 2 briefly describes the optical flow algorithm and introduces the concept of regions of interest, which are used to reduce the high dimensionality of the optical flow signal. The data acquisition and processing procedures are presented in Section 3. Results are discussed in Section 4, where motion measures derived from the optical flow field are presented and compared with those obtained through a video-based marker tracking algorithm for the same data. Lastly, the summary is presented in Section 5.

2. Extracting motion measures from video

The first stage in this process is to get the video data stored in files on the computer. This can be done in one step, by recording video directly to the computer via hardware capture, or in two steps, by first recording to tape. Once stored on the computer, the movie files can be accessed as image sequences.

The video data acquired in an experiment consist of an image sequence, in which each image (or frame) can be treated as an array of dimension [M × N × 3], where M and N are the number of rows and columns in the image, respectively. For example, a standard definition (SDTV) frame of NTSC digital video is 480 pixels high by 640 pixels wide (M = 480 and N = 640). Color images are composed of three channels, though the contents of each individual channel depend on the color space being used. For example, in an RGB color space, the three channels represent the amount of red (R), green (G), and blue (B) in the image, respectively. Other representations use one luminance channel (with brightness information) and two chrominance channels (with color information).
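The frame-as-array representation described above can be sketched as follows. This is an illustrative example only, not code from the authors' toolbox; the BT.601 luminance weights used for the RGB-to-grayscale step are a standard choice assumed here, since the paper does not specify a particular conversion.

```python
import numpy as np

# A single SDTV NTSC frame as an [M x N x 3] array:
# M = 480 rows, N = 640 columns, 3 color channels.
M, N = 480, 640
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(M, N, 3), dtype=np.uint8)  # stand-in RGB frame

# In an RGB space, channel 0/1/2 hold the red/green/blue amounts.
red = frame[:, :, 0]

# A weighted sum of the three channels (ITU-R BT.601 luma weights,
# an assumption) yields a single M x N grayscale matrix -- the analogue
# of keeping the luminance channel and discarding the two chrominance
# channels in a luminance/chrominance representation.
weights = np.array([0.299, 0.587, 0.114])
gray = frame @ weights

print(frame.shape, gray.shape)
```

In practice the frames would of course come from decoded movie files rather than a random array; the point is only the shape bookkeeping: each color frame is [480 × 640 × 3], and each grayscale frame is a 480 × 640 matrix.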
One advantage of such a representation is that a grayscale version of the image can be easily obtained by simply discarding the two color channels, which results in an M × N matrix.

Optical flow is a common technique for extracting measures of 2D motion from video. Although there are many algorithms for computing optical flow [9], they all share the same goal: calculating optical flow fields corresponding to the projection of the 3D motion of real objects onto the 2D image plane. Roughly speaking, after conversion to grayscale, the algorithm compares consecutive frames of the video sequence and calculates how much, and in which direction, each pixel in the image moved from one frame to the next. The algorithm then assigns to each pixel a displacement vector corresponding to the difference in the pixel's position across the two frames. The array of displacement vectors comprises the optical flow field.

The distance in time between consecutive frames (given by T = 1/f, where f is the video frame rate) can be used as a unit of discrete time. Thus, in discrete time, the pixel displacements that comprise the optical flow field can be seen as pixel velocities. Therefore, the optical flow at the point (x, y) in the image plane at the discrete time k can be denoted by

    v(x, y, k) = [v_x(x, y, k), v_y(x, y, k)],    (1)

where v_x(x, y, k) and v_y(x, y, k) are the x and y components of the optical flow vector, respectively.

The optical flow analysis results in a high-dimensional signal. For example, in the case of digital NTSC video, there are 640 × 480 two-dimensional vectors associated with every pair of consecutive frames in the video. There are many ways to reduce the dimensionality of the optical flow analysis and make the resulting signal more tractable. For example, the image sequence may be filtered on input to reduce the resolution of the analysis, or the pixel arrays may be sparsely sampled (e.g., 1 out of every 4 pixels) for analysis.
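The flow computation between a pair of consecutive grayscale frames can be sketched as below. The paper cites [9] for the many available algorithms without committing to one here, so this is a minimal global gradient-based estimate in the spirit of Lucas-Kanade, under the simplifying assumption that the whole image undergoes a single pure translation; it is not the authors' implementation, and the synthetic Gaussian "frames" are stand-ins for real video.

```python
import numpy as np

# Two synthetic 64 x 64 grayscale frames: frame k+1 is frame k shifted
# by exactly one pixel in the x (column) direction.
x = np.arange(64)
X, Y = np.meshgrid(x, x)
I1 = np.exp(-((X - 32.0) ** 2 + (Y - 32.0) ** 2) / (2 * 10.0 ** 2))
I2 = np.roll(I1, 1, axis=1)

# Brightness constancy: Ix*vx + Iy*vy + It = 0 at every pixel, where
# (Ix, Iy) are spatial gradients and It the frame-to-frame difference.
Iy, Ix = np.gradient((I1 + I2) / 2)  # np.gradient returns the row (y) axis first
It = I2 - I1

# A single (vx, vy) for the whole image, solved by least squares over
# all pixel constraints -- the global-translation simplification.
A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
b = -It.ravel()
v_est, *_ = np.linalg.lstsq(A, b, rcond=None)

# v_est is a displacement in pixels per frame; dividing by T = 1/f
# converts it to pixels per second at frame rate f.
print(v_est)  # close to (1, 0): one pixel of motion in x, none in y
```

A dense algorithm of the kind the paper uses would instead solve such constraints in a neighborhood around every pixel, producing the full field v(x, y, k) of Eq. (1) rather than one vector per frame pair.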
After flow computation, the dimensionality of the result may be reduced by Principal Component Analysis (PCA). We have chosen to leave the input sequence alone, but to reduce the dimensionality of

Accepted after peer review of abstract paper. Copyright 2008 AVISA. 26-29 September 2008, Moreton Island, Australia.
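The PCA reduction mentioned above can be sketched as follows. This is an illustrative example with synthetic data, not the authors' analysis: each frame pair's flow field is flattened into one row of a data matrix, and the principal components (here via SVD of the mean-centered matrix) give a low-dimensional motion signal over time.

```python
import numpy as np

# Hypothetical flow data: K frame pairs, each a dense field of 2D flow
# vectors on a small 48 x 64 grid, flattened to one row per frame pair.
K, H, W = 30, 48, 64
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, K)
basis = rng.standard_normal((2, H * W * 2))            # two spatial "modes"
coeffs = np.stack([np.sin(t), np.cos(2 * t)], axis=1)  # their time courses
X = coeffs @ basis + 0.01 * rng.standard_normal((K, H * W * 2))

# PCA via SVD of the mean-centered data: rows of Vt are the principal
# components (spatial flow patterns); projecting onto them yields
# component scores over time.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)      # fraction of variance per component
scores = Xc @ Vt[:2].T               # low-dimensional motion signal, (K, 2)

print(scores.shape, float(explained[:2].sum()))
```

For this toy data, two components recover essentially all of the variance, turning thousands of flow values per frame pair into a two-channel time series.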