Computer Vision and Image Understanding 96 (2004) 200–215
www.elsevier.com/locate/cviu

Differential video coding of face and gesture events in presentation videos

Robin Tan, James W. Davis*

Computer Vision Laboratory, Department of Computer Science and Engineering, Ohio State University, USA

Received 14 March 2002; accepted 2 February 2004
Available online 7 August 2004

* Corresponding author. Fax: +1 614 292 2911.
E-mail address: jwdavis@cse.ohio-state.edu (J.W. Davis).
1077-3142/$ - see front matter
doi:10.1016/j.cviu.2004.02.008

Abstract

Currently, bandwidth limitations pose a major challenge for delivering high-quality multimedia information over the Internet to users. In this research, we aim to provide a better compression of presentation videos (e.g., lectures). The approach is based on the idea that people tend to pay more attention to the face and gesturing hands, and therefore these regions are given more resolution than the remaining image. Our method first detects and tracks the face and hand regions using color-based segmentation and Kalman filtering. Next, different classes of natural hand gesture are recognized from the hand trajectories by identifying gesture holds, position/velocity changes, and repetitive movements. The detected face/hand regions and gesture events in the video are then encoded at higher resolution than the remaining lower-resolution background. We present results of the tracking and gesture recognition approach, and evaluate and compare videos compressed with the proposed method to uniform compression.

© 2004 Elsevier Inc. All rights reserved.

1. Introduction

Initially, the Internet was mostly used to communicate and share textual forms of data. Today, the Internet includes a rich medley of multimedia audio-visual