Creating Personalized Head Models from Image Series

Denis Ivanov, Victor Lempitsky, Anton Shokurov, Andrey Khropov and Yevgeniy Kuzmin
Department of Mathematics and Mechanics, Moscow State University
Moscow, Russia

Abstract

This paper overviews technologies aimed at creating personalized head models by calibrating a generic model of an average head. The models can be created from two photographs taken in the front and profile directions, from a collection of photographs taken in arbitrary directions, or from a video stream showing continuous rotation of a head from one profile to the other. The calibration process is designed as a pipeline of sequential stages that register the available views in a coordinate frame associated with the head, adjust the geometry of the model to obtain a precise match with the input data, and generate a consistent texture by merging the texture candidates obtained through inverse texture mapping. The visual quality of the resulting models depends on the spatial resolution of the generic mesh and the resolution of the initial images, which allows the proposed calibration pipeline to be used in a wide range of applications, including computer games, film production and video conferencing.

Keywords: head model, model calibration, model adaptation, texture, video processing, camera registration

1. INTRODUCTION

Animated models of a human head are in demand in a large variety of modern applications, including computer games, film production, and video conferencing. However, the effortless generation of a realistic-looking, high-quality model has remained one of the most difficult problems in computer graphics, as no general, complete and efficient solution seems to be available yet. On the one hand, being a solid object in the 3D world, a human head can be digitized using commercially available 3D scanning machinery based on laser range finders [14,16] or similar technologies.
This approach allows the generation of a relatively accurate shape and accompanying texture; however, the produced data are not directly suitable for animation, and proper adaptation usually requires a great deal of effort. On the other hand, using a priori knowledge about the underlying structure of a head enables one to obtain a better result. This idea is typically realized in calibration strategies, where either an existing model of a generic head is adjusted so that it matches the available input data [7,10,13,17], or an existing set of models is treated as a basis of some vector space [2]. The resulting model is usually more suitable for animation, as it is based on a priori knowledge about the human head.

Head model calibration methods can be classified by the source data upon which they are based. As depth information is expensive to obtain, raster images are usually considered the only input. Thus, there exist techniques of calibration from a single image [2], from a pair of orthogonal images [10,11], and from a sequence of images or a video stream [4,6,17]. It also matters whether a special setup of the camera, lighting or viewing parameters is required; having no special restrictions is obviously preferable. Another property that characterizes calibration methods is the amount of user participation in the process. Some methods are fully automatic [2,7], some require that several points be selected on the images [4,6], while others demand considerable additional effort from the user [11,12,17]. On average, manual input of some parameters allows a model of better quality to be produced, while automatic techniques are obviously preferable from the customer's point of view.

The goal of our research was to develop a complete calibration pipeline that would allow the adjustment of a polygonal model of a generic head based on a set of photographs.
Textured polygonal models were chosen because several solutions exist for their animation [5,8,9] that take advantage of modern GPU features. We started our research by considering two orthogonal views, front and profile, as input data, which conceptually follows [10,11]. However, our analysis led to the conclusion that these two images alone, with no other supporting data such as images taken from other angles or a database of reference head models, cannot provide a model realistic enough for high-resolution rendering. Our further research focused on processing several images, typically from 5 to 10, taken at different viewing angles by a still digital camera. Such an approach allowed us to reconstruct models of better quality in terms of their perceptual similarity to the real subjects. However, we faced the necessity of selecting more facial features on the images, which might not be amenable to automatic selection. To further improve the quality of the resulting model, to provide a more suitable environment for facial feature detection and tracking (owing to the temporal smoothness of head movements in video), and to make the processing more user-friendly, we are currently working on fully automatic creation of a complete personalized 3D head model from a video stream taken by a consumer digital video camera.

2. PIPELINE OVERVIEW

The head calibration process is organized as a pipeline comprising a sequence of separate stages, which are schematically shown in Figure 2. Each stage processes the data obtained from its predecessors and performs the corresponding operations. The input data of the calibration process can be:
- a pair of images taken in the front and profile directions, or
- a collection of images taken in arbitrary directions, or
- a video stream covering the front and both profile views.
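The staged organization of the calibration process (view registration, geometry adjustment, texture generation) can be sketched as a sequential pipeline. The function names and data structures below are illustrative assumptions for exposition, not the authors' actual implementation:

```python
# Minimal sketch of a sequential calibration pipeline, assuming three
# stages as described in the text. All names here are hypothetical.

def register_views(images):
    """Stage 1: estimate a camera pose for each view in a head-centered frame."""
    # Pose estimation itself is omitted; each view keeps a pose slot.
    return [{"image": img, "pose": None} for img in images]

def adjust_geometry(generic_mesh, registered_views):
    """Stage 2: deform the generic head mesh to match the registered views."""
    # Placeholder: the actual mesh deformation is not shown.
    return generic_mesh

def generate_texture(mesh, registered_views):
    """Stage 3: merge per-view texture candidates from inverse texture mapping."""
    return {"mesh": mesh, "texture": "merged"}

def calibrate_head(generic_mesh, images):
    """Run the stages in order; each stage consumes its predecessor's output."""
    views = register_views(images)
    mesh = adjust_geometry(generic_mesh, views)
    return generate_texture(mesh, views)
```

The point of the sketch is the data flow: each stage depends only on the output of the previous ones, so new input modalities (e.g. video frames) can be supported by changing the first stage alone.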
If only two images are used as source data, they should be taken from viewing angles close to the frontal and profile ones. However, the required level of accuracy is quite reasonable in the sense that no special setup (or equipment) is generally needed to position the head: the images are taken by a still camera from the directions that the photographer identifies as front and profile. In the case of a collection of images, the photographs can in general be taken from arbitrary positions. In practice, however, this approach pays off only if at least five images are present: a frontal view, both profiles, and two images taken at roughly 45 degrees between the frontal and profile directions.

International Conference Graphicon 2003, Moscow, Russia, http://www.graphicon.ru/
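The five-view recommendation above amounts to a coverage condition on head yaw angles. A simple way to state it is to check that each recommended yaw (front, both profiles, and the two intermediate views) has at least one photograph nearby; the function, the angle convention, and the tolerance below are assumptions for illustration, not part of the paper:

```python
# Hypothetical coverage check for a photo collection. Yaw is measured in
# degrees: 0 = frontal view, -90/+90 = left/right profile.
RECOMMENDED_YAWS = [-90, -45, 0, 45, 90]

def covers_recommended_views(view_yaws, tolerance=15):
    """Return True if every recommended yaw has a photo within `tolerance` degrees."""
    return all(
        any(abs(yaw - target) <= tolerance for yaw in view_yaws)
        for target in RECOMMENDED_YAWS
    )
```

For example, a collection with yaws [-88, -40, 5, 50, 92] satisfies the condition, while a front-plus-one-profile pair [0, 90] does not, matching the observation that a sparse collection only pays off once the intermediate views are present.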