CASIMIRO: A Robot Head for Human-Computer Interaction

O. Déniz, M. Castrillón, J. Lorenzo, C. Guerra, D. Hernández, M. Hernández *

Universidad de Las Palmas de Gran Canaria
Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería
Edificio Central del Parque Tecnológico - Campus de Tafira
35017 Las Palmas - Spain
E-mail: {odeniz,mcastrillon,jlorenzo,cguerra,dhernandez,mhernandez}@dis.ulpgc.es

Abstract

The physical appearance and behavior of a robot are important assets in terms of Human-Computer Interaction. Multimodality is also fundamental, as we humans usually expect to interact in a natural way, with voice, gestures, etc. People approach complex interaction devices with stances similar to those they use when interacting with other people. In this paper we describe a robot head, currently under development, that aims to be a multimodal (vision, voice, gestures, ...) perceptual user interface. Modules are described for face detection, tracking, facial movement, action selection and sound localization. Preliminary results indicate that the robot head can potentially achieve the goals we are interested in, namely human interaction and assistance.

1 Introduction

A characteristic of our society is the introduction of the computer into daily life, but through devices that are not natural for human beings to interact with [15]. Users normally need a training period before they can use these devices, so in some cases people may reject computers altogether because of the unnatural design of the communication devices. This happens because users must adapt to computers instead of the other way around. Human beings are sociable by nature and use their sensory and motor capabilities to communicate with their environment; we communicate not only with words but also with sounds and gestures.
Therefore, if man-machine interaction were more similar to interaction among humans, artificial devices would be more accessible and could play a role as assistants. Perceptual User Interfaces (PUI) [23] is the paradigm that explores the techniques human beings use to interact with each other and with their environment. These techniques take human capabilities into account in order to model man-machine interaction. This interaction must be multimodal, because that is the most natural way to interact with computers. Raisamo [17] gives an intuitive definition: a user interface is multimodal when "a system accepts many different inputs that are combined in a meaningful way". Thus, in a multimodal system the user interacts through several modalities such as voice, gestures, sight, etc. Multimodal interaction therefore studies the mechanisms that integrate these modalities to improve man-machine interaction. In this work we present the architecture and initial development of an experimental multimodal interface.

The paper is organized as follows. Section 2 describes the architecture of the whole system. The modules under development are described in Section 3. Section 4 shows some preliminary results. Finally, the main conclusions and future directions of this work are presented.

* Work partially funded by DGUI-Gobierno de Canarias PI2000/042 research project. The first author is supported by graduate grant D260/54066308-R of Universidad de Las Palmas de Gran Canaria.

2 CASIMIRO architecture

In this section we describe CASIMIRO, a Perceptual User Interface architecture intended to make interaction between people and computers easier. The architecture is based on a semantic fusion scheme, and the different modes that make up the system are treated as independent. To achieve this goal, the interface exhibits human-like behaviors, which are based on a humanoid head (Fig.
4) whose facial movements allow gestures to be added as a means of interaction. The perceptual side of the interface will make use of an active vision approach already used in the DESEO system [10]. Sounds and voice are also part of the perceptual capabilities of the architecture. CASIMIRO is made up of five major modules (Figure