IPT-EGVE Symposium (2007)
B. Fröhlich, R. Blach, and R. van Liere (Editors)

Supervision of Task-Oriented Multimodal Rendering for VR Applications

G. Bouyer, P. Bourdot and M. Ammi
VENISE transversal action, LIMSI-CNRS, Université Paris-Sud, France

Abstract
This article addresses the integration of multimodal rendering in Virtual Reality applications. It first presents the benefits of multimedia intelligent systems for improving human activity in Virtual Environments. It then details the design of a software module in charge of supervising the rendering of multimodal information, depending on the interaction and its context. Building on existing psychophysical studies and concrete applications, we propose a model, an architecture and a decision process. Finally, a first implementation is presented to validate the core of the simulator and to show the adaptability of its knowledge base.

Categories and Subject Descriptors (according to ACM CCS): D.2.2 [Software Engineering]: Design Tools and Techniques [User Interfaces]; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems [Artificial, augmented, and virtual realities]

1. Introduction

Virtual Reality (VR) is a discipline that allows human users to perceive and manipulate digital environments in an immersive, pseudo-natural and real-time way. From the very beginning, with Morton Heilig's Sensorama Simulator in 1962, VR applications have tried to combine multiple sensory stimulations. Following technological advances, VR systems have integrated new interaction interfaces (speech, gesture, visual, auditory, haptic, etc.) to make the most of human sensori-motor capacities. These systems are called "multimedia", "multisensorial" or "multimodal". The last type is also referred to as a "multimedia intelligent system".
Such "multisensorial" systems have been developed, on the one hand, to improve the realism of Virtual Environments (VE) and, on the other hand, to enhance speed, efficiency and comfort in numerous tasks. We call the latter objective the "task-oriented" approach. This approach is based on the Modal Specific Theory (MST) [Fri74], which states that each sensorial channel has a unique way of processing information and is suited to a certain sort of stimulus. For example, vision is a spatial channel, capable of interpreting spatial relationships. Audition is not really efficient for 3D tasks, but it is useful for perceiving spatial information located outside the field of vision; moreover, it is a very efficient channel for analysing temporal phenomena. The haptic sense is special in that it requires active perception, i.e. movement, to provide both temporal and spatial cues. Sensorial modalities are the various means of transmitting information through these channels. According to MST, each modality thus has its own advantages and drawbacks and is appropriate for a certain type of information. The main purpose of multisensorial systems is to find the most efficient and appropriate mapping between a piece of information and a modality (or a combination of several modalities).

However, the mapping rules between modalities and information are not valid in all situations; the rendering should depend not only on the data to be communicated but also on the context of the interaction (user, environment and system) [NC93]. To operate properly, "multimodal" systems should integrate an intelligent artificial manager that adapts multisensorial interaction to various users and remains flexible across different contexts of activity. This manager relies on a knowledge representation of the human channels, the operator's objectives, the rendering capacities of the technical architecture, etc.
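To make the idea of a context-dependent mapping concrete, the following is a minimal sketch of such a rule-based supervisor. All names (`Context`, `select_modalities`, the information types) are hypothetical illustrations of the heuristics described above, not the authors' actual model or decision process.

```python
from dataclasses import dataclass

@dataclass
class Context:
    in_field_of_view: bool   # is the target inside the user's field of vision?
    haptic_device: bool      # is a haptic interface available on this platform?

def select_modalities(info_type: str, ctx: Context) -> list:
    """Map a piece of information to rendering modalities, following
    the Modal Specific Theory heuristics sketched in the text."""
    if info_type == "spatial":
        if ctx.in_field_of_view:
            return ["visual"]        # vision excels at spatial relationships
        return ["auditory"]          # audio cues can locate off-screen targets
    if info_type == "temporal":
        return ["auditory"]          # audition is efficient for temporal phenomena
    if info_type == "contact" and ctx.haptic_device:
        return ["haptic"]            # active touch gives temporal and spatial cues
    return ["visual"]                # default fallback channel

# A spatial cue outside the field of vision is routed to audio:
print(select_modalities("spatial", Context(in_field_of_view=False, haptic_device=True)))
```

In a real supervisor these rules would be entries in an adaptable knowledge base rather than hard-coded branches, so that the mapping can change with the user, the environment and the system state.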
One of the main issues of VR multimodality is to model and develop such a system for various applications.

© The Eurographics Association 2007.