PAD-based Multimodal Affective Fusion

Stephen W. Gilroy, University of Teesside, Middlesbrough, UK (s.w.gilroy@tees.ac.uk)
Marc Cavazza, University of Teesside, Middlesbrough, UK (m.o.cavazza@tees.ac.uk)
Marcus Niiranen, VTT Electronics, Helsinki, Finland (marcus.niiranen@vtt.fi)
Elisabeth André, University of Augsburg, Augsburg, Germany (andre@informatik.uni-augsburg.de)
Thurid Vogt, University of Augsburg, Augsburg, Germany (thurid.vogt@informatik.uni-augsburg.de)
Jérôme Urbain, Faculté Polytechnique de Mons, Mons, Belgium (jerome.urbain@fpms.ac.be)
Maurice Benayoun, CiTu, Université Paris 1, Paris, France (mb@benayoun.com)
Hartmut Seichter, HITLabNZ, University of Canterbury, Christchurch, New Zealand (hartmut.seichter@hitlabnz.org)
Mark Billinghurst, HITLabNZ, University of Canterbury, Christchurch, New Zealand (mark.billinghurst@hitlabnz.org)

Abstract

The study of multimodality is comparatively less developed for affective interfaces than for their traditional counterparts. However, one condition for the successful development of affective interface technologies is the availability of frameworks for real-time multimodal fusion. In this paper, we describe an approach to multimodal affective fusion which relies on a dimensional model, Pleasure-Arousal-Dominance (PAD), to support the fusion of affective modalities, each input modality being represented as a PAD vector. We describe how this model supports both affective content fusion and temporal fusion within a unified approach. We report results from early user studies which confirm the existence of a correlation between measured affective input and user temperament scores.

1. Introduction

Affective expression in humans is naturally conveyed through multiple channels, and this has been used to make the recognition of emotional categories more robust and accurate in a variety of user interfaces [21, 25, 26, 30]. However, this innately multimodal nature of affective expression has not always been characterised in terms of the modalities themselves, defined as input channels possessing their own semantics. This is partly because most work on affective fusion has taken place in the context of early fusion, including the search for an “ideal” feature set across modalities [40, 35], or in the context of improved robustness and classification within a pre-defined set of affective semantics, usually based on universal emotion categories derived from the semantics of facial expressions [27, 42, 8].

In this paper, we describe an approach to the multimodal fusion of affective input based on the identification of interaction modalities. To support our study, we have designed an experimental platform utilising a compatible digital arts installation.

Our starting postulate is that it is possible to analyse certain forms of spectator behaviour in terms of affective modalities, and that the overall reaction of the user to the installation can be described through the fusion of individual modalities.

Affective multimodality shows both commonalities and differences with “traditional”, information-based multimodality [5, 34, 41, 13]. One major similarity is the co-occurrence of linguistic expression and “physical” expression. In traditional multimodality, physical expression tends to consist mostly of deictic gestures and/or symbolic gestures.
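To make the dimensional representation summarised in the abstract concrete, the sketch below shows one simple way per-modality PAD vectors could be combined. The weighted linear combination, the weight normalisation, and the symbols $w_i$ and $\mathbf{p}_i$ are illustrative assumptions, not the specific fusion scheme developed in this paper:

\[
\mathbf{p}_{\mathrm{fused}}(t) = \sum_{i=1}^{n} w_i(t)\,\mathbf{p}_i(t),
\qquad
\mathbf{p}_i(t) = \bigl(P_i(t),\, A_i(t),\, D_i(t)\bigr) \in [-1,1]^3,
\qquad
\sum_{i=1}^{n} w_i(t) = 1,
\]

where each input modality $i$ (e.g. speech prosody or facial expression) contributes a time-varying vector of Pleasure, Arousal and Dominance values, conventionally normalised to $[-1,1]$. Under this reading, combining the vectors across modalities at a given instant corresponds to affective content fusion, while varying or smoothing the weights $w_i(t)$ over time corresponds to temporal fusion.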