CHAPTER  Introduction Recent work in human-computer interaction (HCI) and human-robot interaction (HRI) has shown that embodied agents and robots are increasingly being studied as partners that collaborate and do things with people (Breazeal, 2009; Schroeder et al., 2012). For example, the use of embodied agents and robots is being investigated in many HCI and HRI applica- tions, such as providing assistance for the elderly at home, serving as tutors for children by enriching their learning experiences, and acting as therapeutic tools or as game buddies for entertainment purposes. hese applications require embodied agents and robots to be endowed with social skills. Social per- ception abilities include afect sensitivity—that is, the ability to recognise people’s afective expressions and states, understand their social signals—and account for the context in which the interaction takes place (Castellano et al., 2010a). Afect sensi- tive embodied agents and robots are more likely to be able to engage with human users over extended periods of time as compared with their nonafective counterparts (Bickmore & Picard, 2005). Research on automatic afect recognition has contributed several studies on the design of systems capable of perceiving multimodal social, cognitive, and afective cues (e.g., facial expressions, eye gaze, body movement, physiological data, etc.) and using them to infer a user’s afective and cognitive state (Calvo & D’Mello, 2010; Zeng et al., 2009). Recently there has been a shift toward real-world HCI and HRI, which has led to the emergence of new trends in multimodal afect recognition. hese include, among others, an increased focus on the automatic recognition of spontaneous and nonprototypical afective states, the development of techniques for continuous afect prediction— which allows for the dynamics of afective states to be taken into consideration, and the design of context-sensitive afect recognition systems. Compared with systems based on a single modal- ity, multimodal afect recognition has the potential Abstract 7KLV FKDSWHU SURYLGHV D V\QWKHVLV RI UHVHDUFK RQ PXOWLPRGDO DIIHFW UHFRJQLWLRQ DQG GLVFXVVHV PHWKRGRORJLFDO FRQVLGHUDWLRQV DQG FKDOOHQJHV DULVLQJ IURP WKH GHVLJQ RI D PXOWLPRGDO DIIHFW UHFRJQLWLRQ V\VWHP IRU QDWXUDOLVWLF KXPDQFRPSXWHU DQG KXPDQURERW LQWHUDFWLRQV ,GHQWLタHG FKDOOHQJHV LQFOXGH WKH FROOHFWLRQ DQG DQQRWDWLRQ RI VSRQWDQHRXV DIIHFWLYH H[SUHVVLRQV WKH FKRLFH RI DSSURSULDWH PHWKRGV IRU IHDWXUH UHSUHVHQWDWLRQ DQG VHOHFWLRQ LQ D PXOWLPRGDO FRQWH[W DQG WKH QHHG IRU FRQWH[W VHQVLWLYLW\ DQG IRU FODVVLタFDWLRQ VFKHPHV WKDW WDNH LQWR DFFRXQW WKH G\QDPLF QDWXUH RI DIIHFW DQG WKH UHODWLRQVKLS EHWZHHQ GLIIHUHQW PRGDOLWLHV )LQDOO\ WZR H[DPSOHV RI PXOWLPRGDO DIIHFW UHFRJQLWLRQ V\VWHPV XVHG LQ VRIW UHDOWLPH QDWXUDOLVWLF KXPDQFRPSXWHU DQG KXPDQURERW LQWHUDFWLRQ IUDPHZRUNV DUH SUHVHQWHG Key Words: PXOWLPRGDO DIIHFW UHFRJQLWLRQ IHDWXUH UHSUHVHQWDWLRQ DQG VHOHFWLRQ FRQWH[W VHQVLWLYLW\ KXPDQFRPSXWHU LQWHUDFWLRQ KXPDQURERW LQWHUDFWLRQ *LQHYUD &DVWHOODQR +DWLFH *XQHV &KULVWRSKHU 3HWHUV and %M|UQ 6FKXOOHU Multimodal Afect Recognition for Naturalistic Human-Computer and Human-Robot Interactions 17 OUP UNCORRECTED PROOF – FIRSTPROOFS, Fri Jul 18 2014, NEWGEN book.indb 246 7/18/2014 1:19:47 PM