Web-based vs. Controlled Environment: About the Reliability of Stimuli Ratings in Human-Robot Interaction

Karoline Malchus 2,1, Oliver Damm 1, Petra Jaecks 2, Prisca Stenneken 3 and Britta Wrede 1

Abstract— In several research areas, e.g. in the field of human-robot interaction, ratings or questionnaires are applied using offline and online methods. One argument for the use of online methods is efficiency. By using the Internet, data can be collected much faster than in an offline experiment, and the administration effort is very low. The goal of our study was to find out whether there is a difference in accuracy between an online and an offline rating task of human and robot emotional facial expressions. Results indicate that emotional expressions are best recognized in humans (versus robots) and in the offline (versus online) condition. Furthermore, the influence of the emotional category on the accuracy rate varies between conditions. We therefore discuss environmental factors of online experiments that are difficult to control as the main reasons for these results. We conclude that online rating studies should always be combined with more reliable offline evaluations.

I. INTRODUCTION

In recent years, the World Wide Web (WWW) has become more and more important in our daily life. We regularly use it to read news, to shop, or to chat. So it is not surprising that scientists use online methods for collecting data [1]. One reason is that online methods provide access to sample sizes far beyond the reach of standard offline methods. Efficiency is another important argument for the application of online methods. By using the Internet, data can be collected much faster and with low administration effort. There are several study designs for which online methods can be used, e.g. questionnaires or ratings.
The options are manifold and range from the simple presentation of text passages to the integration of picture and video stimuli in more interactive settings. In this study, an internet browser was used to show short videos. The participants were asked to evaluate each presented video according to the displayed/expressed emotion. The details of the setting are described in Section II.

Experimental comparisons of computer-based versus paper-based tasks have been conducted quite often. In the early 1980s, one of the first empirical studies compared reading tasks performed on a CRT display and on paper. In these early tests, differences in accuracy [3] and reading rate [4] were detected. Wästlund et al. [5] investigated in 2005 the effect of the presentation format (video display terminals versus paper presentation) on text comprehension and production. The authors not only found significantly more correct responses in the paper condition; there was also a significantly greater information content in the subsequent text production. While there is broad evidence for differences in text processing between computer-based and paper-based tasks, there is very little work on the online versus offline evaluation of video or audio data or on emotion processing, e.g. comparing online and offline ratings of stimulus data.

One field where online ratings are a frequently used method is the perception and recognition of emotional expressions. In social interactions, the communication of emotions is essential.

Fig. 1. An example of the stimuli faces presented during the study.

1 Faculty of Technology, Bielefeld University, 33106 Bielefeld, Germany {odamm, kmalchus, bwrede}@techfak.uni-bielefeld.de
2 Faculty of Linguistics, Bielefeld University, 33106 Bielefeld, Germany {petra.jaecks | karoline.malchus}@uni-bielefeld.de
3 Faculty of Human Sciences, University of Cologne, D-50923 Köln, Germany prisca.stenneken@uni-koeln.de
This applies to human-robot interaction (HRI) in the same way as to human-human interaction (HHI). Faces are widely used as stimuli in this research. To study emotional facial expressions in a lively and natural manner, dynamic video sequences are often used instead of static photographs. Although there are many databases of human expressions (static as well as dynamic stimuli), there is a lack of databases of emotional expressions presented by a robot or virtual agent. Therefore, researchers have to create new stimuli, which then need to be rated by a large number of persons.

II. METHOD

In order to test the influence of a highly controlled environment versus an online rating situation, we designed the following setting. In an emotion stimuli evaluation task, one group of participants did their ratings at a computer in a laboratory of the university, whereas another group of participants did their ratings online at home. For the experiment, a total of 110 different short video sequences were created (60 human / 50 robot), each with a length of 4 seconds. There are 12 acting persons and 5 versions

2013 IEEE RO-MAN: The 22nd IEEE International Symposium on Robot and Human Interactive Communication, Gyeongju, Korea, August 26-29, 2013
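As an illustration of the kind of analysis such a rating study yields, the sketch below computes recognition accuracy separately for each condition (online vs. offline) and actor type (human vs. robot). The function name, data format, and all rating values are hypothetical and serve only to show the bookkeeping; they are not taken from the study itself.

```python
# Hypothetical sketch: per-condition recognition accuracy for an
# emotion-rating study of the kind described above. All data below
# are illustrative, not the study's actual results.
from collections import defaultdict

def accuracy_by_condition(ratings):
    """ratings: list of dicts with keys 'condition' ('online'/'offline'),
    'actor' ('human'/'robot'), 'target' (intended emotion), and
    'response' (emotion chosen by the rater)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in ratings:
        key = (r["condition"], r["actor"])
        totals[key] += 1
        if r["response"] == r["target"]:
            hits[key] += 1
    # Fraction of correctly recognized stimuli per (condition, actor) cell.
    return {key: hits[key] / totals[key] for key in totals}

# Four illustrative trials (two per cell would be used in practice;
# a real study aggregates over many raters and stimuli).
ratings = [
    {"condition": "offline", "actor": "human", "target": "joy",  "response": "joy"},
    {"condition": "offline", "actor": "human", "target": "fear", "response": "fear"},
    {"condition": "online",  "actor": "robot", "target": "joy",  "response": "anger"},
    {"condition": "online",  "actor": "robot", "target": "joy",  "response": "joy"},
]
print(accuracy_by_condition(ratings))
```

Splitting accuracy by (condition, actor) cells in this way is what allows statements such as "emotions are best recognized in humans and in the offline condition" to be checked per emotional category as well, by adding the target emotion to the grouping key.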