Web-based vs. Controlled Environment:
About the Reliability of Stimuli Ratings in Human-Robot Interaction
Karoline Malchus²·¹, Oliver Damm¹, Petra Jaecks², Prisca Stenneken³, and Britta Wrede¹
Abstract— In several research areas, e.g. in the field of human-robot interaction, ratings or questionnaires are administered using offline and online methods. One argument for the use of online methods is efficiency: using the Internet, data can be collected much faster than in an offline experiment, and the administration effort is very low. The goal of our study was to find out whether there is a difference in accuracy between an online and an offline rating task of human and robot emotional facial expressions. Results indicate that emotional expressions are recognized best in humans (versus robots) and in the offline (versus online) condition. Furthermore, the influence of the emotional category on the accuracy rate varies between conditions. We therefore discuss environmental factors of online experiments that are difficult to control as the main reasons for these results, and conclude that online rating studies should always be combined with more reliable offline evaluations.
I. INTRODUCTION
Over the last years, the World Wide Web (WWW) has become more and more important in our daily lives. We regularly use it to read news, to shop, or to chat. So it is not surprising that scientists use online methods for collecting data [1]. One reason is that online methods provide access to sample sizes far beyond the reach of standard offline methods. Efficiency is another important argument for the application of online methods: using the Internet, data can be collected much faster and with low administration effort.
There are several study designs for which online methods
can be used, e.g. for questionnaires or ratings. The options
are manifold and range from a simple presentation of text
passages to the integration of picture and video stimuli in
more interactive settings. In this study, an Internet browser was used to show short videos, and the participants were asked to evaluate each video according to the displayed or expressed emotion. The details of the setting are described in Section II.
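The rating procedure described above can be illustrated with a minimal Python sketch: a randomized per-participant presentation order over the video stimuli and a forced-choice response record. All file names, emotion labels, and function names here are illustrative assumptions, not the study's actual implementation.

```python
import random

# Hypothetical emotion categories for the forced-choice rating
# (illustrative labels, not necessarily those used in the study).
EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]


def build_trial_list(stimuli, seed=None):
    """Return a per-participant randomized presentation order."""
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    return order


def collect_rating(video_id, chosen_emotion):
    """Store one forced-choice rating; rejects unknown response labels."""
    if chosen_emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {chosen_emotion}")
    return {"video": video_id, "rating": chosen_emotion}


# 110 short clips, as in the study (hypothetical file names).
stimuli = [f"clip_{i:03d}.mp4" for i in range(110)]
trials = build_trial_list(stimuli, seed=42)
```

Seeding the shuffle per participant keeps the order reproducible while still differing between participants.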
Experimental comparisons of computer-based versus paper-based tasks have been carried out quite often. In the early 1980s, some of the first empirical studies compared reading tasks done on a CRT display with reading on paper. In these early tests,
¹ Faculty of Technology, Bielefeld University, 33106 Bielefeld, Germany {odamm, kmalchus, bwrede}@techfak.uni-bielefeld.de
² Faculty of Linguistics, Bielefeld University, 33106 Bielefeld, Germany {petra.jaecks | karoline.malchus}@uni-bielefeld.de
³ Faculty of Human Sciences, University of Cologne, D-50923 Köln, Germany prisca.stenneken@uni-koeln.de
Fig. 1. An example of the stimuli faces presented during the study.
differences in accuracy [3] and reading rate [4] were detected. Wästlund et al. [5] investigated in 2005 the effect of the presentation format (video display terminals versus paper presentation) on text comprehension and production. The authors not only found significantly more correct responses in the paper condition; there was additionally a significantly greater information content in the subsequent text production. While there is ample evidence for differences in text processing in computer-based versus paper-based tasks, there is very little work on the online versus offline evaluation of video or audio data or on emotion processing, e.g. comparing online and offline ratings of stimulus data.
One field in which online ratings are a frequently used method is the perception and recognition of emotional expressions. In social interactions, the communication of emotions is essential. This applies to human-robot interaction (HRI) in the same way as to human-human interaction (HHI). Faces are widely used as stimuli in this context. To study emotional facial expressions in a lively and natural manner, dynamic video sequences are often used instead of static photographs. Although there are many databases of human expressions (static as well as dynamic stimuli), there is a lack of databases of emotional expressions presented by a robot or virtual agent. Therefore, researchers have to create new stimuli, which need to be rated by a large number of persons.
II. METHOD
In order to test the influence of a highly controlled
environment versus an online rating situation, we have
designed the following setting. In an emotion stimuli
evaluation task, one group of participants did their ratings
at a computer in a laboratory of the university, whereas
another group of participants did their ratings online at home.
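The central measure for comparing the two groups is the recognition accuracy per factor level (condition, agent type, emotion category). A hedged Python sketch of such a tabulation is given below; the record fields and the four example ratings are invented for illustration and are not the study's data.

```python
from collections import defaultdict

# Invented example records: each rating stores the experimental condition,
# the agent shown, the intended (target) emotion, and the response given.
ratings = [
    {"condition": "offline", "agent": "human", "target": "joy",   "response": "joy"},
    {"condition": "offline", "agent": "robot", "target": "joy",   "response": "fear"},
    {"condition": "online",  "agent": "human", "target": "anger", "response": "anger"},
    {"condition": "online",  "agent": "robot", "target": "anger", "response": "anger"},
]


def accuracy_by(records, key):
    """Proportion of correct forced-choice responses per level of `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += int(r["response"] == r["target"])
    return {level: hits[level] / totals[level] for level in totals}


print(accuracy_by(ratings, "condition"))  # {'offline': 0.5, 'online': 1.0}
print(accuracy_by(ratings, "agent"))      # {'human': 1.0, 'robot': 0.5}
```

The same helper can be applied with `key="target"` to examine how accuracy varies across emotion categories within each condition.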
For the experiment, a total of 110 different short video sequences were created (60 human / 50 robot), each 4 seconds long. There are 12 acting persons and 5 versions
2013 IEEE RO-MAN: The 22nd IEEE International Symposium on Robot and Human Interactive Communication, Gyeongju, Korea, August 26-29, 2013. TuA1.2P.13. 978-1-4799-0509-6/13/$31.00 ©2013 IEEE