Test persons for subjective video quality testing: Experts or non-experts?

Matej Nezveda, Shelley Buchinger, Werner Robitza, Ewald Hotop, Patrik Hummelbrunner and Helmut Hlavacs
Department of Distributed and Multimedia Systems, University of Vienna
{matej.nezveda, shelley.buchinger, werner.robitza, ewald.hotop, patrik.hummelbrunner, helmut.hlavacs}@univie.ac.at

Vittorio Baroncini, Cristina Delogu
Fondazione Ugo Bordoni
vittorio@fub.it, cristina@fub.it

ABSTRACT

It has long been understood that subjective quality assessment is crucial for determining the multimedia quality of distorted or compressed data. Generally, researchers follow the ITU recommendations to carry out their user tests. These specify that at least fifteen non-experts need to act as test persons. It seems that in many cases such a large number of test persons is not required to obtain a significant result. In this paper, we investigated whether a smaller number of experts could be used instead of the large group of non-experts by repeating already performed experiments in the new setting.

Test results reveal that, for an adequate choice of content, the rating scores obtained by non-experts can be approximated very precisely by ratings from a smaller number of experts. For an arbitrary selection of content, the approach followed here seems useful to detect and verify trends. Furthermore, it has been shown that a certain minimum number of experts needs to be used: mean opinion scores of several non-experts cannot be approximated by the ratings of only one expert.

Categories and Subject Descriptors

H.1.2 [User/Machine Systems]: Human Factors

Keywords

Subjective quality assessment, video quality

1. INTRODUCTION

Quality of Experience has been defined by the ITU-T Study Group 12 and pre-published in 2008 in the ITU-T Recommendation G.1080 [6] as: "Quality of experience (QoE) is the overall acceptability of an application or service, as perceived subjectively by the end user." This means that the user plays a crucial role when assessing Quality of Experience. In contrast to computational methods, user tests have the disadvantage that they are subject to statistical error: it is not possible to obtain the same result in each test run. In order to provide comparable results, methods for assessing subjective audio, video and multimedia quality were defined some time ago [4, 5, 2, 3].

One of the requirements defined in these recommendations is the involvement of at least 15 non-experts as test persons. The effort and costs of such user tests are high and might exceed the needs and the budget of some investigations. There might be some or even several occasions where a smaller number of test users would be sufficient, or where the large number of non-experts could be replaced by a small number of video quality experts. In fact, data analysis carried out in [10] revealed that the ratings of only ten non-experts are sufficient in several cases to obtain a significant and reliable result.

In this paper we investigate whether the large number of non-experts required by [4] could be replaced by a smaller number of experts. For that purpose we repeated the user tests published in [9, 8] with only six experts in a similar setting, following the remaining specifications indicated in [4]. More details on the experimental setup are provided in Section 2, test results are described and discussed in Section 3, and conclusions are drawn in Section 4.

2. EXPERIMENT

We conducted a Single Stimulus Continuous Quality Evaluation (SSCQE) experiment based on [4], in which each video was shown only once to each expert observer.
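Comparing the two panels essentially means computing a per-video mean opinion score (MOS) with a confidence interval for each group and then correlating the two sets of scores across videos. The following is a minimal sketch of that analysis, not the authors' actual processing; the `mos`, `ci95` and `pearson` helpers and all rating values are illustrative assumptions (the 95% confidence interval uses the normal approximation commonly applied in ITU-R BT.500-style evaluations):

```python
import math

def mos(ratings):
    """Mean opinion score of one video's ratings."""
    return sum(ratings) / len(ratings)

def ci95(ratings):
    """Half-width of the 95% confidence interval around the MOS,
    using the normal approximation 1.96 * s / sqrt(n)."""
    n = len(ratings)
    m = mos(ratings)
    s = math.sqrt(sum((r - m) ** 2 for r in ratings) / (n - 1))
    return 1.96 * s / math.sqrt(n)

def pearson(x, y):
    """Pearson correlation between two per-video MOS vectors,
    e.g. expert MOS vs. non-expert MOS over the same videos."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical ratings on a 0-100 continuous scale for three videos,
# rated by six experts (illustrative numbers only):
expert_ratings = [[72, 70, 75, 68, 74, 71],
                  [41, 38, 45, 42, 39, 40],
                  [88, 85, 90, 87, 86, 89]]
# Assumed per-video MOS values precomputed from fifteen non-experts:
nonexpert_mos = [71.3, 43.0, 86.5]

expert_mos = [mos(r) for r in expert_ratings]
agreement = pearson(expert_mos, nonexpert_mos)
```

A high correlation between `expert_mos` and `nonexpert_mos`, with overlapping confidence intervals per video, is the kind of evidence needed to argue that a small expert panel can stand in for a larger non-expert panel.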
All observers have experience in the field of video quality. The goal of this experiment was to assess whether a set of experts could achieve the same results as a larger group of non-experts. The videos that were used for playback originated from the LIVE database [9, 8]. It provides a set of 150 videos and the