Image and Vision Computing 65 (2017) 49–57
Journal homepage: www.elsevier.com/locate/imavis

Behavioral cues help predict impact of advertising on future sales

Gábor Szirtes a,*, Javier Orozco a, István Petrás a, Dániel Szolgay a, Ákos Utasi a, Jeffrey F. Cohn b

a Realeyes OÜ, Tölgyfa utca 24, Budapest 1027, Hungary
b University of Pittsburgh, 4322 Sennott Square, Pittsburgh, PA 15260, USA

ARTICLE INFO

Article history:
Received 1 May 2016
Received in revised form 7 March 2017
Accepted 13 March 2017
Available online 22 March 2017

Keywords: Market research; Behavioral cue; Predictive modeling; Facial expression analysis

ABSTRACT

Advertising aims to influence consumer preferences, appraisals, action tendencies, and behavior in order to increase sales. These are all components of emotion. In the past, they have been measured through self-report or panel discussions. While informative, these approaches are difficult to scale to large numbers of consumers, fail to capture moment-to-moment changes in appraisals that may be predictive of sales, and depend on verbal mediation. We used web-cam technology to sample non-verbal responses to television commercials from four product categories in six different countries. For each participant, head pose, head motion, and more frequent facial expressions like smiling, surprise and disgust were automatically measured at each video frame and aggregated across subjects. Dynamic features from the aggregated series were input to a simple linear ensemble classifier with 10-fold cross-validation to predict product sales. Sales were predicted with ROC AUC = 0.75, 95% CI [0.727, 0.773], and predictions for unseen categories were consistent for all but one product group (ROC AUC varied between 0.74 and 0.83, except for Confections with 0.61).
Predictions for unseen countries showed a similar pattern: ROC AUC varied between 0.71 and 0.89, with the exception of Russia (ROC AUC 0.53). In comparison with previous attempts, our approach yielded higher overall performance and greater generalization over factors that were not modeled, such as country or category. These findings support the feasibility, efficiency, and predictive validity of sales predictions from large-scale sampling of viewers’ moment-to-moment responses to commercial media.

© 2017 Elsevier B.V. All rights reserved.

This paper has been recommended for acceptance by Mohammad Soleymani.
* Corresponding author. E-mail address: gabor.szirtes@realeyesit.com (G. Szirtes).

1. Introduction

Advertising is about influencing consumer preferences, appraisals, action tendencies, and purchases. Television and, increasingly, online video commercials are a key component. Over 80 billion dollars is spent annually on television commercials in the US alone [1]. For the companies that produce commercials and for their clients, there is great interest in evaluating the effectiveness of the commercials they produce and distribute.

One approach is to correlate television advertisements with product sales (online shopping in a short time window around the time of the TV ad) [2]. This approach enables a gross estimate of the direct influence of advertising on sales but is blind to consumer reactions to individual commercials. For that, it is necessary to assess consumer responses to specific commercials in relation to product sales.

One solution is to ask viewers to report on their responses to commercials. Focus groups, personal interviews, random-digit phone surveys, and online surveys have been used for this purpose. While providing useful information, these methods have notable limitations.
They pull for rational thinking rather than emotional responses that may be more predictive of purchase behavior; respondents must verbally represent what are often non-verbal and often unconscious cognitive-emotional reactions; and the dynamics of their responses may be compromised by recency effects. Demand characteristics and social desirability effects may bias reports as well.

Focus groups, surveys, and related methods further assume that verbal reports are necessarily the best indices of purchasing influences. Evidence suggests otherwise. People’s preferences are often outside of their awareness and strongly influenced by emotion [3,4]. Emotions consist of multiple components that include subjective feelings, action tendencies and physiological arousal. All are prime candidates for influencing the likelihood of purchase decisions. During emotion episodes, these components become correlated [5].

Automated facial expression analysis using web-cam video acquisition is a promising alternative. Using computer vision and machine learning, facial expressions of emotion in response to television advertisements can be measured on a moment-to-moment basis. This approach avoids the necessity for viewers to verbally report their experience, captures fine-grained information about the timing of behavior, and

http://dx.doi.org/10.1016/j.imavis.2017.03.002 0262-8856/© 2017 Elsevier B.V. All rights reserved.