Evaluating TV Ad Campaigns Using Set-Top Box Data

Sundar Dorai-Raj, Yannet Interian, and Dan Zigmond
Google, Inc.

Abstract

Google has developed new metrics based on set-top box data for predicting the future audience retention of TV ads. This paper examines how to use these metrics to judge the effectiveness of TV ad campaigns. More specifically, we analyze how these metrics can inform future campaign targeting and placement goals.

Introduction

In recent years, there has been an explosion of interest in collecting and analyzing television set-top box (STB) data (also called "return path" data). As US television moves from analog to digital signals, digital set-top boxes are increasingly common in American homes. Where these STBs are attached to some sort of return path, their data can be aggregated and licensed to companies wishing to measure television viewership. For example, Google aggregates data, collected and anonymized by DISH Network L.L.C., describing the precise second-by-second tuning behavior of television set-top boxes in millions of US households. These data can be combined with detailed airing logs for thousands of daily TV ads to estimate second-by-second fluctuations in audience during TV commercials (Zigmond and Lanning, 2008).

These data hold the promise of providing accurate measurement for much of the niche TV content that eludes current panel-based methods. Beyond raw audience measurement, however, these data also make it possible to form more qualitative judgments about the content – and specifically the advertising – on television. Google has developed a measure of audience retention based on STB data that can be used to predict future audience response for TV ads (Zigmond, 2009a; Zigmond et al., 2009b). This paper will look at how this new retention metric can be applied to measure the effectiveness of TV ad campaigns.

Retention Scores

Raw measures of audience tuning behavior during TV ads can be useful in evaluating TV ads.
However, we have found that these metrics are heavily influenced by extraneous factors such as the time of day, the day of week, and the network on which the ads aired. These nuisance variables make direct comparison of such measures very difficult. Rather than using these measures directly, we have developed a model for normalizing the scores relative to expected tuning behavior (Zigmond et al., 2009b). Specifically, we use a statistical model to estimate the "expected" tuning behavior during a given ad spot (based on known influencing factors such as time of day and day of week) and subtract from this the observed tuning behavior during a specific ad airing. We then score ads or campaigns by the percentage of airings in which this residual (i.e., the expected minus the actual) exceeds the median residual. We call this quantity the "retention score."
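As a rough illustration, the residual-based scoring described above can be sketched in a few lines. This is only a sketch under stated assumptions: the paper does not specify an implementation, the function name is hypothetical, and we assume here that the median is taken over the residuals of the airings being scored.

```python
import statistics


def retention_score(expected, observed):
    """Score a set of ad airings from per-airing tuning behavior.

    expected: model-predicted tuning measure for each airing
              (from a model of time of day, day of week, etc.)
    observed: actual tuning measure for each airing

    Returns the fraction of airings whose residual
    (expected minus observed) exceeds the median residual.
    """
    # Residual for each airing: expected minus actual, as in the text.
    residuals = [e - o for e, o in zip(expected, observed)]
    median = statistics.median(residuals)
    # Fraction of airings strictly above the median residual.
    above = sum(1 for r in residuals if r > median)
    return above / len(residuals)
```

For example, four airings with residuals 0.10, 0.00, 0.10, and -0.05 have a median residual of 0.05, and two of the four residuals exceed it, giving a retention score of 0.5.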