Measuring issue salience: Using supervised machine learning to generate data from free responses to the most important problem question

Solomon Messing

February 15, 2011

Abstract

Future political communication research will depend on measuring issue salience in massive data sets consisting largely of unstructured text. We examine a set of supervised machine-learning approaches that can categorize short segments of unstructured text based on human judgments of content, with an attractive cost structure. We achieve levels of accuracy that are generally acceptable in human-coder content analysis, and the resulting classifications are perfectly replicable once human coders produce appropriate training data. We use this approach to classify free responses to the "most important problem" question from the 2000 NAES rolling cross section, and show that our results generally correspond to exogenous real-world events that are widely thought to have shifted public attention.

1 Measuring salience

Of critical importance to many questions in political communication is the ebb and flow of public issue salience leading up to electoral contests. Perhaps the best way to gauge public opinion on current issues is a well-designed survey. However, the most widely used type of survey question, the structured, closed-ended response, tends to introduce a wide range of measurement biases. Most notably, closed-ended questions constrain the set of possible answers. Nor is this problem mitigated by merely adding the option to name another problem outside the set of listed choices: Schuman (2008, 32) split 349 participants between an open-ended version of the most important problem question and a closed-ended version with an "other" option, and found that 60% of respondents in the
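To make the workflow sketched in the abstract concrete: a supervised classifier is fit to a sample of responses that human coders have already categorized, and then applied to the remaining uncoded responses. The following Python sketch illustrates that general approach only; the scikit-learn toolchain, the example responses, and the category labels are all assumptions introduced for illustration, not the paper's implementation or the NAES coding scheme.

```python
# A minimal sketch, assuming a scikit-learn workflow (an assumption;
# the paper does not specify its toolchain). The responses and topic
# codes below are invented placeholders, not NAES data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hand-coded training data: open-ended responses plus the topic code
# a human coder assigned to each (two examples per category here).
coded_text = [
    "the economy and jobs",
    "unemployment is too high",
    "social security going broke",
    "protecting social security benefits",
    "moral decline in this country",
    "loss of family values",
    "health care costs for seniors",
    "prescription drug prices",
]
coded_labels = [
    "economy", "economy",
    "social security", "social security",
    "morality", "morality",
    "health care", "health care",
]

# Bag-of-words features weighted by TF-IDF, fed to a logistic
# regression classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# Held-out accuracy approximates agreement with the human coders,
# analogous to an intercoder reliability check. (cv=2 only because
# this toy training set is tiny; real coded samples would be larger.)
scores = cross_val_score(model, coded_text, coded_labels, cv=2)
print("held-out accuracy:", scores.mean())

# Fit on all coded responses, then label the uncoded remainder.
model.fit(coded_text, coded_labels)
uncoded = ["gas prices keep going up", "medicare won't cover my medicine"]
print(model.predict(uncoded))
```

Note that once the human-coded training set is fixed, the fitted classifier assigns identical labels on every run, which is the perfect replicability the abstract refers to.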