JMLR: Workshop and Conference Proceedings vol (2010) 1–7
Workshop on Applications of Pattern Analysis

Automating News Content Analysis: An Application to Gender Bias and Readability

Omar Ali Omar.Ali@bristol.ac.uk
Ilias Flaounas Ilias.Flaounas@bristol.ac.uk
Tijl De Bie Tijl.DeBie@bristol.ac.uk
Intelligent Systems Laboratory, University of Bristol, UK

Nick Mosdell mosdelln1@cardiff.ac.uk
Justin Lewis LewisJ2@cardiff.ac.uk
Cardiff School of Journalism, Media and Cultural Studies, Cardiff University, UK

Nello Cristianini Nello.Cristianini@bristol.ac.uk
Intelligent Systems Laboratory, University of Bristol, UK

Abstract

In this article we present an application of text-analysis technologies to support social science research, in particular the analysis of patterns in news content. We describe a system that gathers and annotates large volumes of textual data in order to extract patterns and trends. We have examined 3.5 million news articles and show that their topic is related to the gender bias and readability of their content. This study is intended to illustrate how pattern analysis technology can be deployed to automate tasks commonly performed by humans in the social sciences, in order to enable large scale studies that would otherwise be impossible.

1. Introduction

The analysis of news content is an important part of modern social sciences, and is often aimed at disclosing subtle biases in the way news is reported. The presence and portrayal of gender in the news media has a long history within media and cultural studies, often involving complex judgements of stereotypes and language as well as the relative incidence of male and female sources and actors (Carter et al., 1998). The majority of these investigations are conducted by hand, and involve selecting small samples of news coverage and collecting information relevant to the study in a process known as ‘coding’.
This process requires a high level of attention to detail and must also be repeated independently in order to minimise human error and bias.

© 2010 O. Ali, I. Flaounas, T. De Bie, N. Mosdell, J. Lewis & N. Cristianini.

In this paper we describe a system that incorporates text-analysis technologies for the automation of some of these tasks, enabling us to extract patterns from news media coverage on a very large scale. We have used automated techniques to gather over 3.5 million on-line news articles and have extracted information from them, such as gender references and ease of readability. The data we present here is illustrative of what can be achieved using automatic coding technology. Our findings suggest, first of all, a strong gender bias in the set of people covered in the news. We found that references to men outnumber references to women by three