“Nice things get said”: corpus evidence and the National Literacy Strategy

Alison Sealey and Paul Thompson

Abstract

The article compares evidence from an electronic corpus of texts written for a child audience with specifications in the National Literacy Strategy. The concepts and terminology associated with corpus linguistics are introduced and explained, and the research study from which the findings derive is summarised. Results of the analysis are presented in sections on word frequencies, contractions, word forms and synonyms. The article concludes with an indication of the implications of the findings for education policy and classroom practice.

Key words: National Literacy Strategy, vocabulary, corpus linguistics, primary, Key Stage 2.

Introduction

This article arises from an ESRC-funded project¹ investigating the use of corpus-based approaches to learning about language in the primary school. We begin by explaining what we mean by a ‘corpus’, continuing with a brief description of what the research involved, in the context of comparable research with different categories of learner. Next, we present some findings from our investigation into the linguistic characteristics of the writing that children are likely to read, comparing these with the prescriptions in England’s National Literacy Strategy (NLS). The article concludes with a consideration of the implications of our findings for policy, classroom practice and further research.

What is a corpus?

In the context of contemporary work in language description and analysis, a ‘corpus’ is understood to be an electronically stored databank of authentic language. One definition is “a collection of pieces of language, selected and ordered according to explicit criteria in order to be used as a sample of the language” (Sinclair, 1996, in Aston and Burnard, 1998, p. 4).
The enterprise of assembling very large quantities of language data has grown rapidly in the past twenty years or so, and “[i]t is no exaggeration to say that corpora, and the study of corpora, have revolutionised the study of language, and of the applications of language, over the last few decades” (Hunston, 2002, p. 1). Inherent in this approach is a conception of language as a social practice, rather than as an abstract mental process. Instead of inventing sentences to exemplify grammatical rules, corpus linguists seek to identify the patterns that emerge when the sentences people have actually written (and the utterances they have actually spoken) are collected together and submitted to computer-assisted analysis.

Intuitions about the way language works are replaced by empirical data, and from very large quantities of such data emerge patterns and tendencies that are often quite surprising. We give some examples of this below, but one aspect is well summarised by researchers on one of the largest English language corpora (currently over 500 million words), the Bank of English: “grammatical patterns and lexical items are co-selected, and . . . it is impossible to look at one independently of the other. Particular grammatical patterns tend to co-occur with particular lexical items, and – the other side of the coin – lexical items seem to occur in only a limited range of patterns. The interdependence of grammar and lexis is such that they are ultimately inseparable, working together in the making of meaning” (Clear et al., 1996, p. 311).

Although linguistic analyses in themselves cannot determine what the implications for teaching may be, researchers such as ourselves are interested in the applied dimensions of corpus linguistics.
To take just one example, when teachers ask pupils to suggest a noun or adjective to be fitted into an empty slot in a contrived sentence, would it help to know that certain nouns and certain adjectives are much more likely to be co-selected – in authentic discourse – than are others? We shall return to such questions later in the article.

© UKLA 2006. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

The research study

Our study investigated two main areas. One area of investigation was how groups of 8–10-year-old children in two English primary schools responded to activities that were derived from a corpus and aimed at teaching them about the grammar and vocabulary of English. These activities included opportunities for the children to discover for themselves some patterns in