LSA as a Measure of Coherence in Second Language Natural Discourse

Scott A. Crossley (sc544@msstate.edu)
Department of English, Mississippi State University
Mississippi State University, MS 39762

Philip M. McCarthy (pmmccrth@memphis.edu)
Department of English, University of Memphis
Memphis, TN 38152

Thomas Salsbury (tsalsbury@wsu.edu)
Department of Education, Washington State University
Pullman, WA 99164

Danielle S. McNamara (d.mcnamara@mail.psyc.memphis.edu)
Department of Psychology, University of Memphis
Memphis, TN 38152

Abstract

This study explores Latent Semantic Analysis as a model of coherence relations in spoken discourse. Two studies were conducted on language data from six adult learners of English observed longitudinally for one year. The studies investigated whether LSA values increase as a function of time spent learning English and whether negotiations for meaning are linked to LSA values. Results show that LSA values increase significantly and that negotiations for meaning decrease significantly over time. Negotiations for meaning are also negatively correlated with LSA values. These findings provide evidence of co-referential semantic development in L2 learners. The findings also demonstrate that a lack of semantic similarity in natural spoken discourse is sufficient to trigger negotiations for meaning.

Keywords: Coherence; Cohesion; Corpus Linguistics; Second Language Processing; Latent Semantic Analysis

Introduction

Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the meaning of words using statistical computations applied to a large corpus of text (Landauer & Dumais, 1997). LSA can model human conceptual knowledge in word sorting evaluations (Landauer & Dumais, 1997), relatedness judgments (Landauer et al., 1998), word synonymy judgments, and vocabulary learning (Landauer & Dumais, 1997).
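As a rough illustration of the kind of computation LSA involves, the sketch below builds a term-by-document count matrix from a tiny toy corpus, reduces it with a truncated singular value decomposition, and scores adjacent utterances by the cosine of their summed word vectors. Everything in it is an assumption made for illustration: the toy corpus, the choice of k = 2 dimensions, the raw (unweighted) counts, and the function names. It is not the semantic space or the procedure used in the studies reported here.

```python
import numpy as np

def build_term_space(documents, k=2):
    """Return (word -> row index, k-dimensional term vectors) for a toy corpus."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-document count matrix (rows: terms, columns: documents).
    counts = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for w in doc.lower().split():
            counts[index[w], j] += 1.0
    # Truncated SVD: keep only the k largest singular dimensions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :k] * s[:k]

def utterance_vector(utterance, index, term_vectors):
    """Sum the vectors of known words; words outside the corpus are ignored."""
    vec = np.zeros(term_vectors.shape[1])
    for w in utterance.lower().split():
        if w in index:
            vec += term_vectors[index[w]]
    return vec

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def adjacent_similarity(utterances, index, term_vectors):
    """Cosine between each utterance and the one that follows it."""
    vecs = [utterance_vector(u, index, term_vectors) for u in utterances]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]

# Hypothetical toy corpus with two unrelated topics (animals, driving).
corpus = [
    "cat dog pet",
    "cat dog",
    "cat dog dog pet",
    "car road",
    "car road drive",
]
index, term_vectors = build_term_space(corpus, k=2)

# Hypothetical sequence of turns with a topic shift between the 2nd and 3rd.
turns = ["dog cat", "pet dog", "car road", "road drive"]
scores = adjacent_similarity(turns, index, term_vectors)
print(scores)
```

In this toy setting, adjacent turns on the same topic score high and the pair spanning the topic shift scores near zero. A real LSA space is trained on a large corpus, typically with weighted counts and several hundred dimensions, so its similarity values are far more graded.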
LSA is also an important model of coherence relations in both text (Foltz, Kintsch, & Landauer, 1998) and in some categories of spoken discourse (Elvevag et al., 2006; Gorman et al., 2003). However, the potential for LSA to evaluate coherence patterns other than topic shifts in extended, natural spoken discourse has not been examined. Thus, in this analysis, we are interested in demonstrating how LSA functions as a model of semantic co-referential coherence in natural speech.

To analyze coherence in natural speech, we use a longitudinal corpus of speech data taken from a group of English as a second language (L2) learners. The theoretical assumption is that as L2 learners progress in their language learning (i.e., interlanguage), their discourse will become more coherent, and that LSA can be used to analyze and monitor this development. We examine this hypothesis by first investigating language data from English L2 learners to establish whether LSA values increase as a function of time spent studying English, which would provide evidence that language becomes more semantically similar over time. Second, we hand-code the language data of the L2 learners for coherence breaks. We then analyze the frequency of these coherence breaks to determine whether the breaks decrease as a function of time, which would support the notion that L2 language data become more coherent. We also compare the frequency of these coherence breaks to the LSA values for the language data to examine the relationship between coherence breaks and LSA values.

Such an analysis is relevant to cognitive science for two reasons. First, it examines whether LSA can be used to evaluate the coherence of natural spoken discourse based on semantic co-referentiality. Second, it could further our understanding of L2 acquisition by examining L2 semantic growth and provide an automated means to measure L2 semantic development.

Coherence

Coherence is generally associated with the interpretability of discourse (Anderson, 1995).
Graesser, McNamara, and Louwerse (2003) further specify that coherence refers to the representational relationships of a text in the mind of a reader or listener, whereas cohesion refers to the cues in the text that help the reader to build a coherent representation (Foltz, 2007). There are many types of cohesive devices available to a proficient speaker. These include ellipsis, conjunction, anaphora (Halliday & Hasan, 1976), causal verbs and particles (McNamara, Cai, & Louwerse, 2007), and semantic similarity (Foltz, 2007). Cohesive devices are important in text for connecting ideas with topics (Graesser et al., 2004) and in speech for signaling coherence in a message and providing interlocutors with a means to interpret messages (Tanskanen, 2006). Cohesive devices also allow interlocutors to make links between pieces of discourse and to transition information from one section of discourse to another. Gaps in cohesion force participants either to make inferences to complete the gaps (McNamara et al., 1996) or, if inferences are not possible, to negotiate meaning.

Coherence Breaks

In both first language (L1) studies and L2 studies, coherence breaks in natural speech often lead to negotiations for